lssiair / TUTR

ICCV2023 TUTR: Trajectory Unified Transformer for Pedestrian Trajectory Prediction

About the clf_loss calculation #3

Open caaaadac opened 10 months ago

caaaadac commented 10 months ago

Hello, and thank you for sharing such excellent work. While running the code you provided, I noticed a possible problem in the part that calculates clf_loss. The code uses the same loss function as the paper: cross-entropy loss. In that case, for the clf_loss calculation shown in the screenshot below, should soft_label be replaced with closest_mode_indices.squeeze(), or do you calculate it another way? I hope to hear from you, and thanks again for sharing your work. (Screenshot attached: 1703410663040)

lssiair commented 10 months ago

Yes, you can replace the soft label with the closest mode indices. These are two ways of choosing a prior mode. However, we found that the soft label gives better performance (ADE/FDE).
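Editor's note: the two target styles mentioned in this reply can be sketched without any framework. This is a minimal illustration, not the repository's code: the function names `soft_cross_entropy` and `hard_cross_entropy` and all numbers are hypothetical, and it handles a single sample where the training code works on a batch. It shows cross entropy against a probability distribution (the soft label) and against a single class index (the closest mode), and that the two agree when the soft label is one-hot.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one row of raw scores."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]

def soft_cross_entropy(logits, target_probs):
    """Cross entropy against a probability distribution over modes."""
    log_p = log_softmax(logits)
    return -sum(t * lp for t, lp in zip(target_probs, log_p))

def hard_cross_entropy(logits, target_index):
    """Cross entropy against a single class index."""
    return -log_softmax(logits)[target_index]

# Illustrative values only (3 modes instead of 50 or 90).
scores = [0.05, 0.94, 0.10]       # raw scores (logits) for one sample
soft_label = [0.25, 0.70, 0.05]   # distribution over modes, sums to 1
closest_mode_index = 1            # index of the closest mode

loss_soft = soft_cross_entropy(scores, soft_label)
loss_hard = hard_cross_entropy(scores, closest_mode_index)
loss_one_hot = soft_cross_entropy(scores, [0.0, 1.0, 0.0])

print(loss_soft, loss_hard, loss_one_hot)
```

The hard-index loss is the special case of the soft-label loss in which all probability mass sits on one mode. For a batch, the per-sample losses would typically be averaged.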

caaaadac commented 10 months ago

Thanks for your reply, but I get an error when I run the code with the soft label (error shown below). It seems that I can't use the soft label directly in the loss calculation. If possible, could you share how you calculate the loss with the soft label? (Screenshot attached: 1703665144778)

lssiair commented 9 months ago

The code runs normally on our machine. Could you share the detailed error log, including the sizes of the input scores and the soft label?

caaaadac commented 9 months ago

When I debug the code, scores and soft_label both have size ([128, 50]), while closest_mode_indices has size ([128]). I didn't change anything else before running. Could you please help me see what's wrong? Thanks for your help.

lssiair commented 9 months ago

The sizes look normal. Could you send the complete screenshot of the error? In the incomplete screenshot you provided, the 'soft_label.long()' call may be the problem.
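Editor's note: one reason an integer cast on a soft label is suspicious can be shown in plain Python. This is a sketch under assumptions: the probabilities are illustrative, the helper names are hypothetical, and `int()` stands in for a float-to-integer tensor cast, which also truncates toward zero. Taking the argmax is shown only as one way to get a class index from a distribution. Whether it matches closest_mode_indices in this repository is not confirmed by the thread.

```python
def truncate_like_long(row):
    """Mimic an integer cast: each float is truncated toward zero."""
    return [int(p) for p in row]

def argmax(row):
    """Index of the largest entry in the row."""
    return max(range(len(row)), key=row.__getitem__)

# Illustrative probabilities for one sample over 4 modes.
soft_row = [0.0004, 0.2499, 0.0075, 0.7422]

truncated = truncate_like_long(soft_row)  # every probability below 1 becomes 0
hard_index = argmax(soft_row)             # a single class index instead

print(truncated, hard_index)
```

After the cast, the row carries no information about which mode is preferred, and it still has one entry per mode rather than one index per sample.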

caaaadac commented 9 months ago

I have fixed the problem. The cause was that I had the wrong version of torch. Thank you for your excellent work!

Babak-Ebrahimi commented 9 months ago

I have the same issue. This is the error:

python train.py --dataset_name hotel --hp_config config/hotel.py --gpu 0
Namespace(checkpoint='./checkpoint/', data_scaling=[1.9, 0.4], dataset_name='hotel', dataset_path='./dataset/', dist_threshold=2, gpu='0', hp_config='config/hotel.py', lr_scaling=False, num_works=8, obs_len=8, pred_len=12, seed=1)
motion modes loading ...
scores.squeeze() tensor([[ 0.0528, 0.9354, 0.1013, ..., -0.0172, 0.6348, -0.3728],
        [ 0.4464, 0.9679, 0.1647, ..., 0.4640, 0.7311, 0.8122],
        [ 0.1851, 0.9576, 0.1853, ..., 0.1492, 0.5714, 0.0888],
        ...,
        [-0.2424, 0.2976, -0.0359, ..., 0.7427, 0.3319, 0.5020],
        [-0.0894, 0.2282, -0.0196, ..., -0.0150, 0.4049, 0.0206],
        [ 0.6418, 0.2087, 0.0234, ..., 0.3725, 0.4379, 0.3723]], device='cuda:0', grad_fn=)
soft_label tensor([[4.1057e-04, 2.4988e-01, 7.6606e-06, ..., 7.5185e-03, 2.1603e-06, 1.1560e-05],
        [2.5071e-03, 2.9042e-06, 1.4690e-01, ..., 1.0083e-04, 1.3297e-04, 1.4500e-10],
        [8.2685e-02, 1.6805e-04, 4.2541e-03, ..., 5.5742e-03, 2.8143e-04, 1.0259e-08],
        ...,
        [9.6022e-04, 1.1270e-01, 1.6905e-05, ..., 1.8222e-02, 4.8490e-06, 5.3815e-06],
        [1.3188e-04, 1.4961e-02, 2.6028e-06, ..., 1.9970e-03, 7.9184e-06, 7.8279e-05],
        [2.2098e-04, 1.4368e-01, 3.8405e-06, ..., 4.5028e-03, 2.2468e-06, 2.5533e-05]], device='cuda:0')
scores.squeeze(),shape torch.Size([128, 90])
soft_label.shape torch.Size([128, 90])
Traceback (most recent call last):
  File "train.py", line 252, in
    total_loss = train(ep, model, reg_criterion, cls_criterion, optimizer, train_loader, motion_modes)
  File "train.py", line 138, in train
    clf_loss = cls_criterion(scores.squeeze(), soft_label)
  File "/home/ehsanemadmarvasti/anaconda3/envs/star/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ehsanemadmarvasti/anaconda3/envs/star/lib/python3.6/site-packages/torch/nn/modules/loss.py", line 962, in forward
    ignore_index=self.ignore_index, reduction=self.reduction)
  File "/home/ehsanemadmarvasti/anaconda3/envs/star/lib/python3.6/site-packages/torch/nn/functional.py", line 2468, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "/home/ehsanemadmarvasti/anaconda3/envs/star/lib/python3.6/site-packages/torch/nn/functional.py", line 2264, in nll_loss
    ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: Expected object of scalar type Long but got scalar type Float for argument #2 'target' in call to _thnn_nll_loss_forward

Babak-Ebrahimi commented 9 months ago

Issue solved by updating PyTorch: pip install --upgrade torch torchvision torchaudio
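Editor's note: the upgrade likely helps because, as far as I know, PyTorch's cross-entropy loss accepts class-probability (float) targets only from release 1.10 onward. Older releases expect integer class indices, which matches the "Expected ... Long but got ... Float" error above. The sketch below is a hypothetical helper, not part of TUTR or PyTorch, that checks a version string against that assumed threshold. In practice the string would come from `torch.__version__`. The sample versions are illustrative, since the thread does not state which version was installed.

```python
import re

# Assumption: probability targets are supported from PyTorch 1.10 onward.
MIN_SOFT_TARGET_VERSION = (1, 10)

def parse_version(version):
    """Extract (major, minor) from strings such as '1.7.1+cu110'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))

def supports_probability_targets(version):
    """True if this version is at or above the assumed 1.10 threshold."""
    return parse_version(version) >= MIN_SOFT_TARGET_VERSION

# Illustrative version strings.
for v in ("1.7.1+cu110", "1.10.0", "2.1.0"):
    print(v, supports_probability_targets(v))
```

Comparing (major, minor) tuples avoids the string-comparison trap in which "1.10" sorts before "1.7".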