hoqolo / SDSTrack

[CVPR 2024] SDSTrack: Self-Distillation Symmetric Adapter Learning for Multi-Modal Visual Object Tracking
MIT License

Regarding data loading #1

Open FogSue opened 8 months ago

FogSue commented 8 months ago

"Thank you for your open-source code. I noticed that, compared to ViPT, you have modified the _get_frame_path method in lasher.py. Does this mean you didn't rename lasher but instead used the original trainingset?"

hoqolo commented 7 months ago

The naming rules for different sequences in the LasHeR dataset we downloaded are inconsistent. For example, frames in the sequence "2whitegirl" are named like i000.jpg, while frames in the sequence "3bike2" are named like 000001.jpg. The original data-loading code would raise an error, so we modified how the frame files are read.
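For reference, a minimal sketch of how a loader could tolerate both naming schemes; the helper name and directory layout below are assumptions, not the actual SDSTrack code:

```python
import os

def _get_frame_path(seq_path, frame_id, modality='visible'):
    # LasHeR sequences are inconsistent: some name frames "i000.jpg"
    # (0-based with an 'i' prefix), others "000001.jpg" (1-based, six digits).
    candidates = [
        os.path.join(seq_path, modality, 'i{:03d}.jpg'.format(frame_id)),
        os.path.join(seq_path, modality, '{:06d}.jpg'.format(frame_id + 1)),
    ]
    for path in candidates:
        if os.path.isfile(path):
            return path
    raise FileNotFoundError('frame {} not found in {}'.format(frame_id, seq_path))
```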

FogSue commented 7 months ago

I've been looking forward to your reply for a while. I wanted to ask whether you eventually achieved the accuracy reported in the ViPT paper. I've spent a lot of time troubleshooting but still haven't found a solution.

hoqolo commented 7 months ago

Sorry to keep you waiting. In fact, my reproduction of ViPT did not reach the reported accuracy, but we still keep the results from its paper. Our SDSTrack uses the official dataset (https://github.com/BUGPLEASEOUT/LasHeR), which may be the original dataset you mentioned.

FogSue commented 7 months ago

Thank you very much for providing the link; this is exactly the original dataset I meant. Additionally, I have an incomplete ViPT RGB-T training log here: vipt-deep_rgbt.log

In fact, the first line of the first epoch already differs significantly from the official training log that was provided. If you still have your previous training logs, could you please check whether our situations are the same?

hoqolo commented 7 months ago

I suggest checking the dataset as well as the pretrained model (we use vitb_256_mae_ce_32x4_ep300 from OSTrack). I need some time to find the original training log. The following is a screenshot of a run I just trained, for reference only. [screenshot: image-20240405220607923]
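As a quick sanity check that the OSTrack weights are actually being loaded, something like the snippet below can help; the checkpoint file name and the 'net' wrapper key are assumptions, so adjust them to whatever your checkpoint actually contains:

```python
import torch

# Hypothetical path; point this at the OSTrack pretrained weights you downloaded.
ckpt_path = 'pretrained_models/vitb_256_mae_ce_32x4_ep300.pth.tar'

ckpt = torch.load(ckpt_path, map_location='cpu')
state_dict = ckpt.get('net', ckpt)  # some trackers wrap the weights under a 'net' key

print('number of tensors:', len(state_dict))
# Spot-check a few parameter names and shapes to confirm it is the ViT-B/256 model.
for name in list(state_dict)[:5]:
    print(name, tuple(state_dict[name].shape))
```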

FogSue commented 7 months ago

Thank you very much for the screenshot, although from the path it looks like you are training your own model. In fact, since I'm still troubleshooting the accuracy issue with ViPT, I haven't had a chance to reproduce your model yet. The log I provided is also a training log of ViPT on the LasHeR dataset. If you still have your previous ViPT training logs, could you please check whether our accuracies are similar?

FogSue commented 7 months ago

This morning I attempted to reproduce your model. I used 2×A6000 GPUs with a batch size of 32 per card, and installed the virtual environment with the following command:

pip install torch==1.10.0+cu113 torchvision==0.11.0+cu113 -f https://download.pytorch.org/whl/torch_stable.html

However, judging from the logs, I encountered the same issue as when reproducing ViPT: there is a significant accuracy gap right from the beginning, and in my experience such a discrepancy tends to persist throughout training. I made only minor modifications to your code, changing the dataset path and the project path. Could you please advise what might be causing this inconsistency? sdstrack-cvpr2024_rgbt.log

hoqolo commented 7 months ago

If there is no problem with the pretrained model and the dataset, I suggest you try to keep your setup consistent with ours, including the virtual environment, batch_size, and so on.

FogSue commented 7 months ago

"Could you please provide your GPU driver version?"

PoPse147 commented 1 month ago

Has the poor reproducibility issue been resolved?