hoqolo / SDSTrack

[CVPR 2024] SDSTrack: Self-Distillation Symmetric Adapter Learning for Multi-Modal Visual Object Tracking

Regarding data loading #1

Open FogSue opened 6 months ago

FogSue commented 6 months ago

"Thank you for your open-source code. I noticed that, compared to ViPT, you have modified the _get_frame_path method in lasher.py. Does this mean you didn't rename lasher but instead used the original trainingset?"

hoqolo commented 6 months ago

The naming conventions across sequences in the LasHeR dataset we downloaded are inconsistent. For example, an image in the sequence "2whitegirl" is named i000.jpg, while an image in the sequence "3bike2" is named 000001.jpg. The original data-loading code raised an error on such sequences, so we modified how the frame files are read.
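
For reference, here is a minimal sketch (not necessarily the exact code in lasher.py) of one way to make frame lookup robust to both naming schemes, by indexing into a sorted listing of the sequence folder instead of assuming a fixed filename pattern:

```python
import os


def _get_frame_path(seq_path, frame_id):
    # Illustrative sketch only: list the JPEG frames in the sequence folder
    # and index by position, so both "i000.jpg"-style and "000001.jpg"-style
    # names are handled without hard-coding a single pattern.
    frames = sorted(f for f in os.listdir(seq_path) if f.lower().endswith('.jpg'))
    return os.path.join(seq_path, frames[frame_id])
```

Indexing a sorted listing avoids hard-coding either pattern, at the cost of a directory scan per lookup, which can be cached per sequence if needed.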

FogSue commented 6 months ago

Thanks for the reply; I had been hoping to hear back for a while. I wanted to ask whether you ever reached the accuracy reported in the ViPT paper. I have spent a lot of time troubleshooting but still have not found a solution.

hoqolo commented 6 months ago

Sorry to keep you waiting. In fact, my reproduction of ViPT did not reach the reported accuracy, but we still cite the results from its paper. Our SDSTrack uses the official dataset (https://github.com/BUGPLEASEOUT/LasHeR), which may be the original dataset you mentioned.

FogSue commented 6 months ago

Thank you very much for providing the link; that is exactly the original dataset I meant. Additionally, I have an incomplete ViPT-RGBT training log here: vipt-deep_rgbt.log

In fact, even the first line of the first epoch differs significantly from the official training log that was provided. If you still have your earlier training logs, could you please check whether you saw the same behavior?

hoqolo commented 6 months ago

I suggest checking the dataset as well as the pretrained model (we use vitb_256_mae_ce_32x4_ep300 from OSTrack). I need some time to find the original training log. Below is a screenshot from a run I just started, for reference only. [screenshot: image-20240405220607923]
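
As a quick way to rule out a checkpoint mismatch, both sides could compare a hash of the pretrained weights. A small sketch (the file path below is a placeholder, not necessarily the filename shipped with OSTrack):

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    # Compute the MD5 of a file in chunks so large checkpoints are handled
    # without loading them fully into memory.
    h = hashlib.md5()
    with open(path, 'rb') as f:
        for block in iter(lambda: f.read(chunk_size), b''):
            h.update(block)
    return h.hexdigest()


# Hypothetical path to the OSTrack pretrained weights; adjust to your layout.
print(md5sum('pretrained_models/vitb_256_mae_ce_32x4_ep300.pth.tar'))
```

If the two hashes differ, re-downloading the weights is the first thing to try before comparing training logs any further.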

FogSue commented 6 months ago

Thank you very much for the screenshot, although from the path it looks like you are training your own model. In fact, because I am still troubleshooting the accuracy issue with ViPT, I have not had a chance to reproduce your model yet. The logs I provided are training logs of ViPT on the LasHeR dataset. If you still have your earlier ViPT training logs, could you please check whether our accuracies are similar?

FogSue commented 6 months ago

This morning I attempted to reproduce your model. I used 2x A6000 GPUs with a batch size of 32 per card, and the virtual environment was installed with:

pip install torch==1.10.0+cu113 torchvision==0.11.0+cu113 -f https://download.pytorch.org/whl/torch_stable.html

However, judging from the logs, I ran into the same issue as when reproducing ViPT: there is a significant accuracy gap right from the start, and in my experience such gaps tend to persist throughout training. I made only minor modifications to your code, changing the dataset path and the project path. Could you please advise what might be causing this inconsistency? sdstrack-cvpr2024_rgbt.log

hoqolo commented 6 months ago

If there is no problem with the pretrained model and the dataset, I suggest keeping your setup consistent with ours, including the virtual environment, batch size, and so on.
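
As a sketch for comparing setups, a short script like the one below can be run on both machines to dump the relevant software stack (the GPU driver version itself is reported by nvidia-smi rather than by PyTorch):

```python
import torch

# Print the pieces of the environment that most often explain training
# discrepancies: PyTorch build, the CUDA toolkit it was built against,
# cuDNN, and the visible GPUs.
print('torch :', torch.__version__)
print('CUDA  :', torch.version.cuda)
print('cuDNN :', torch.backends.cudnn.version())
for i in range(torch.cuda.device_count()):
    print('GPU %d : %s' % (i, torch.cuda.get_device_name(i)))
```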

FogSue commented 6 months ago

"Could you please provide your GPU driver version?"