Thanks for your interest. First, the ViT-B/32 was trained with 32 V100 GPUs. Second, please check how long data loading takes, since slow loading can lower GPU utilization. In addition, we pre-resized the videos so that the short side is 256px, which saves storage and speeds up reading, but I'm not sure how much of a speed gain this gives. Last, if you use the tar format, please make sure you are only packing the data, not compressing it. Hope this helps.
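In case it helps, here is a rough sketch for checking two of the points above: how much of each training step is spent waiting on data, and whether a tar file is packed or compressed. The function names (`loader_wait_fraction`, `pack_uncompressed`, `is_compressed_tar`) are made up for illustration and are not part of this repo. `batches` can be any iterable, such as a DataLoader, and `step_fn` stands in for your own training step. The compression check only looks at the gzip/bzip2/xz magic bytes, so treat it as a heuristic.

```python
import os
import tarfile
import time


def loader_wait_fraction(batches, step_fn, max_steps=50):
    """Return the fraction of wall time spent waiting for the next batch."""
    wait = 0.0
    compute = 0.0
    it = iter(batches)
    for _ in range(max_steps):
        t0 = time.perf_counter()
        try:
            batch = next(it)  # time blocked on data loading
        except StopIteration:
            break
        t1 = time.perf_counter()
        # Assumption: step_fn blocks until the step is done. On a GPU this
        # usually means calling torch.cuda.synchronize() inside step_fn,
        # otherwise the compute time is under-counted.
        step_fn(batch)
        t2 = time.perf_counter()
        wait += t1 - t0
        compute += t2 - t1
    total = wait + compute
    return wait / total if total > 0 else 0.0


def pack_uncompressed(paths, tar_path):
    """Pack files into a plain tar without compression."""
    # Mode "w" only packs; "w:gz", "w:bz2" or "w:xz" would compress.
    with tarfile.open(tar_path, "w") as tf:
        for p in paths:
            tf.add(p, arcname=os.path.basename(p))


def is_compressed_tar(tar_path):
    """Heuristic: True if the file starts with gzip, bzip2 or xz magic bytes."""
    with open(tar_path, "rb") as f:
        head = f.read(6)
    return (
        head.startswith(b"\x1f\x8b")          # gzip
        or head.startswith(b"BZh")            # bzip2
        or head.startswith(b"\xfd7zXZ\x00")   # xz
    )
```

If `loader_wait_fraction` comes out close to 1, the GPU is mostly idle waiting for data, which would match a utilization near 0. In that case the data pipeline (number of loader workers, video decoding, compressed archives) is probably the thing to look at rather than the model.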
Thanks for your great work! My problem is this: although I use an SSD, training ViT-B/32 still takes me 2 hours per epoch, whereas I saw that you spent only 7 minutes per epoch. I also noticed that my GPU utilization is 0 most of the time.