microsoft / VideoX

VideoX: a collection of video cross-modal models

[X-CLIP] About training time #82

Closed: yusq45 closed this issue 1 year ago

yusq45 commented 1 year ago

Thanks for your great work! My problem: although I store the data on an SSD, one training epoch of ViT-B/32 still takes about 2 hours, whereas you report only 7 minutes per epoch. Note that my GPU utilization is 0% most of the time.
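One way to check whether near-zero GPU utilization is caused by data loading is to time how long each iteration waits for the next batch and compare that with how long the training step itself takes. The sketch below is a generic, framework-free illustration, not code from the VideoX repository; `profile_loop` and `slow_loader` are hypothetical names. With PyTorch on a GPU, `torch.cuda.synchronize()` would need to be called at the end of the step for the compute timing to be meaningful, because CUDA kernels run asynchronously.

```python
import time
from typing import Callable, Iterable, Tuple


def profile_loop(batches: Iterable, step: Callable[[object], None]) -> Tuple[float, float]:
    """Return (seconds spent waiting for data, seconds spent inside step)."""
    wait = compute = 0.0
    it = iter(batches)
    while True:
        t0 = time.perf_counter()
        try:
            batch = next(it)  # time spent here is pure data-loading wait
        except StopIteration:
            break
        t1 = time.perf_counter()
        # With CUDA, step() should end with torch.cuda.synchronize()
        # so that this interval covers the finished GPU work.
        step(batch)
        t2 = time.perf_counter()
        wait += t1 - t0
        compute += t2 - t1
    return wait, compute


def slow_loader(n: int, delay: float):
    """Hypothetical stand-in for a slow video loader."""
    for i in range(n):
        time.sleep(delay)  # simulates decoding / disk reads
        yield i


# A no-op step: all measured time should be loader wait.
wait, compute = profile_loop(slow_loader(5, 0.05), lambda batch: None)
print(f"data wait: {wait:.3f}s, compute: {compute:.3f}s")
```

If the wait time dominates in a real run, the input pipeline (disk reads, video decoding, too few loader workers) is the bottleneck rather than the model.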

nbl97 commented 1 year ago

Thanks for your interest. First, the ViT-B/32 model was trained on 32 V100 GPUs. Second, please check how long data loading takes, since slow loading can reduce GPU utilization. Third, we pre-resized the videos so that their short side is 256 px, which saves storage and speeds up reading, although I'm not sure how much speedup this gives. Finally, if you use the tar format, make sure the data is only packed, not compressed. Hope this helps.
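The two preprocessing suggestions can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' actual preprocessing script: `short_side_resize` and `pack_uncompressed` are hypothetical helpers, the file name is made up, and the placeholder bytes stand in for a real video. The first helper computes the target frame size so that the shorter side becomes 256 px while keeping the aspect ratio; the second writes a plain tar archive with mode `"w"`, which only packs files, whereas modes such as `"w:gz"` would also compress them and add decompression cost on every read.

```python
import tarfile
import tempfile
from pathlib import Path
from typing import Iterable, Tuple


def short_side_resize(width: int, height: int, short: int = 256) -> Tuple[int, int]:
    """Target (width, height) so the shorter side equals `short`, keeping aspect ratio.

    Note: some encoders (e.g. H.264 with yuv420p) need even dimensions,
    so the long side may have to be rounded to an even number in practice.
    """
    if width <= height:
        return short, round(height * short / width)
    return round(width * short / height), short


def pack_uncompressed(files: Iterable[Path], out_path: Path) -> None:
    # Mode "w" writes a plain, uncompressed tar (packing only).
    with tarfile.open(str(out_path), mode="w") as tar:
        for f in files:
            tar.add(str(f), arcname=f.name)


with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    clip = tmp_dir / "clip_0001.mp4"  # hypothetical file name
    clip.write_bytes(b"\x00" * 1024)  # placeholder bytes, not a real video
    out = tmp_dir / "videos.tar"
    pack_uncompressed([clip], out)
    # Mode "r:" only opens uncompressed archives and raises ReadError otherwise.
    with tarfile.open(str(out), mode="r:") as tar:
        names = tar.getnames()

print(short_side_resize(1920, 1080), names)
```

For a 1920x1080 video, the helper returns a 455x256 target size.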