I want to retrain the model on my own datasets. I see you trained with four A100 GPUs. Could you tell me the minimum hardware required for training (e.g., would a single A100 GPU work)?
IIRC, two A100s would also work, but not one A100. The main training stage uses significantly more memory, which would be the bottleneck.
What would be the most effective way to reduce the memory requirement for training: a lower batch size, or perhaps a reduced seq_length? Do you think it's feasible to train on a single A100, or would the result be significantly worse, to the point that it's not worth it?
I cannot attest to which one is the "best". You might also try gradient accumulation.
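For reference, here is a minimal sketch of gradient accumulation in a generic PyTorch loop. The model, data, and hyperparameters are placeholders, not this repo's actual training code: the idea is to run backward on several small micro-batches before each optimizer step, so the effective batch size stays the same while peak activation memory drops.

```python
# Hypothetical gradient-accumulation sketch (generic PyTorch, placeholder
# model/data). Four micro-batches are accumulated per optimizer step, so
# effective batch size = micro_batch * accumulation_steps.
import torch
import torch.nn as nn

model = nn.Linear(128, 10)                       # placeholder model
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

accumulation_steps = 4                           # micro-batches per optimizer step
micro_batch = 8                                  # e.g., batch_size=32 split into 4

optimizer.zero_grad()
for step in range(100):
    x = torch.randn(micro_batch, 128)            # dummy data stands in for a real loader
    y = torch.randint(0, 10, (micro_batch,))
    loss = criterion(model(x), y) / accumulation_steps  # scale so grads average correctly
    loss.backward()                              # gradients accumulate in .grad
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```

Combined with a smaller micro-batch, this is usually the first thing to try before cutting seq_length, since it trades compute time for memory without changing the effective batch size.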
pakcheera closed this issue 3 months ago.