muzairkhattak / multimodal-prompt-learning

[CVPR 2023] Official repository of paper titled "MaPLe: Multi-modal Prompt Learning".
https://muzairkhattak.github.io/multimodal-prompt-learning/
MIT License

Minimum GPU Memory Requirements for Domain Generalization Experiments #64

Closed ayuan0626 closed 2 months ago

ayuan0626 commented 2 months ago

Hello, I read your research paper and would like to build my own research on it. However, while training on ImageNet for the domain generalization experiments, I ran into a CUDA out-of-memory error. Even after reducing the batch_size from 4 to 2, the memory footprint was still 24.1GB. I saw that you used a 40GB A100 for your experiments. Could you share the minimum GPU memory required to run the original MaPLe domain generalization experiments? I would also appreciate any workarounds or suggestions for running the experiments on GPUs with less memory. Thank you in advance for your time and help. I look forward to your reply. Sincerely

muzairkhattak commented 2 months ago

Hi @ayuan0626

Thank you for showing interest in MaPLe!

Regarding your query, I believe that training MaPLe on ImageNet with all 1,000 classes requires around 20-24GB of GPU memory.

In your case, I think the main workaround for running the code on a GPU with less memory would be to reduce the batch size.
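For reference, a batch-size override might look like the following. This is an illustrative invocation, assuming the Dassl-style config option name `DATALOADER.TRAIN_X.BATCH_SIZE` used by this codebase; the exact config file paths should be checked against the scripts in the repository:

```sh
# Hypothetical command: append a config override after the usual arguments
# to shrink the per-GPU batch size without editing the yaml files.
python train.py \
    --root /path/to/datasets \
    --trainer MaPLe \
    --dataset-config-file configs/datasets/imagenet.yaml \
    --config-file configs/trainers/MaPLe/vit_b16_c2_ep5_batch4.yaml \
    --output-dir output/imagenet_maple_bs2 \
    DATALOADER.TRAIN_X.BATCH_SIZE 2
```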

Additionally, please check that you are using the same PyTorch version as mentioned in the INSTALL.md file. I have noticed that some PyTorch versions consume more GPU memory than usual.
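If the smaller batch size ends up hurting accuracy, one common complement (a generic PyTorch pattern, not something this repository implements) is gradient accumulation: the effective batch size is preserved while only a micro-batch is held in memory at a time. A minimal sketch with a toy model:

```python
import torch
import torch.nn as nn

# Toy model standing in for the trainer (illustration only).
model = nn.Linear(8, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

accum_steps = 2   # 2 micro-batches of size 2 -> effective batch size 4
micro_bs = 2

data = torch.randn(4, 8)
target = torch.randn(4, 1)

opt.zero_grad()
for i in range(accum_steps):
    xb = data[i * micro_bs:(i + 1) * micro_bs]
    yb = target[i * micro_bs:(i + 1) * micro_bs]
    # Scale the loss so the accumulated gradient matches a full-batch step.
    loss = nn.functional.mse_loss(model(xb), yb) / accum_steps
    loss.backward()   # gradients accumulate across micro-batches
opt.step()
```

Only the micro-batch activations live in memory during each backward pass, which is why this trades speed for memory.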

I hope this helps.

Thank you and kind regards,

ayuan0626 commented 2 months ago

Thank you for your kind reply; your answer was very helpful to me. After checking, the PyTorch version I am using is the same as the one you mentioned. My model does not add any additional learnable vectors compared to MaPLe, so perhaps the extra memory comes from calling zero-shot CLIP more times? I will check my code again. Thank you again for your patient reply!

muzairkhattak commented 2 months ago

> My model doesn't add any additional learnable vectors compared to MaPLe, maybe it's the result of me calling zero-shot CLIP more times? I'll check my code again.

If you are keeping the zero-shot CLIP model in GPU memory as a separate variable, then the memory usage will increase accordingly.

Calling a model many times (in a sequential manner) should not increase GPU memory usage, but it can slow down the training process.
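To illustrate the point about sequential calls, here is a toy sketch (a stand-in linear layer, not actual CLIP): wrapping the frozen zero-shot model's forward passes in `torch.no_grad()` means repeated calls build no autograd graph, so activation memory does not accumulate; keeping a second model object alive, by contrast, does cost memory for its weights.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a frozen zero-shot CLIP encoder.
frozen_encoder = nn.Linear(8, 4)
frozen_encoder.requires_grad_(False)  # frozen weights: no optimizer state

x = torch.randn(2, 8)

# Under no_grad(), repeated sequential calls store no computation graph,
# so memory stays flat; only wall-clock time grows with the call count.
with torch.no_grad():
    for _ in range(10):
        feats = frozen_encoder(x)

print(feats.requires_grad)  # False: no graph was built
```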

ayuan0626 commented 2 months ago

Thank you for your reply, it was very helpful for me. I'll continue to check for possible issues in my code.