mit-han-lab / fastcomposer

[IJCV] FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention
https://fastcomposer.mit.edu
MIT License

CUDA OOM Error with 8xA800(80G) GPUs at Default BS=16 #33

Open annaoooo opened 3 months ago

annaoooo commented 3 months ago

I am reaching out about a CUDA out-of-memory error during multi-GPU training on 8 A800 GPUs, each with 80 GB of memory. The error appears even with the default batch size of 16, and I have to drop to a batch size of 8 for training to proceed. All other script settings follow the defaults, yet the memory limit is still hit. Any ideas about possible causes, or recommendations for better memory management, would be much appreciated.

Thank you in advance for any guidance or shared knowledge on this matter.

TimandXiyu commented 3 months ago

It runs fine on an A100 80G even with BS=32, so it would be strange for an A800 80G not to host the model without OOM. BS=32 should use around 65 GB of memory, and BS=16 about 40 GB. Make sure you installed xformers and the other dependencies correctly; the current README does have some version mismatches, and you need to resolve the package versions manually so that everything matches.
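For anyone hitting the same symptom, a minimal sketch like the one below can confirm whether xformers is actually importable and whether memory-efficient attention can be enabled on a diffusers-style UNet. The `runwayml/stable-diffusion-v1-5` checkpoint is only used here as a stand-in, not necessarily the exact base model FastComposer trains from; `enable_xformers_memory_efficient_attention()` is the standard diffusers hook and will raise if xformers is missing or broken.

```python
import torch

# Check that xformers is installed and importable.
try:
    import xformers
    import xformers.ops  # noqa: F401
    print("xformers version:", xformers.__version__)
except ImportError:
    print("xformers is NOT installed -- attention falls back to the "
          "vanilla implementation and uses far more memory.")

from diffusers import UNet2DConditionModel

# Placeholder base checkpoint; swap in whatever UNet your training script loads.
unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet", torch_dtype=torch.float16
).to("cuda")

# This raises if xformers is unusable, which is a quick way to catch a broken install.
unet.enable_xformers_memory_efficient_attention()
print("memory-efficient attention enabled")
```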

annaoooo commented 3 months ago

Thank you very much for the prompt and helpful response. We have resolved the issue: I had inadvertently removed xformers in some versions of our environment, which led to excessive memory usage. After reinstalling the environment as you suggested, we can now train smoothly with a larger batch size. Thank you once again for the support!
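If it helps others verify the fix, here is a rough sketch for measuring peak UNet memory at the batch size discussed in this issue. The tensor shapes below are the usual SD 1.5 latent and text-embedding shapes, used only for illustration; real FastComposer training also runs the text and image encoders and a backward pass, so actual numbers will be higher.

```python
import torch
from diffusers import UNet2DConditionModel

# Placeholder base checkpoint, as above.
unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet", torch_dtype=torch.float16
).to("cuda")
unet.enable_xformers_memory_efficient_attention()

bs = 16  # the default batch size from this issue
latents = torch.randn(bs, 4, 64, 64, dtype=torch.float16, device="cuda")
timesteps = torch.randint(0, 1000, (bs,), device="cuda")
text_emb = torch.randn(bs, 77, 768, dtype=torch.float16, device="cuda")

torch.cuda.reset_peak_memory_stats()
with torch.no_grad():
    unet(latents, timesteps, encoder_hidden_states=text_emb)
print(f"peak memory: {torch.cuda.max_memory_allocated() / 1e9:.1f} GB")
```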

xilanhua12138 commented 2 months ago

@TimandXiyu Why does num_layers become zero when I enable xformers? Can you share your xformers version and CUDA version?