Closed JunbongJang closed 9 months ago
Hi, thank you for your interest.
It's crucial not to remove accelerator.prepare(), as it works in conjunction with accelerator.backward() to perform gradient updates. Without it, your model won't be optimized at all. For more details, you can refer to the Hugging Face Accelerate documentation.
If you're using a single 3090, you may encounter Out Of Memory errors. This is because DeepSpeed's ZeRO stage 2 partitions optimizer states and gradients across multiple GPUs to reduce per-GPU memory usage. Therefore, it's recommended to use multiple GPUs.
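For reference, a minimal DeepSpeed config fragment that enables ZeRO stage 2 might look like the following (values shown are illustrative assumptions, not this repo's actual config):

```json
{
  "train_micro_batch_size_per_gpu": 2,
  "zero_optimization": {
    "stage": 2
  }
}
```

With stage 2, each GPU holds only a shard of the optimizer states and gradients, so the per-GPU memory saving grows with the number of GPUs; on a single GPU there is nothing to partition across.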
For your reference, our setup includes 16 V100-32G GPUs, with a batch size of 2 per GPU.
We've also included some intermediate results at the 10,000-step mark. There are some artifacts, but the model is still in the optimization phase, and we plan to release the weights at a later date.
Thank you for the quick reply! Your intermediate results are encouraging. I will try using a cloud GPU with more memory.
Thank you for your great work! Since your training code takes pose as input, I had the impression that this is closer to another concurrent work, Animate Anyone.
Anyway, I am trying to train using RTX 3090 GPU but I get out of memory error at
model, optimizer = accelerator.prepare(model, optimizer) in train.py
So I removed that line and trained the model for about 70,000 steps, but I get the same validation grid video at different validation steps. So I wonder whether the weights are being updated at all...
Did you also experience a similar problem?