Closed: TrueWheelProgramming closed this issue 1 year ago
It looks like you're using `multi_gpu` in a non-distributed environment:

`num_processes: 1`

Please make sure to use distributed multi-GPU training: https://pytorch.org/tutorials/beginner/ddp_series_multigpu.html. This should also be set correctly when running `accelerate config` before starting the training.
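For reference, a working multi-GPU config along these lines sets `num_processes` to the actual GPU count. This is a sketch, not the config from this thread; values such as `mixed_precision` are assumptions:

```yaml
# ~/.cache/huggingface/accelerate/default_config.yaml
# Sketch of a single-machine, 4-GPU setup; mixed_precision is an assumption.
compute_environment: LOCAL_MACHINE
distributed_type: MULTI_GPU
num_machines: 1
num_processes: 4   # must match the number of GPUs, not 1
gpu_ids: all
mixed_precision: fp16
use_cpu: false
```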
You're right. The example works with the correct config 🤦
@TrueWheelProgramming how exactly did you resolve this, please?
Describe the bug
When using the `train_text_to_image.py` example script on a single NVIDIA A10G GPU, the script works great. However, on 4x NVIDIA A10G with the same input arguments plus the `--multi_gpu` accelerate flag, all 4 GPUs run out of memory before the first step completes. In the single-GPU case I can train with a batch size of 1 and a resolution of 512. In the multi-GPU case, even a batch size of 1 and a resolution of 64 results in a CUDA out-of-memory error.
Is there any reason why multi-GPU would use significantly more memory?
Is there an issue with my (i) accelerate config or (ii) script arguments?
Thanks in advance.
Reproduction
Accelerate Config
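The original config was not preserved in this thread. Judging from the comment quoted above, it presumably resembled the following; only `num_processes: 1` is confirmed, and the other keys are illustrative:

```yaml
# Illustrative reconstruction; only num_processes: 1 is confirmed above.
compute_environment: LOCAL_MACHINE
distributed_type: 'NO'   # non-distributed, despite launching with --multi_gpu
num_machines: 1
num_processes: 1         # the misconfiguration: should equal the GPU count
use_cpu: false
```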
Command
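The exact command was likewise not captured. Based on the description above, it was an invocation of roughly this shape; the model, dataset, and hyperparameters here are placeholders, not the reporter's actual values:

```bash
# Illustrative launch; paths and hyperparameters are placeholders.
accelerate launch --multi_gpu train_text_to_image.py \
  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
  --dataset_name="lambdalabs/pokemon-blip-captions" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --max_train_steps=15000 \
  --output_dir="sd-model-finetuned"
```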
Logs
System Info
`diffusers` version: 0.17.0.dev0