doem97 opened 1 year ago
Because one GPU needs to compute `gradient = (gradient_from_gpu_1 + gradient_from_gpu_2) / 2`, and materializing the intermediate sum `gradient_from_gpu_1 + gradient_from_gpu_2` takes a lot of extra VRAM.
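To illustrate why the reducing GPU needs the extra memory, here is a minimal NumPy sketch (not the repo's actual code) of single-device gradient averaging: the averaging device must buffer every replica's gradient at once before it can divide.

```python
import numpy as np

def average_gradients(grads):
    """Naive single-reducer averaging: one device gathers every
    replica's gradient before averaging, so it must hold all N
    copies in memory at once -- the VRAM spike described above."""
    stacked = np.stack(grads, axis=0)  # materializes every replica on the reducer
    return stacked.mean(axis=0)

# gradients of the same parameter tensor from two GPUs
g1 = np.array([1.0, 3.0])
g2 = np.array([3.0, 5.0])
print(average_gradients([g1, g2]))  # -> [2. 4.]
```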
Thanks @lllyasviel !
So basically the bottleneck is the GPU that holds the gradient averaging, while the remaining GPUs are fine, e.g., GPU0 requires 24G+24G while GPU1, GPU2, and GPU3 each require 24G.
Thus, we should leave GPU0 some headroom, e.g., run with GPU0 at 12G+12G and GPU1, GPU2, GPU3 at 12G each.
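A back-of-the-envelope budget for the headroom suggestion above. The 2x factor on the reducer is a simplifying assumption for illustration, not a measured number:

```python
def per_gpu_vram_gb(replica_gb, n_gpus, reducer=0):
    """Rough per-GPU memory estimate when one GPU also hosts the
    gradient averaging: assume the reducer buffers roughly one
    extra replica's worth of gradients on top of its own."""
    return [replica_gb * (2 if i == reducer else 1) for i in range(n_gpus)]

print(per_gpu_vram_gb(24, 4))  # -> [48, 24, 24, 24]  (GPU0 overflows a 24G card)
print(per_gpu_vram_gb(12, 4))  # -> [24, 12, 12, 12]  (fits, with GPU0 at 12G+12G)
```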
Sorry, I have another question.
I come from the recognition community. In recognition, multi-GPU training normally does not produce significantly different memory usage across GPUs. Does this "1-big-GPU" behavior only happen in Stable Diffusion / ControlNet?
Use the FSDP or DeepSpeed training strategy.
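For example, with HuggingFace Accelerate you can switch the launcher to DeepSpeed ZeRO through its config file, which shards optimizer state and gradients across GPUs instead of gathering them on one device. A sketch only; the field values are illustrative, so run `accelerate config` to generate one matching your setup:

```yaml
# accelerate config file (illustrative values)
compute_environment: LOCAL_MACHINE
distributed_type: DEEPSPEED
deepspeed_config:
  zero_stage: 2            # shard optimizer state + gradients across GPUs
  offload_optimizer_device: none
num_processes: 4
mixed_precision: fp16
```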
The HuggingFace Diffusers ControlNet training script (https://huggingface.co/docs/diffusers/training/controlnet) has several memory optimizations built in.
I can run the original `tutorial_train.py` on a single 3090 Ti GPU (24G) with batch_size 3. However, when I upgrade to 2 or more GPUs, it keeps raising OOM.
I am curious why. Why can a single GPU handle batch size 3 while multiple GPUs can only handle 1? The GPUs each hold their own batch in parallel, am I right?
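The imbalance described earlier in the thread can be avoided by reducing gradients in chunks, which is roughly what bucketed/ring all-reduce strategies do: no single device ever buffers every replica at once. A toy NumPy stand-in for that idea (a sketch, not how any framework actually implements it):

```python
import numpy as np

def chunked_mean(grads, n_chunks=4):
    """Average gradient replicas chunk by chunk: only one chunk's
    worth of temporaries exists at a time, a toy stand-in for the
    bucketed all-reduce that keeps memory balanced across GPUs."""
    parts = [np.array_split(g, n_chunks) for g in grads]
    averaged = [sum(chunk_group) / len(grads) for chunk_group in zip(*parts)]
    return np.concatenate(averaged)

g1 = np.arange(8, dtype=float)        # gradient replica from GPU 1
g2 = np.arange(8, dtype=float) * 3.0  # gradient replica from GPU 2
print(chunked_mean([g1, g2]))  # elementwise mean of the two replicas
```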