[examples/text_to_image] cuda out of memory, though I followed the instructions of train_text_to_image.py

N1cekiko commented 1 year ago

Describe the bug

I get a CUDA out-of-memory error when I run the script examples/text_to_image/train_text_to_image.py with the following command:

export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export dataset_name="lambdalabs/pokemon-blip-captions"

python train_text_to_image.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --dataset_name=$dataset_name \
  --use_ema \
  --resolution=512 --center_crop --random_flip \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --gradient_checkpointing \
  --max_train_steps=15000 \
  --learning_rate=1e-05 \
  --max_grad_norm=1 \
  --lr_scheduler="constant" --lr_warmup_steps=0 \
  --output_dir="sd-pokemon-model" \
  --mixed_precision="fp16"

I tried decreasing the resolution and removing --center_crop and --random_flip, but it did not help.

Hardware: V100 (32 GB), PyTorch 1.11.

logs:

Reproduction

train_1p.txt

Logs

No response

System Info

diffusers version: 0.17.0.dev0
Platform: Linux-5.4.0-60-generic-x86_64-with-debian-buster-sid
Python version: 3.7.5
PyTorch version (GPU?): 1.11.0+cu102 (True)
Huggingface_hub version: 0.14.1
Transformers version: 4.29.1
Accelerate version: 0.19.0
xFormers version: not installed
Using GPU in script?: yes
Using distributed or parallel set-up in script?: neither 1p nor 8p works

patrickvonplaten commented 1 year ago

cc @sayakpaul here

sayakpaul commented 1 year ago

Could you paste the full error trace? We will need to check where this stems from.

Because for a 32 GB V100 card, this seems weird.

N1cekiko commented 1 year ago

I noticed that this bug is the same as the one mentioned in issue #3094 (https://github.com/huggingface/diffusers/issues/3094). It has been closed although the problem is not solved. Could you please reopen it?

sayakpaul commented 1 year ago

I noticed that this bug is the same as the one mentioned in issue #3094 (https://github.com/huggingface/diffusers/issues/3094). It has been closed although the problem is not solved. Could you please reopen it?

We closed the issue because of https://github.com/huggingface/diffusers/issues/3094#issuecomment-1519293627.

Could you follow the suggestions shared in that thread and let us know if the issues still persist?

N1cekiko commented 1 year ago

I have tried, but still not working.

sayakpaul commented 1 year ago

Then please post a detailed comment describing what you have already tried, along with the corresponding error logs. Without complete and comprehensive details, it's hard for us to make useful suggestions.

N1cekiko commented 1 year ago

Thanks for your kind reply and patience. Here are the launch scripts and the logs:

sayakpaul commented 1 year ago

Interesting.

Which library versions are you using?

@pcuenca if you have access to a 24 GB card, could you try to reproduce it once? I currently don't have access to one.

N1cekiko commented 1 year ago

I just cloned the latest master branch and installed diffusers locally, then installed the requirements for text_to_image.

sayakpaul commented 1 year ago

And which Torch version are you using? Have you tried using memory-efficient attention with xformers?
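
As a rough illustration of why memory-efficient attention and a lower resolution both matter, the sketch below estimates the size of the attention score matrix that a naive attention implementation materializes at the highest-resolution UNet level. The helper name, the 8x VAE downsampling factor, the 8 attention heads, and fp16 scores are all assumptions made for illustration; they are not taken from the logs in this thread.

```python
def naive_attention_scores_bytes(resolution, heads=8, bytes_per_value=2, vae_factor=8):
    """Bytes needed for one (tokens x tokens) score matrix per head, per sample.

    Assumptions (hypothetical defaults, not read from the training script):
    the VAE downsamples by `vae_factor`, the layer uses `heads` heads,
    and scores are stored in fp16 (2 bytes per value).
    """
    side = resolution // vae_factor      # latent height/width
    tokens = side * side                 # one token per latent pixel
    return tokens * tokens * heads * bytes_per_value


for res in (512, 384, 256):
    mib = naive_attention_scores_bytes(res) / 2**20
    print(f"{res}px: {mib:.0f} MiB per self-attention layer per sample")
```

Under these assumptions the score matrix costs about 256 MiB at 512 px and shrinks by a factor of 16 at 256 px, since the cost grows with the fourth power of the image side. Memory-efficient attention avoids materializing this matrix in full, which is consistent with it lowering peak GPU memory.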

pcuenca commented 1 year ago

My tests:

PyTorch 2 + ema does not run in <= 24 GB. Testing on another card, I found it took ~26 GB of RAM.
PyTorch 2 without ema works fine.
PyTorch 1.13.1 with xFormers (and using --enable_xformers_memory_efficient_attention) works fine and takes only ~14 GB of GPU RAM.

When using PyTorch 2 I verified that we are using AttnProcessor2_0 here: https://github.com/huggingface/diffusers/blob/c6ae8837512d0572639b9f57491d4482fdc8948c/src/diffusers/models/attention_processor.py#L161. I'm not sure why it no longer fits in 24 GB.
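
To get a feel for how much the EMA copy alone can add, here is a back-of-the-envelope sketch. It assumes a UNet of roughly 860M parameters, fp32 storage for weights, gradients, and both Adam moment buffers, and an fp32 EMA copy kept on the GPU. These figures are assumptions for illustration, not measurements from the runs above, and the estimate ignores activations, the VAE, the text encoder, and allocator overhead.

```python
def static_training_memory_gb(num_params, use_ema, bytes_per_value=4):
    """Rough GPU memory (in GB, 1e9 bytes) for per-parameter training state.

    Counts one copy each for weights, gradients, and the two Adam moments,
    plus one extra copy when EMA is enabled. Activations are not included.
    """
    copies = 4  # weights + gradients + Adam first moment + Adam second moment
    if use_ema:
        copies += 1  # the EMA shadow copy of the weights
    return num_params * bytes_per_value * copies / 1e9


UNET_PARAMS = 860_000_000  # assumed approximate size of the UNet

without_ema = static_training_memory_gb(UNET_PARAMS, use_ema=False)
with_ema = static_training_memory_gb(UNET_PARAMS, use_ema=True)
print(f"without EMA: {without_ema:.2f} GB")
print(f"with EMA:    {with_ema:.2f} GB")
print(f"EMA overhead: {with_ema - without_ema:.2f} GB")
```

Under these assumptions the EMA copy adds a few GB on top of the optimizer state, which would be enough to push a run that barely fits in 24 GB over the limit. It does not by itself explain the full gap between the xFormers run and the PyTorch 2 run.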

N1cekiko commented 1 year ago

I used PyTorch 1.11, without xFormers.

sayakpaul commented 1 year ago

Could you try with PyTorch 1.13.1 and xformers?

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

medalwill commented 1 year ago

I also encountered this problem, and downgrading my PyTorch version didn't help. Then I found that accelerate always seems to run in the base environment. So I abandoned my conda virtual environment and downgraded torch in (base). It works!
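
One way to check which environment a launch actually uses is to print the interpreter path and the installed package versions from inside the process. The sketch below is a minimal, hypothetical helper (the function name and the package list are assumptions); run it with the same launcher as the training script and compare the output with what the activated conda environment is expected to contain.

```python
import importlib
import sys


def describe_environment(packages=("torch", "accelerate", "xformers", "diffusers")):
    """Return the interpreter path and the version of each package, if importable."""
    report = {"interpreter": sys.executable}
    for name in packages:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = "not installed"
        except Exception as exc:  # e.g. a broken native library at import time
            report[name] = f"import failed: {exc}"
        else:
            report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If the interpreter path points at the base environment rather than the activated one, the versions installed in the virtual environment are not the ones being used by the run.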
