huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

[instruct_pix2pix]: When I try to use accelerate launch train_instruct_pix2pix.py with one GPU, it reports the error below #3026

Closed. elricwan closed this issue 1 year ago

elricwan commented 1 year ago

Describe the bug

When I try to use accelerate launch train_instruct_pix2pix.py with one GPU, it reports the error below:

File "/home/xiangpeng.wan/miniconda3/envs/transformers/lib/python3.8/site-packages/accelerate/utils/dataclasses.py", line 836, in set_auto_wrap_policy raise Exception("Could not find the transformer layer class to wrap in the model.")

File "train_instruct_pix2pix.py", line 706, in main unet, optimizer, train_dataloader, lr_scheduler = accelerator.prepare Exception: Could not find the transformer layer class to wrap in the model.

I used the default accelerate config.

Reproduction

export MODEL_NAME="runwayml/stable-diffusion-v1-5"
export DATASET_ID="fusing/instructpix2pix-1000-samples"

accelerate launch --mixed_precision="fp16" train_instruct_pix2pix.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--dataset_name=$DATASET_ID \
--enable_xformers_memory_efficient_attention \
--resolution=256 --random_flip \
--train_batch_size=4 --gradient_accumulation_steps=4 --gradient_checkpointing \
--max_train_steps=15000 \
--checkpointing_steps=5000 --checkpoints_total_limit=1 \
--learning_rate=5e-05 --max_grad_norm=1 --lr_warmup_steps=0 \
--conditioning_dropout_prob=0.05 \
--mixed_precision=fp16 \
--seed=42

Logs

No response

System Info

diffusers: 0.15.0.dev0
python: 3.8
torch: 2.0.0
accelerate: 0.18.0
OS: Ubuntu 20.04

patrickvonplaten commented 1 year ago

cc @sayakpaul here

sayakpaul commented 1 year ago

Hi!

Here's what I did:

  1. Cloned diffusers with git clone https://github.com/huggingface/diffusers.
  2. Then I ran the following command:
accelerate launch --mixed_precision="fp16" train_instruct_pix2pix.py \
--pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5 \
--dataset_name=sayakpaul/instructpix2pix-1000-samples \
--use_ema \
--enable_xformers_memory_efficient_attention \
--resolution=512 --random_flip \
--train_batch_size=2 --gradient_accumulation_steps=4 --gradient_checkpointing \
--max_train_steps=20 \
--checkpointing_steps=10 --checkpoints_total_limit=1 \
--learning_rate=5e-05 --lr_warmup_steps=0 \
--conditioning_dropout_prob=0.05 \
--mixed_precision=fp16 \
--val_image_url="https://hf.co/datasets/diffusers/diffusers-images-docs/resolve/main/mountain.png" \
--validation_prompt="make the mountains snowy" \
--seed=42 \
--report_to=wandb

It worked on both single-GPU and multi-GPU machines. Notice that I didn't specify the --multi_gpu flag while launching training.

I installed diffusers from source by running: pip install git+https://github.com/huggingface/diffusers.

With that, the training actually went fine and I didn't face any issues.

Could you help me reproduce the error?

elricwan commented 1 year ago

Hi!

Thank you for the response. I found that the problem is with the accelerate config: if you set default_config.yaml as below, you can reproduce the error. It may be caused by FSDP.

compute_environment: LOCAL_MACHINE
distributed_type: FSDP
downcast_bf16: 'no'
dynamo_config:
  dynamo_backend: INDUCTOR
  dynamo_mode: default
  dynamo_use_dynamic: true
  dynamo_use_fullgraph: true
fsdp_config:
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_backward_prefetch_policy: BACKWARD_PRE
  fsdp_offload_params: false
  fsdp_sharding_strategy: 2
  fsdp_state_dict_type: FULL_STATE_DICT
  fsdp_transformer_layer_cls_to_wrap: ''
machine_rank: 0
main_training_function: main
mixed_precision: fp16
num_machines: 1
num_processes: 1
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
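
If I read the traceback right, the exception seems to come from fsdp_transformer_layer_cls_to_wrap being left empty: with fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP, accelerate needs the name of a transformer block class to wrap. A quick sketch (just my assumption, using the SD 1.5 UNet from the command above) to see which class names could go there:

from diffusers import UNet2DConditionModel

# Load only the UNet from the SD 1.5 checkpoint used in the training command.
unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
)

# Print the transformer-like submodule classes; a name such as BasicTransformerBlock
# is the kind of value fsdp_transformer_layer_cls_to_wrap expects.
print({m.__class__.__name__ for m in unet.modules() if "Transformer" in m.__class__.__name__})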

When I use the default config, which is:

{
  "compute_environment": "LOCAL_MACHINE",
  "distributed_type": "MULTI_GPU",
  "downcast_bf16": false,
  "machine_rank": 0,
  "main_training_function": "main",
  "mixed_precision": "no",
  "num_machines": 1,
  "num_processes": 4,
  "rdzv_backend": "static",
  "same_network": false,
  "tpu_use_cluster": false,
  "tpu_use_sudo": false,
  "use_cpu": false
}

I got this error:

RuntimeError: [3]: params[0] in this process with sizes [320, 4, 3, 3] appears not to match sizes of the same param in process 0.

Then I changed "num_processes" to 1; it ran to the optimization step but raised a CUDA out-of-memory error, even after I changed the resolution to 16. Is there anything I can change? I am using a 3090 GPU with 24 GB of memory. BTW, this is the script I used:

export MODEL_NAME="runwayml/stable-diffusion-v1-5"
export DATASET_ID="fusing/instructpix2pix-1000-samples"

accelerate launch --mixed_precision="fp16" train_instruct_pix2pix.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--dataset_name=$DATASET_ID \
--use_ema \
--enable_xformers_memory_efficient_attention \
--resolution=16 --random_flip \
--train_batch_size=1 --gradient_accumulation_steps=4 --gradient_checkpointing \
--max_train_steps=20 \
--checkpointing_steps=10 --checkpoints_total_limit=1 \
--learning_rate=5e-05 --lr_warmup_steps=0 \
--conditioning_dropout_prob=0.05 \
--mixed_precision=fp16 \
--val_image_url="https://hf.co/datasets/diffusers/diffusers-images-docs/resolve/main/mountain.png" \
--validation_prompt="make the mountains snowy" \
--seed=42 \
--report_to=wandb

sayakpaul commented 1 year ago

Maybe clear all the CUDA cache and restart the process?
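
Something along these lines before relaunching, as a minimal sketch (also worth checking nvidia-smi for stray processes):

import gc
import torch

gc.collect()              # drop lingering Python references first
torch.cuda.empty_cache()  # hand cached blocks back to the CUDA allocator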

sayakpaul commented 1 year ago

You could also disable validation (e.g., by not passing --val_image_url and --validation_prompt) to deal with lower GPU memory.

Also, maybe try with Torch 1.13.1 if possible?

elricwan commented 1 year ago

I cleared all the cache. What accelerate config and GPU do you use?

sayakpaul commented 1 year ago

I am using the default accelerate config with a single A100.
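
If you want to reset to the same defaults on your side, something like this should do it (a small sketch; it just writes a plain single-process config):

from accelerate.utils import write_basic_config

# Writes a minimal, non-distributed config (no FSDP, no dynamo) to accelerate's
# default config location; --mixed_precision can still be passed at launch time.
write_basic_config()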

elricwan commented 1 year ago

I see, thanks