huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

Issue running Stable Diffusion DreamBooth on Mac M3 Max (Apple silicon) #7498

Open sagargulabani opened 4 months ago

sagargulabani commented 4 months ago

Describe the bug

I am trying to run DreamBooth Stable Diffusion on an M3 Max. However, whenever I try to generate the class images for the concepts, it fails.

Reproduction

To reproduce the error, set up the DreamBooth extension on an M3 Max (Apple silicon), then try to generate class images. It will fail.

As per this issue, someone suggested that we open an issue in this repository.

Please help us. Thank you.

Logs

400 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/Users/sagargulabani/dev/automatic1111/stable-diffusion-webui/extensions/sd_dreambooth_extension/dreambooth/ui_functions.py", line 735, in start_training
    result = main(class_gen_method=class_gen_method)
  File "/Users/sagargulabani/dev/automatic1111/stable-diffusion-webui/extensions/sd_dreambooth_extension/dreambooth/train_dreambooth.py", line 2003, in main
    return inner_loop()
  File "/Users/sagargulabani/dev/automatic1111/stable-diffusion-webui/extensions/sd_dreambooth_extension/dreambooth/memory.py", line 126, in decorator
    return function(batch_size, grad_size, prof, *args, **kwargs)
  File "/Users/sagargulabani/dev/automatic1111/stable-diffusion-webui/extensions/sd_dreambooth_extension/dreambooth/train_dreambooth.py", line 380, in inner_loop
    count, instance_prompts, class_prompts = generate_classifiers(
  File "/Users/sagargulabani/dev/automatic1111/stable-diffusion-webui/extensions/sd_dreambooth_extension/dreambooth/utils/gen_utils.py", line 211, in generate_classifiers
    new_images = builder.generate_images(prompts, pbar)
  File "/Users/sagargulabani/dev/automatic1111/stable-diffusion-webui/extensions/sd_dreambooth_extension/helpers/image_builder.py", line 235, in generate_images
    with self.accelerator.autocast(), torch.inference_mode():
  File "/opt/anaconda3/envs/automatic1111/lib/python3.10/contextlib.py", line 135, in __enter__
    return next(self.gen)
  File "/opt/anaconda3/envs/automatic1111/lib/python3.10/site-packages/accelerate/accelerator.py", line 2907, in autocast
    autocast_context = get_mixed_precision_context_manager(self.native_amp, cache_enabled=cache_enabled)
  File "/opt/anaconda3/envs/automatic1111/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 1372, in get_mixed_precision_context_manager
    return torch.autocast(device_type=state.device.type, dtype=torch.float16, cache_enabled=cache_enabled)
  File "/opt/anaconda3/envs/automatic1111/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 241, in __init__
    raise RuntimeError(
RuntimeError: User specified an unsupported autocast device_type 'mps'
Generating class images 0/1400::   0%|

System Info

- Apple M3 Max (30-core CPU, 40-core GPU), 16-inch, 48 GB RAM
- Python: 3.10.14
- diffusers: 0.27.2
- transformers: 4.30.2
- torch: 2.1.0

Who can help?

@sayakpaul

tolgacangoz commented 4 months ago

Hi @sagargulabani, isn't this issue related to the Stable Diffusion web UI's sd_dreambooth_extension rather than to diffusers? Could you try diffusers' own DreamBooth example? Also, see the MPS-related docs page. I guess autocast is not supported yet on MPS: they started a PR, but unfortunately it seems to have been abandoned 😞. Nevertheless, there is an ongoing PR here that may be a solution.
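
For reference, the unsupported-autocast failure in the log above can be reproduced outside the trainer. A minimal sketch, assuming torch 2.1.x as in the report (behavior may differ on newer builds):

    # Minimal repro sketch (assumption: torch 2.1.x, where autocast does not support MPS).
    import torch

    try:
        # Constructing the context manager is enough to trigger the error on 2.1.x.
        with torch.autocast(device_type="mps", dtype=torch.float16):
            pass
    except RuntimeError as err:
        print(err)  # "User specified an unsupported autocast device_type 'mps'"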

sagargulabani commented 4 months ago

Yes, that is true; the issue is related to the web UI. I tried running DreamBooth SDXL training locally and I am running into the following error:

You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
{'thresholding', 'rescale_betas_zero_snr', 'clip_sample_range', 'variance_type', 'dynamic_thresholding_ratio'} was not found in config. Values will be initialized to default values.
{'latents_mean', 'latents_std'} was not found in config. Values will be initialized to default values.
{'reverse_transformer_layers_per_block', 'dropout', 'attention_type'} was not found in config. Values will be initialized to default values.
Traceback (most recent call last):
  File "/Users/sagargulabani/.cache/huggingface/diffusers/examples/dreambooth/train_dreambooth_lora_sdxl.py", line 1964, in <module>
    main(args)
  File "/Users/sagargulabani/.cache/huggingface/diffusers/examples/dreambooth/train_dreambooth_lora_sdxl.py", line 1167, in main
    unet_lora_config = LoraConfig(
TypeError: LoraConfig.__init__() got an unexpected keyword argument 'use_dora'
Traceback (most recent call last):
  File "/opt/anaconda3/envs/hf/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/opt/anaconda3/envs/hf/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 46, in main
    args.func(args)
  File "/opt/anaconda3/envs/hf/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1057, in launch_command
    simple_launcher(args)
  File "/opt/anaconda3/envs/hf/lib/python3.10/site-packages/accelerate/commands/launch.py", line 673, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/anaconda3/envs/hf/bin/python', 'train_dreambooth_lora_sdxl.py', '--pretrained_model_name_or_path=stabilityai/stable-diffusion-xl-base-1.0', '--instance_data_dir=dog', '--pretrained_vae_model_name_or_path=madebyollin/sdxl-vae-fp16-fix', '--output_dir=lora-trained-xl', '--mixed_precision=fp16', '--instance_prompt=a photo of sks dog', '--resolution=1024', '--train_batch_size=1', '--gradient_accumulation_steps=4', '--learning_rate=1e-4', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--max_train_steps=500', '--validation_prompt=A photo of sks dog in a bucket', '--validation_epochs=25', '--seed=0', '--push_to_hub']' returned non-zero exit status 1.

My peft version is 0.7.0.
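
The use_dora argument only exists in newer peft releases, so peft 0.7.0's LoraConfig rejects it. A hedged sketch of a config without the flag; the argument values below are illustrative, loosely mirroring the example script, not its exact code:

    # Hedged sketch: build the UNet LoRA config without `use_dora`, which peft 0.7.0
    # does not accept. Values are illustrative, not the script's exact arguments.
    from peft import LoraConfig

    unet_lora_config = LoraConfig(
        r=4,
        lora_alpha=4,
        init_lora_weights="gaussian",
        target_modules=["to_k", "to_q", "to_v", "to_out.0"],
    )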

This is the command I run:

export MODEL_NAME="stabilityai/stable-diffusion-xl-base-1.0"
export INSTANCE_DIR="dog"
export OUTPUT_DIR="lora-trained-xl"
export VAE_PATH="madebyollin/sdxl-vae-fp16-fix"

accelerate launch train_dreambooth_lora_sdxl.py \
  --pretrained_model_name_or_path=$MODEL_NAME  \
  --instance_data_dir=$INSTANCE_DIR \
  --pretrained_vae_model_name_or_path=$VAE_PATH \
  --output_dir=$OUTPUT_DIR \
  --mixed_precision="fp16" \
  --instance_prompt="a photo of sks dog" \
  --resolution=1024 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --learning_rate=1e-4 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=500 \
  --validation_prompt="A photo of sks dog in a bucket" \
  --validation_epochs=25 \
  --seed="0" \
  --push_to_hub

sagargulabani commented 4 months ago

I did run it by removing the use_dora flag from the script ([here](https://github.com/huggingface/diffusers/blob/0cc56309454a6db970db0425de790c952f68fc64/examples/dreambooth/train_dreambooth_lora_sdxl.py#L1169)). @linoytsaban

After that I ran into the following issue.

You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
{'variance_type', 'dynamic_thresholding_ratio', 'thresholding', 'clip_sample_range', 'rescale_betas_zero_snr'} was not found in config. Values will be initialized to default values.
{'latents_mean', 'latents_std'} was not found in config. Values will be initialized to default values.
{'dropout', 'attention_type', 'reverse_transformer_layers_per_block'} was not found in config. Values will be initialized to default values.
Traceback (most recent call last):
  File "/Users/sagargulabani/.cache/huggingface/diffusers/examples/dreambooth/train_dreambooth_lora_sdxl.py", line 1963, in <module>
    main(args)
  File "/Users/sagargulabani/.cache/huggingface/diffusers/examples/dreambooth/train_dreambooth_lora_sdxl.py", line 1503, in main
    unet, optimizer, train_dataloader, lr_scheduler = accelerator.prepare(
  File "/opt/anaconda3/envs/hf/lib/python3.10/site-packages/accelerate/accelerator.py", line 1263, in prepare
    result = tuple(
  File "/opt/anaconda3/envs/hf/lib/python3.10/site-packages/accelerate/accelerator.py", line 1264, in <genexpr>
    self._prepare_one(obj, first_pass=True, device_placement=d) for obj, d in zip(args, device_placement)
  File "/opt/anaconda3/envs/hf/lib/python3.10/site-packages/accelerate/accelerator.py", line 1140, in _prepare_one
    return self.prepare_model(obj, device_placement=device_placement)
  File "/opt/anaconda3/envs/hf/lib/python3.10/site-packages/accelerate/accelerator.py", line 1330, in prepare_model
    autocast_context = get_mixed_precision_context_manager(self.native_amp, self.autocast_handler)
  File "/opt/anaconda3/envs/hf/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 1745, in get_mixed_precision_context_manager
    return torch.autocast(device_type=device_type, dtype=torch.float16, **autocast_kwargs)
  File "/opt/anaconda3/envs/hf/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 241, in __init__
    raise RuntimeError(
RuntimeError: User specified an unsupported autocast device_type 'mps'
Traceback (most recent call last):
  File "/opt/anaconda3/envs/hf/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/opt/anaconda3/envs/hf/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 46, in main
    args.func(args)
  File "/opt/anaconda3/envs/hf/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1057, in launch_command
    simple_launcher(args)
  File "/opt/anaconda3/envs/hf/lib/python3.10/site-packages/accelerate/commands/launch.py", line 673, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/anaconda3/envs/hf/bin/python', 'train_dreambooth_lora_sdxl.py', '--pretrained_model_name_or_path=stabilityai/stable-diffusion-xl-base-1.0', '--instance_data_dir=dog', '--pretrained_vae_model_name_or_path=madebyollin/sdxl-vae-fp16-fix', '--output_dir=lora-trained-xl', '--mixed_precision=fp16', '--instance_prompt=a photo of sks dog', '--resolution=1024', '--train_batch_size=1', '--gradient_accumulation_steps=4', '--learning_rate=1e-4', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--max_train_steps=500', '--validation_prompt=A photo of sks dog in a bucket', '--validation_epochs=25', '--seed=0', '--push_to_hub']' returned non-zero exit status 1.

sayakpaul commented 4 months ago

You should remove --mixed_precision="fp16" when using the M3. Cc: @bghira

sayakpaul commented 4 months ago

And yes https://github.com/huggingface/diffusers/pull/7447 should be helpful.

sagargulabani commented 4 months ago

Hi @sayakpaul,

I did remove that and ran it, but it looks like the code gets stuck:

You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
{'rescale_betas_zero_snr', 'variance_type', 'dynamic_thresholding_ratio', 'thresholding', 'clip_sample_range'} was not found in config. Values will be initialized to default values.
{'latents_std', 'latents_mean'} was not found in config. Values will be initialized to default values.
{'dropout', 'attention_type', 'reverse_transformer_layers_per_block'} was not found in config. Values will be initialized to default values.
03/28/2024 20:06:58 - INFO - __main__ - ***** Running training *****
03/28/2024 20:06:58 - INFO - __main__ -   Num examples = 5
03/28/2024 20:06:58 - INFO - __main__ -   Num batches each epoch = 5
03/28/2024 20:06:58 - INFO - __main__ -   Num Epochs = 250
03/28/2024 20:06:58 - INFO - __main__ -   Instantaneous batch size per device = 1
03/28/2024 20:06:58 - INFO - __main__ -   Total train batch size (w. parallel, distributed & accumulation) = 4
03/28/2024 20:06:58 - INFO - __main__ -   Gradient Accumulation steps = 4
03/28/2024 20:06:58 - INFO - __main__ -   Total optimization steps = 500
Steps:   0%|                                                                                                                            | 0/500 [00:00<?, ?it/s]

It's not progressing beyond this. I am using an M3 Max with 48 GB of RAM.

Also, I had to remove the use_dora flag (https://github.com/huggingface/diffusers/blob/0cc56309454a6db970db0425de790c952f68fc64/examples/dreambooth/train_dreambooth_lora_sdxl.py#L1169) to run the script.

sagargulabani commented 4 months ago

So I can see that it is moving, but it is extremely slow.

accelerate launch train_dreambooth_lora_sdxl.py \
  --pretrained_model_name_or_path=$MODEL_NAME  \
  --instance_data_dir=$INSTANCE_DIR \
  --pretrained_vae_model_name_or_path=$VAE_PATH \
  --output_dir=$OUTPUT_DIR \
  --instance_prompt="a photo of sks dog" \
  --resolution=1024 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --learning_rate=1e-4 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=500 \
  --validation_prompt="A photo of sks dog in a bucket" \
  --validation_epochs=25 \
  --seed="0" \
  --push_to_hub
03/28/2024 20:06:39 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: mps

Mixed precision type: no

You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
{'rescale_betas_zero_snr', 'variance_type', 'dynamic_thresholding_ratio', 'thresholding', 'clip_sample_range'} was not found in config. Values will be initialized to default values.
{'latents_std', 'latents_mean'} was not found in config. Values will be initialized to default values.
{'dropout', 'attention_type', 'reverse_transformer_layers_per_block'} was not found in config. Values will be initialized to default values.
03/28/2024 20:06:58 - INFO - __main__ - ***** Running training *****
03/28/2024 20:06:58 - INFO - __main__ -   Num examples = 5
03/28/2024 20:06:58 - INFO - __main__ -   Num batches each epoch = 5
03/28/2024 20:06:58 - INFO - __main__ -   Num Epochs = 250
03/28/2024 20:06:58 - INFO - __main__ -   Instantaneous batch size per device = 1
03/28/2024 20:06:58 - INFO - __main__ -   Total train batch size (w. parallel, distributed & accumulation) = 4
03/28/2024 20:06:58 - INFO - __main__ -   Gradient Accumulation steps = 4
03/28/2024 20:06:58 - INFO - __main__ -   Total optimization steps = 500
Steps:   0%|▏                                                                                       | 1/500 [10:04<83:43:30, 604.03s/it, loss=0.0871, lr=0.0001]

Any suggestions to make it faster?

bghira commented 3 months ago

Are you on 14.4? I've been using PyTorch 2.2 and I get about 10 seconds per step with 1-megapixel images on an M3 Max with 128 GB. Do you observe any memory/swap pressure?

bghira commented 3 months ago

Also, in my environment, I've been running with --mixed_precision=fp16, but I'm not sure why that's erroring out for you the way it is.

The code only returns an error to the user when mixed_precision="bf16", informing them to use fp16 instead. The default is actually fp32, which seems to be in use here, hence the extreme slowdown.

The goal should be to ensure that mixed_precision=fp16 works on MPS.

The relevant section from the linked PR:

    # Some configurations require autocast to be disabled.
    enable_autocast = True
    if torch.backends.mps.is_available() or (
        accelerator.mixed_precision == "fp16" or accelerator.mixed_precision == "bf16"
    ):
        enable_autocast = False

It disables autocast on MPS.

I wasn't sure whether the initial report included that PR or not. If it didn't, could you re-attempt with --mixed_precision=fp16?

sagargulabani commented 3 months ago

So this is the error I see when I run it with mixed precision fp16:

export MODEL_NAME="stabilityai/stable-diffusion-xl-base-1.0"
export INSTANCE_DIR="dog"
export OUTPUT_DIR="lora-trained-xl"
export VAE_PATH="madebyollin/sdxl-vae-fp16-fix"

accelerate launch train_dreambooth_lora_sdxl.py \
  --pretrained_model_name_or_path=$MODEL_NAME  \
  --instance_data_dir=$INSTANCE_DIR \
  --pretrained_vae_model_name_or_path=$VAE_PATH \
  --output_dir=$OUTPUT_DIR \
  --mixed_precision=fp16 \
  --instance_prompt="a photo of sks dog" \
  --resolution=1024 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --learning_rate=1e-4 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=500 \
  --validation_prompt="A photo of sks dog in a bucket" \
  --validation_epochs=25 \
  --seed="0" \
  --push_to_hub
/opt/anaconda3/envs/hf/lib/python3.10/site-packages/torch/cuda/amp/grad_scaler.py:126: UserWarning: torch.cuda.amp.GradScaler is enabled, but CUDA is not available.  Disabling.
  warnings.warn(
03/29/2024 09:18:34 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: mps

Mixed precision type: fp16

You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
{'thresholding', 'rescale_betas_zero_snr', 'variance_type', 'clip_sample_range', 'dynamic_thresholding_ratio'} was not found in config. Values will be initialized to default values.
{'latents_mean', 'latents_std'} was not found in config. Values will be initialized to default values.
{'attention_type', 'dropout', 'reverse_transformer_layers_per_block'} was not found in config. Values will be initialized to default values.
Traceback (most recent call last):
  File "/Users/sagargulabani/dev/huggingface-transformers/diffusers/examples/dreambooth/train_dreambooth_lora_sdxl.py", line 1985, in <module>
    main(args)
  File "/Users/sagargulabani/dev/huggingface-transformers/diffusers/examples/dreambooth/train_dreambooth_lora_sdxl.py", line 1525, in main
    unet, optimizer, train_dataloader, lr_scheduler = accelerator.prepare(
  File "/opt/anaconda3/envs/hf/lib/python3.10/site-packages/accelerate/accelerator.py", line 1263, in prepare
    result = tuple(
  File "/opt/anaconda3/envs/hf/lib/python3.10/site-packages/accelerate/accelerator.py", line 1264, in <genexpr>
    self._prepare_one(obj, first_pass=True, device_placement=d) for obj, d in zip(args, device_placement)
  File "/opt/anaconda3/envs/hf/lib/python3.10/site-packages/accelerate/accelerator.py", line 1140, in _prepare_one
    return self.prepare_model(obj, device_placement=device_placement)
  File "/opt/anaconda3/envs/hf/lib/python3.10/site-packages/accelerate/accelerator.py", line 1330, in prepare_model
    autocast_context = get_mixed_precision_context_manager(self.native_amp, self.autocast_handler)
  File "/opt/anaconda3/envs/hf/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 1745, in get_mixed_precision_context_manager
    return torch.autocast(device_type=device_type, dtype=torch.float16, **autocast_kwargs)
  File "/opt/anaconda3/envs/hf/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 241, in __init__
    raise RuntimeError(
RuntimeError: User specified an unsupported autocast device_type 'mps'
Traceback (most recent call last):
  File "/opt/anaconda3/envs/hf/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/opt/anaconda3/envs/hf/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 46, in main
    args.func(args)
  File "/opt/anaconda3/envs/hf/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1057, in launch_command
    simple_launcher(args)
  File "/opt/anaconda3/envs/hf/lib/python3.10/site-packages/accelerate/commands/launch.py", line 673, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/anaconda3/envs/hf/bin/python', 'train_dreambooth_lora_sdxl.py', '--pretrained_model_name_or_path=stabilityai/stable-diffusion-xl-base-1.0', '--instance_data_dir=dog', '--pretrained_vae_model_name_or_path=madebyollin/sdxl-vae-fp16-fix', '--output_dir=lora-trained-xl', '--mixed_precision=fp16', '--instance_prompt=a photo of sks dog', '--resolution=1024', '--train_batch_size=1', '--gradient_accumulation_steps=4', '--learning_rate=1e-4', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--max_train_steps=500', '--validation_prompt=A photo of sks dog in a bucket', '--validation_epochs=25', '--seed=0', '--push_to_hub']' returned non-zero exit status 1.

Yes, I am on macOS Sonoma 14.4; I just upgraded it.

sagargulabani commented 3 months ago

When I run the code without mixed precision fp16, these are the screenshots of what I see in Activity Monitor, htop, and asitop. I see that the GPU is not being utilized much.

[Screenshots attached: 2024-03-28 at 8:39:59 AM, 2024-03-29 at 9:22:45 AM]

bghira commented 3 months ago

Are you running the latest main branch?

sagargulabani commented 3 months ago

Yes, I pulled yesterday.

I also pulled again just now; this is the commit: 34c90dbb31f4956e72086ac43ca7d8154c2aadae

I did run pip install -e .

and after that I am still getting the same error.

This is what pip list shows for diffusers: diffusers 0.28.0.dev0 /Users/sagargulabani/dev/huggingface-transformers/diffusers

bghira commented 3 months ago

#7530 might fix this one, @sagargulabani.

sagargulabani commented 3 months ago

Hi @bghira, I checked out this commit: https://github.com/bghira/diffusers/commit/ad3eb800385d0247faf0c59d9aed731f780a4318

and tried to run the same command above with the same script (train_dreambooth_lora_sdxl.py), but I am still running into the same issue:

RuntimeError: User specified an unsupported autocast device_type 'mps'

bghira commented 3 months ago

@sagargulabani I've updated that script in particular for that PR. It now uses native_amp = False in the Accelerator config.

Can you please re-run with that change? I will roll it out to the rest of the scripts afterwards.
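
A hedged sketch of what that change amounts to (simplified; not the exact diff from the PR):

    # Hedged sketch (simplified, not the exact PR diff): after creating the Accelerator,
    # turn off Accelerate's native AMP on MPS so prepare()/prepare_model() never builds
    # torch.autocast(device_type="mps").
    import torch
    from accelerate import Accelerator

    accelerator = Accelerator(mixed_precision="fp16")
    if torch.backends.mps.is_available():
        accelerator.native_amp = False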

akospalfi commented 3 months ago

@bghira I've been having the same problem as @sagargulabani, and your new change of explicitly disabling native AMP leads to a different type of error:

loc("mps_add"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/ce725a5f-c761-11ee-a4ec-b6ef2fd8d87b/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":233:0)): error: input types 'tensor<2x1280xf16>' and 'tensor<1280xf32>' are not broadcast compatible
LLVM ERROR: Failed to infer result type(s).
Traceback (most recent call last):
  File "/Users/palfia/jax-metal/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/Users/palfia/jax-metal/lib/python3.9/site-packages/accelerate/commands/accelerate_cli.py", line 46, in main
    args.func(args)
  File "/Users/palfia/jax-metal/lib/python3.9/site-packages/accelerate/commands/launch.py", line 1057, in launch_command
/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
    simple_launcher(args)
  File "/Users/palfia/jax-metal/lib/python3.9/site-packages/accelerate/commands/launch.py", line 673, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)

Edit: script parameters (PyTorch 2.2.2):

accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME  \
  --instance_data_dir=$INSTANCE_DIR \
  --class_data_dir=$CLASS_DIR \
  --output_dir=$OUTPUT_DIR \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --instance_prompt="a sks dog" \
  --class_prompt="a dog" \
  --mixed_precision=fp16 \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --learning_rate=2e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=100 \
  --max_train_steps=800

bghira commented 3 months ago

Was there more to the traceback before that one? That's the traceback from Accelerate, but the one from the trainer is needed to know where this error originated. I believe it's in log_validations where the dtypes change. This is something I also saw when updating to the latest PyTorch 2.2.

I'm really hoping we don't have to run .to() on all of the embeds.

bghira commented 3 months ago

@sayakpaul I think I'm in a bit of need of rescuing on this issue. Do you have any ideas how to proceed? Maybe a dummy cast wrapper in train utils, as I mentioned last week? The dtypes have to be the same everywhere for MPS.
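
A minimal sketch of that "dummy cast" idea (illustrative only, an assumption about the proposal rather than existing diffusers code): a helper that hands back a no-op context on MPS and the real autocast context elsewhere, so the training loop can always use a single with-statement.

    # Illustrative "dummy cast" helper (an assumption about the proposal, not existing
    # diffusers code): no-op context on MPS, Accelerate's autocast everywhere else.
    import contextlib

    import torch
    from accelerate import Accelerator


    def maybe_autocast(accelerator: Accelerator):
        if torch.backends.mps.is_available():
            return contextlib.nullcontext()
        return accelerator.autocast()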

akospalfi commented 3 months ago

> Was there more to the traceback before that one? That's the traceback from Accelerate, but the one from the trainer is needed to know where this error originated. I believe it's in log_validations where the dtypes change. This is something I also saw when updating to the latest PyTorch 2.2.
>
> I'm really hoping we don't have to run .to() on all of the embeds.

This is the full log; I can't see anything more useful:

/Users/palfia/jax-metal/lib/python3.9/site-packages/urllib3/__init__.py:35: NotOpenSSLWarning: urllib3 v2 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'LibreSSL 2.8.3'. See: https://github.com/urllib3/urllib3/issues/3020
  warnings.warn(
/Users/palfia/jax-metal/lib/python3.9/site-packages/urllib3/__init__.py:35: NotOpenSSLWarning: urllib3 v2 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'LibreSSL 2.8.3'. See: https://github.com/urllib3/urllib3/issues/3020
  warnings.warn(
/Users/palfia/jax-metal/lib/python3.9/site-packages/torch/cuda/amp/grad_scaler.py:126: UserWarning: torch.cuda.amp.GradScaler is enabled, but CUDA is not available.  Disabling.
  warnings.warn(
04/01/2024 17:50:18 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: mps

Mixed precision type: fp16

You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
{'dynamic_thresholding_ratio', 'clip_sample_range', 'variance_type', 'rescale_betas_zero_snr', 'sample_max_value', 'thresholding'} was not found in config. Values will be initialized to default values.
04/01/2024 17:50:20 - INFO - __main__ - ***** Running training *****
04/01/2024 17:50:20 - INFO - __main__ -   Num examples = 100
04/01/2024 17:50:20 - INFO - __main__ -   Num batches each epoch = 100
04/01/2024 17:50:20 - INFO - __main__ -   Num Epochs = 8
04/01/2024 17:50:20 - INFO - __main__ -   Instantaneous batch size per device = 1
04/01/2024 17:50:20 - INFO - __main__ -   Total train batch size (w. parallel, distributed & accumulation) = 1
04/01/2024 17:50:20 - INFO - __main__ -   Gradient Accumulation steps = 1
04/01/2024 17:50:20 - INFO - __main__ -   Total optimization steps = 800
Steps:   0%|                                                                                                                                                                                  | 0/800 [00:00<?, ?it/s]loc("mps_add"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/ce725a5f-c761-11ee-a4ec-b6ef2fd8d87b/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":233:0)): error: input types 'tensor<2x1280xf16>' and 'tensor<1280xf32>' are not broadcast compatible
LLVM ERROR: Failed to infer result type(s).
Traceback (most recent call last):
  File "/Users/palfia/jax-metal/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/Users/palfia/jax-metal/lib/python3.9/site-packages/accelerate/commands/accelerate_cli.py", line 46, in main
    args.func(args)
  File "/Users/palfia/jax-metal/lib/python3.9/site-packages/accelerate/commands/launch.py", line 1057, in launch_command
/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
    simple_launcher(args)
  File "/Users/palfia/jax-metal/lib/python3.9/site-packages/accelerate/commands/launch.py", line 673, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/Users/palfia/jax-metal/bin/python', 'train_dreambooth.py', '--pretrained_model_name_or_path=/Users/palfia/fun/converted_dreamshaper_v8', '--instance_data_dir=/Users/palfia/fun/train_db/J/instance_images/prepared', '--class_data_dir=/Users/palfia/fun/train_db/J/class_images', '--output_dir=/Users/palfia/fun/dreambooth_models', '--with_prior_preservation', '--prior_loss_weight=1.0', '--instance_prompt=a sks dog', '--class_prompt=a dog', '--mixed_precision=fp16', '--resolution=512', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--learning_rate=2e-6', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--num_class_images=100', '--max_train_steps=800']' died with <Signals.SIGABRT: 6>.

sagargulabani commented 3 months ago

Hi @bghira, I also see the same error as @akospalfi.

bghira commented 3 months ago

I'm able to reproduce this one locally, but it's not clear why it's happening. The text encoder hidden states are fp16, and the noisy inputs are fp16.

I can train locally with SimpleTuner, which handles dtypes differently, but it's not clear which difference is causing this problem.
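
The MPS abort above ('tensor<2x1280xf16>' vs 'tensor<1280xf32>') comes from mixing fp16 and fp32 tensors in a single graph op, so one workaround is to cast all conditioning tensors to a single weight dtype before the forward pass. A hedged sketch with illustrative tensor names (the trainer's actual variables differ):

    # Hedged sketch: keep every tensor that meets inside the UNet in one dtype on MPS.
    # Tensor names and shapes are illustrative, not the trainer's actual variables.
    import torch

    weight_dtype = torch.float16  # assumption: matches --mixed_precision=fp16
    device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

    prompt_embeds = torch.randn(2, 77, 2048)  # fp32 by default
    add_time_ids = torch.randn(2, 6)          # fp32 by default

    prompt_embeds = prompt_embeds.to(device=device, dtype=weight_dtype)
    add_time_ids = add_time_ids.to(device=device, dtype=weight_dtype)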

sagargulabani commented 3 months ago

Hi @bghira, @sayakpaul, just following up on this one: any thoughts on how we could proceed?

bghira commented 3 months ago

It's been complicated to do in a non-invasive way for the diffusers project.

For now, I've been running DreamBooth via SimpleTuner successfully for the last few days, introducing single subjects via these config values on the PyTorch 2.4 nightly.

github-actions[bot] commented 2 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

bghira commented 2 months ago

Not stale, just waiting on some PyTorch improvements.

sagargulabani commented 3 weeks ago

Can we close this now that PyTorch supports autocast on MPS?

sayakpaul commented 3 weeks ago

Have you verified if it runs successfully?

sagargulabani commented 3 weeks ago

No, I haven't verified it. I will verify and let you know.

bghira commented 3 weeks ago

Well, no. It's not even in a release yet :-)

bghira commented 3 weeks ago

And it has now been reverted out of pytorch/main due to regressions :[
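
For readers landing here later, one hedged way to check at runtime whether the installed PyTorch build supports autocast on MPS (assumes a release new enough to expose torch.amp.is_autocast_available; older builds don't have it, hence the fallback):

    # Hedged sketch: probe autocast support for the current device at runtime.
    # torch.amp.is_autocast_available only exists in newer PyTorch releases.
    import torch

    device_type = "mps" if torch.backends.mps.is_available() else "cpu"
    try:
        supported = torch.amp.is_autocast_available(device_type)
    except AttributeError:
        supported = False
    print(f"autocast supported on {device_type}: {supported}")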