Open sagargulabani opened 4 months ago
Hi @sagargulabani,
Isn't this issue related to Stable Diffusion web UI's sd_dreambooth_extension extension?
Did/Could you try diffusers's DreamBooth? Also, see the MPS-related page.
But I guess autocast is not supported yet on mps. They started a PR, but unfortunately it seems that they abandoned it 😞. Nevertheless, I guess there is an ongoing PR here that may be a solution.
yes, that is true. The issue is related to the webui. I tried running DreamBooth SDXL training locally and I am running into the following error:
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
{'thresholding', 'rescale_betas_zero_snr', 'clip_sample_range', 'variance_type', 'dynamic_thresholding_ratio'} was not found in config. Values will be initialized to default values.
{'latents_mean', 'latents_std'} was not found in config. Values will be initialized to default values.
{'reverse_transformer_layers_per_block', 'dropout', 'attention_type'} was not found in config. Values will be initialized to default values.
Traceback (most recent call last):
File "/Users/sagargulabani/.cache/huggingface/diffusers/examples/dreambooth/train_dreambooth_lora_sdxl.py", line 1964, in <module>
main(args)
File "/Users/sagargulabani/.cache/huggingface/diffusers/examples/dreambooth/train_dreambooth_lora_sdxl.py", line 1167, in main
unet_lora_config = LoraConfig(
TypeError: LoraConfig.__init__() got an unexpected keyword argument 'use_dora'
Traceback (most recent call last):
File "/opt/anaconda3/envs/hf/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/opt/anaconda3/envs/hf/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 46, in main
args.func(args)
File "/opt/anaconda3/envs/hf/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1057, in launch_command
simple_launcher(args)
File "/opt/anaconda3/envs/hf/lib/python3.10/site-packages/accelerate/commands/launch.py", line 673, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/anaconda3/envs/hf/bin/python', 'train_dreambooth_lora_sdxl.py', '--pretrained_model_name_or_path=stabilityai/stable-diffusion-xl-base-1.0', '--instance_data_dir=dog', '--pretrained_vae_model_name_or_path=madebyollin/sdxl-vae-fp16-fix', '--output_dir=lora-trained-xl', '--mixed_precision=fp16', '--instance_prompt=a photo of sks dog', '--resolution=1024', '--train_batch_size=1', '--gradient_accumulation_steps=4', '--learning_rate=1e-4', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--max_train_steps=500', '--validation_prompt=A photo of sks dog in a bucket', '--validation_epochs=25', '--seed=0', '--push_to_hub']' returned non-zero exit status 1.
My peft version is 0.7.0
and this is my command to run
export MODEL_NAME="stabilityai/stable-diffusion-xl-base-1.0"
export INSTANCE_DIR="dog"
export OUTPUT_DIR="lora-trained-xl"
export VAE_PATH="madebyollin/sdxl-vae-fp16-fix"
accelerate launch train_dreambooth_lora_sdxl.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--instance_data_dir=$INSTANCE_DIR \
--pretrained_vae_model_name_or_path=$VAE_PATH \
--output_dir=$OUTPUT_DIR \
--mixed_precision="fp16" \
--instance_prompt="a photo of sks dog" \
--resolution=1024 \
--train_batch_size=1 \
--gradient_accumulation_steps=4 \
--learning_rate=1e-4 \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--max_train_steps=500 \
--validation_prompt="A photo of sks dog in a bucket" \
--validation_epochs=25 \
--seed="0" \
--push_to_hub
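The TypeError above suggests that the LoraConfig in the installed peft (0.7.0) does not know the use_dora keyword that the script passes. Upgrading peft is probably the cleaner fix. As a stopgap, a minimal sketch of filtering out keywords a constructor does not accept is shown below. OldLoraConfig is a hypothetical stand-in so that the example runs without peft installed, and filter_supported_kwargs is an illustrative helper, not part of peft or diffusers.

```python
import inspect
from dataclasses import dataclass


def filter_supported_kwargs(cls, kwargs):
    """Split kwargs into those cls.__init__ accepts and those it would reject."""
    params = inspect.signature(cls.__init__).parameters
    # A **kwargs parameter means the constructor accepts any keyword.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs), {}
    accepted = {k: v for k, v in kwargs.items() if k in params}
    dropped = {k: v for k, v in kwargs.items() if k not in params}
    return accepted, dropped


@dataclass
class OldLoraConfig:
    """Hypothetical stand-in for a config class that predates use_dora."""
    r: int = 8
    lora_alpha: int = 8
    init_lora_weights: str = "gaussian"


wanted = {"r": 4, "lora_alpha": 4, "use_dora": False}
accepted, dropped = filter_supported_kwargs(OldLoraConfig, wanted)
config = OldLoraConfig(**accepted)
print(config, "dropped:", sorted(dropped))
```

With peft installed, the same helper could presumably be pointed at peft's LoraConfig instead of the stand-in. Silently dropping use_dora only makes sense when it was going to be False anyway.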
I did run it by removing the use_dora flag from the script [here](https://github.com/huggingface/diffusers/blob/0cc56309454a6db970db0425de790c952f68fc64/examples/dreambooth/train_dreambooth_lora_sdxl.py#L1169). @linoytsaban
After that I ran into the following issue.
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
{'variance_type', 'dynamic_thresholding_ratio', 'thresholding', 'clip_sample_range', 'rescale_betas_zero_snr'} was not found in config. Values will be initialized to default values.
{'latents_mean', 'latents_std'} was not found in config. Values will be initialized to default values.
{'dropout', 'attention_type', 'reverse_transformer_layers_per_block'} was not found in config. Values will be initialized to default values.
Traceback (most recent call last):
File "/Users/sagargulabani/.cache/huggingface/diffusers/examples/dreambooth/train_dreambooth_lora_sdxl.py", line 1963, in <module>
main(args)
File "/Users/sagargulabani/.cache/huggingface/diffusers/examples/dreambooth/train_dreambooth_lora_sdxl.py", line 1503, in main
unet, optimizer, train_dataloader, lr_scheduler = accelerator.prepare(
File "/opt/anaconda3/envs/hf/lib/python3.10/site-packages/accelerate/accelerator.py", line 1263, in prepare
result = tuple(
File "/opt/anaconda3/envs/hf/lib/python3.10/site-packages/accelerate/accelerator.py", line 1264, in <genexpr>
self._prepare_one(obj, first_pass=True, device_placement=d) for obj, d in zip(args, device_placement)
File "/opt/anaconda3/envs/hf/lib/python3.10/site-packages/accelerate/accelerator.py", line 1140, in _prepare_one
return self.prepare_model(obj, device_placement=device_placement)
File "/opt/anaconda3/envs/hf/lib/python3.10/site-packages/accelerate/accelerator.py", line 1330, in prepare_model
autocast_context = get_mixed_precision_context_manager(self.native_amp, self.autocast_handler)
File "/opt/anaconda3/envs/hf/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 1745, in get_mixed_precision_context_manager
return torch.autocast(device_type=device_type, dtype=torch.float16, **autocast_kwargs)
File "/opt/anaconda3/envs/hf/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 241, in __init__
raise RuntimeError(
RuntimeError: User specified an unsupported autocast device_type 'mps'
Traceback (most recent call last):
File "/opt/anaconda3/envs/hf/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/opt/anaconda3/envs/hf/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 46, in main
args.func(args)
File "/opt/anaconda3/envs/hf/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1057, in launch_command
simple_launcher(args)
File "/opt/anaconda3/envs/hf/lib/python3.10/site-packages/accelerate/commands/launch.py", line 673, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/anaconda3/envs/hf/bin/python', 'train_dreambooth_lora_sdxl.py', '--pretrained_model_name_or_path=stabilityai/stable-diffusion-xl-base-1.0', '--instance_data_dir=dog', '--pretrained_vae_model_name_or_path=madebyollin/sdxl-vae-fp16-fix', '--output_dir=lora-trained-xl', '--mixed_precision=fp16', '--instance_prompt=a photo of sks dog', '--resolution=1024', '--train_batch_size=1', '--gradient_accumulation_steps=4', '--learning_rate=1e-4', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--max_train_steps=500', '--validation_prompt=A photo of sks dog in a bucket', '--validation_epochs=25', '--seed=0', '--push_to_hub']' returned non-zero exit status 1.
You should remove --mixed_precision="fp16" when using M3. Cc: @bghira
And yes https://github.com/huggingface/diffusers/pull/7447 should be helpful.
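The traceback above fails while constructing the autocast context for device type 'mps'. A rough sketch of the general fallback pattern (try to build the context, and use a no-op context if the backend rejects the device) could look like the following. fake_autocast is a hypothetical stand-in that mimics the error message, so the example runs without PyTorch; context_or_null is an illustrative helper, not an Accelerate or PyTorch API.

```python
import contextlib


def context_or_null(factory, *args, **kwargs):
    """Build a context manager via factory; fall back to a no-op on RuntimeError."""
    try:
        return factory(*args, **kwargs)
    except RuntimeError as err:
        print(f"autocast disabled: {err}")
        return contextlib.nullcontext()


def fake_autocast(device_type, dtype=None):
    """Hypothetical stand-in that mimics the failure shown in the traceback."""
    if device_type == "mps":
        raise RuntimeError(
            f"User specified an unsupported autocast device_type '{device_type}'"
        )
    return contextlib.nullcontext()


ran_body = False
with context_or_null(fake_autocast, device_type="mps", dtype="float16"):
    # The body still runs, just without any autocasting.
    ran_body = True
print("body ran:", ran_body)
```

This does not patch Accelerate, where the failing call in the traceback lives. It only illustrates why dropping the flag avoids the crash: no autocast context is requested, and training runs in full precision.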
Hi @sayakpaul ,
I did remove that and ran it, but it looks like the code gets stuck:
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
{'rescale_betas_zero_snr', 'variance_type', 'dynamic_thresholding_ratio', 'thresholding', 'clip_sample_range'} was not found in config. Values will be initialized to default values.
{'latents_std', 'latents_mean'} was not found in config. Values will be initialized to default values.
{'dropout', 'attention_type', 'reverse_transformer_layers_per_block'} was not found in config. Values will be initialized to default values.
03/28/2024 20:06:58 - INFO - __main__ - ***** Running training *****
03/28/2024 20:06:58 - INFO - __main__ - Num examples = 5
03/28/2024 20:06:58 - INFO - __main__ - Num batches each epoch = 5
03/28/2024 20:06:58 - INFO - __main__ - Num Epochs = 250
03/28/2024 20:06:58 - INFO - __main__ - Instantaneous batch size per device = 1
03/28/2024 20:06:58 - INFO - __main__ - Total train batch size (w. parallel, distributed & accumulation) = 4
03/28/2024 20:06:58 - INFO - __main__ - Gradient Accumulation steps = 4
03/28/2024 20:06:58 - INFO - __main__ - Total optimization steps = 500
Steps: 0%| | 0/500 [00:00<?, ?it/s]
It's not progressing beyond this. I am using an M3 Max with 48 GB of RAM.
Also, I had to remove the use_dora flag from here to run the script: https://github.com/huggingface/diffusers/blob/0cc56309454a6db970db0425de790c952f68fc64/examples/dreambooth/train_dreambooth_lora_sdxl.py#L1169
So I figure that it is moving, but it is extremely slow.
accelerate launch train_dreambooth_lora_sdxl.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--instance_data_dir=$INSTANCE_DIR \
--pretrained_vae_model_name_or_path=$VAE_PATH \
--output_dir=$OUTPUT_DIR \
--instance_prompt="a photo of sks dog" \
--resolution=1024 \
--train_batch_size=1 \
--gradient_accumulation_steps=4 \
--learning_rate=1e-4 \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--max_train_steps=500 \
--validation_prompt="A photo of sks dog in a bucket" \
--validation_epochs=25 \
--seed="0" \
--push_to_hub
03/28/2024 20:06:39 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: mps
Mixed precision type: no
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
{'rescale_betas_zero_snr', 'variance_type', 'dynamic_thresholding_ratio', 'thresholding', 'clip_sample_range'} was not found in config. Values will be initialized to default values.
{'latents_std', 'latents_mean'} was not found in config. Values will be initialized to default values.
{'dropout', 'attention_type', 'reverse_transformer_layers_per_block'} was not found in config. Values will be initialized to default values.
03/28/2024 20:06:58 - INFO - __main__ - ***** Running training *****
03/28/2024 20:06:58 - INFO - __main__ - Num examples = 5
03/28/2024 20:06:58 - INFO - __main__ - Num batches each epoch = 5
03/28/2024 20:06:58 - INFO - __main__ - Num Epochs = 250
03/28/2024 20:06:58 - INFO - __main__ - Instantaneous batch size per device = 1
03/28/2024 20:06:58 - INFO - __main__ - Total train batch size (w. parallel, distributed & accumulation) = 4
03/28/2024 20:06:58 - INFO - __main__ - Gradient Accumulation steps = 4
03/28/2024 20:06:58 - INFO - __main__ - Total optimization steps = 500
Steps: 0%|▏ | 1/500 [10:04<83:43:30, 604.03s/it, loss=0.0871, lr=0.0001]
Any suggestions to make it faster?
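To put the progress bar in perspective, the remaining-time estimate can be reproduced from the reported rate: 499 steps left at 604.03 s/it. A small sketch of that arithmetic (format_eta is just an illustrative helper name):

```python
def format_eta(remaining_steps: int, seconds_per_step: float) -> str:
    """Format remaining_steps * seconds_per_step as H:MM:SS, truncating fractions."""
    total = int(remaining_steps * seconds_per_step)
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


observed = format_eta(remaining_steps=499, seconds_per_step=604.03)
print(observed)  # 83:43:30, matching the progress bar
```

That is roughly three and a half days for the 500-step run at the observed rate.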
are you on 14.4? i've been using pytorch 2.2 and i get about 10 seconds per step with 1 megapixel images on a M3 Max 128G. do you observe any memory / swap pressure?
also, in my environment, i've been running with --mixed_precision=fp16 but i'm not sure why that's erroring out for you the way it is.
the code only returns an error to the user when mixed_precision="bf16", informing them to use fp16 instead. the default is actually fp32, which seems to be in use here hence the extreme slowdown.
the goal should be to ensure that mixed_precision=fp16 works on mps.
the relevant section from the linked PR:
# Some configurations require autocast to be disabled.
enable_autocast = True
if torch.backends.mps.is_available() or (
    accelerator.mixed_precision == "fp16" or accelerator.mixed_precision == "bf16"
):
    enable_autocast = False
disables autocast on MPS.
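Read as a truth table, the quoted condition leaves autocast enabled only when MPS is unavailable and mixed precision is neither fp16 nor bf16. A minimal restatement as a pure function, with the torch and accelerator lookups replaced by plain arguments (the function name is made up for illustration):

```python
def autocast_enabled(mps_available: bool, mixed_precision: str) -> bool:
    """Restate the quoted PR condition without torch or accelerate."""
    if mps_available or mixed_precision in ("fp16", "bf16"):
        return False
    return True


for mps in (True, False):
    for precision in ("no", "fp16", "bf16"):
        print(f"mps={mps!s:5} mixed_precision={precision:4} -> {autocast_enabled(mps, precision)}")
```

Only the (mps=False, mixed_precision="no") row comes out True.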
wasn't sure whether the initial report included that PR or not. if it didn't, could you re-attempt with --mixed_precision=fp16?
so this is the error I see when I run it with mixed precision fp16:
export MODEL_NAME="stabilityai/stable-diffusion-xl-base-1.0"
export INSTANCE_DIR="dog"
export OUTPUT_DIR="lora-trained-xl"
export VAE_PATH="madebyollin/sdxl-vae-fp16-fix"
accelerate launch train_dreambooth_lora_sdxl.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--instance_data_dir=$INSTANCE_DIR \
--pretrained_vae_model_name_or_path=$VAE_PATH \
--output_dir=$OUTPUT_DIR \
--mixed_precision=fp16 \
--instance_prompt="a photo of sks dog" \
--resolution=1024 \
--train_batch_size=1 \
--gradient_accumulation_steps=4 \
--learning_rate=1e-4 \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--max_train_steps=500 \
--validation_prompt="A photo of sks dog in a bucket" \
--validation_epochs=25 \
--seed="0" \
--push_to_hub
/opt/anaconda3/envs/hf/lib/python3.10/site-packages/torch/cuda/amp/grad_scaler.py:126: UserWarning: torch.cuda.amp.GradScaler is enabled, but CUDA is not available. Disabling.
warnings.warn(
03/29/2024 09:18:34 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: mps
Mixed precision type: fp16
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
{'thresholding', 'rescale_betas_zero_snr', 'variance_type', 'clip_sample_range', 'dynamic_thresholding_ratio'} was not found in config. Values will be initialized to default values.
{'latents_mean', 'latents_std'} was not found in config. Values will be initialized to default values.
{'attention_type', 'dropout', 'reverse_transformer_layers_per_block'} was not found in config. Values will be initialized to default values.
Traceback (most recent call last):
File "/Users/sagargulabani/dev/huggingface-transformers/diffusers/examples/dreambooth/train_dreambooth_lora_sdxl.py", line 1985, in <module>
main(args)
File "/Users/sagargulabani/dev/huggingface-transformers/diffusers/examples/dreambooth/train_dreambooth_lora_sdxl.py", line 1525, in main
unet, optimizer, train_dataloader, lr_scheduler = accelerator.prepare(
File "/opt/anaconda3/envs/hf/lib/python3.10/site-packages/accelerate/accelerator.py", line 1263, in prepare
result = tuple(
File "/opt/anaconda3/envs/hf/lib/python3.10/site-packages/accelerate/accelerator.py", line 1264, in <genexpr>
self._prepare_one(obj, first_pass=True, device_placement=d) for obj, d in zip(args, device_placement)
File "/opt/anaconda3/envs/hf/lib/python3.10/site-packages/accelerate/accelerator.py", line 1140, in _prepare_one
return self.prepare_model(obj, device_placement=device_placement)
File "/opt/anaconda3/envs/hf/lib/python3.10/site-packages/accelerate/accelerator.py", line 1330, in prepare_model
autocast_context = get_mixed_precision_context_manager(self.native_amp, self.autocast_handler)
File "/opt/anaconda3/envs/hf/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 1745, in get_mixed_precision_context_manager
return torch.autocast(device_type=device_type, dtype=torch.float16, **autocast_kwargs)
File "/opt/anaconda3/envs/hf/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 241, in __init__
raise RuntimeError(
RuntimeError: User specified an unsupported autocast device_type 'mps'
Traceback (most recent call last):
File "/opt/anaconda3/envs/hf/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/opt/anaconda3/envs/hf/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 46, in main
args.func(args)
File "/opt/anaconda3/envs/hf/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1057, in launch_command
simple_launcher(args)
File "/opt/anaconda3/envs/hf/lib/python3.10/site-packages/accelerate/commands/launch.py", line 673, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/anaconda3/envs/hf/bin/python', 'train_dreambooth_lora_sdxl.py', '--pretrained_model_name_or_path=stabilityai/stable-diffusion-xl-base-1.0', '--instance_data_dir=dog', '--pretrained_vae_model_name_or_path=madebyollin/sdxl-vae-fp16-fix', '--output_dir=lora-trained-xl', '--mixed_precision=fp16', '--instance_prompt=a photo of sks dog', '--resolution=1024', '--train_batch_size=1', '--gradient_accumulation_steps=4', '--learning_rate=1e-4', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--max_train_steps=500', '--validation_prompt=A photo of sks dog in a bucket', '--validation_epochs=25', '--seed=0', '--push_to_hub']' returned non-zero exit status 1.
yes, I am on macOS Sonoma 14.4, just upgraded it.
When I run the code without mixed precision fp16, these are the screenshots of what I see in my activity monitor, in htop and asitop. I see that the GPU is not being utilized a lot.
are you running the latest main branch?
yes, I took a pull yesterday.
I also took a pull right now - 34c90dbb31f4956e72086ac43ca7d8154c2aadae (this is the commit)
I did run pip install -e . and after that I am still getting the same error.
This is what my pip list command for diffusers shows.
diffusers 0.28.0.dev0 /Users/sagargulabani/dev/huggingface-transformers/diffusers
Hi @bghira, I checked out this commit - https://github.com/bghira/diffusers/commit/ad3eb800385d0247faf0c59d9aed731f780a4318
and tried to run the same command above with the same script - train_dreambooth_lora_sdxl.py but still running into the same issue -
RuntimeError: User specified an unsupported autocast device_type 'mps'
@sagargulabani i've updated that script in particular for that PR. it now uses native_amp = False in the Accelerator config.
can you please re-run with that change? i will put it to the rest of the scripts after
@bghira I've been having the same problem as @sagargulabani, and your new change explicitly disabling native amp leads to a different type of error:
loc("mps_add"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/ce725a5f-c761-11ee-a4ec-b6ef2fd8d87b/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":233:0)): error: input types 'tensor<2x1280xf16>' and 'tensor<1280xf32>' are not broadcast compatible
LLVM ERROR: Failed to infer result type(s).
Traceback (most recent call last):
File "/Users/palfia/jax-metal/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/Users/palfia/jax-metal/lib/python3.9/site-packages/accelerate/commands/accelerate_cli.py", line 46, in main
args.func(args)
File "/Users/palfia/jax-metal/lib/python3.9/site-packages/accelerate/commands/launch.py", line 1057, in launch_command
/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
simple_launcher(args)
File "/Users/palfia/jax-metal/lib/python3.9/site-packages/accelerate/commands/launch.py", line 673, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
edit: script parameters (pytorch 2.2.2)
accelerate launch train_dreambooth.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--instance_data_dir=$INSTANCE_DIR \
--class_data_dir=$CLASS_DIR \
--output_dir=$OUTPUT_DIR \
--with_prior_preservation --prior_loss_weight=1.0 \
--instance_prompt="a sks dog" \
--class_prompt="a dog" \
--mixed_precision=fp16 \
--resolution=512 \
--train_batch_size=1 \
--gradient_accumulation_steps=1 \
--learning_rate=2e-6 \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--num_class_images=100 \
--max_train_steps=800
was there more to the traceback before that one? that's the traceback from Accelerate, but the one from the trainer is needed to know where this error originated. i believe it's in log_validations where the dtypes change. this is something i saw when also updating to pytorch 2.2 latest.
i'm really hoping we don't have to run .to() on all of the embeds.
@sayakpaul i think i'm in a bit of a need of rescuing on this issue. do you have any ideas on how to proceed? maybe a dummycast wrapper in train utils as i mentioned last week? the dtypes have to be the same everywhere for MPS.
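The MPS error message itself supports the dtype reading: the two shapes it reports would broadcast fine, and only the element types differ. A small stdlib-only sketch that pulls the shapes and dtypes out of that message and checks both (the regex and helper names are illustrative and only cover the tensor<...> form seen in this log):

```python
import re

ERROR = (
    "error: input types 'tensor<2x1280xf16>' and 'tensor<1280xf32>' "
    "are not broadcast compatible"
)

# Matches e.g. tensor<2x1280xf16>: zero or more "<int>x" dims, then a dtype tag.
TENSOR_RE = re.compile(r"tensor<((?:\d+x)*)([a-z]+\d+)>")


def parse_tensor_types(message):
    """Return a list of (shape, dtype) pairs found in the error message."""
    result = []
    for dims, dtype in TENSOR_RE.findall(message):
        shape = tuple(int(d) for d in dims.split("x") if d)
        result.append((shape, dtype))
    return result


def shapes_broadcast(a, b):
    """Right-aligned broadcasting check: dims must be equal or one of them 1."""
    return all(x == y or x == 1 or y == 1 for x, y in zip(reversed(a), reversed(b)))


(lhs_shape, lhs_dtype), (rhs_shape, rhs_dtype) = parse_tensor_types(ERROR)
print(lhs_shape, lhs_dtype, rhs_shape, rhs_dtype)
print("shapes broadcast:", shapes_broadcast(lhs_shape, rhs_shape))
print("dtypes match:", lhs_dtype == rhs_dtype)
```

So somewhere an fp32 tensor of 1280 elements is being added to an fp16 activation. The log does not say which tensor that is.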
This is the full log, I can't see anything more useful:
/Users/palfia/jax-metal/lib/python3.9/site-packages/urllib3/__init__.py:35: NotOpenSSLWarning: urllib3 v2 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'LibreSSL 2.8.3'. See: https://github.com/urllib3/urllib3/issues/3020
warnings.warn(
/Users/palfia/jax-metal/lib/python3.9/site-packages/urllib3/__init__.py:35: NotOpenSSLWarning: urllib3 v2 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'LibreSSL 2.8.3'. See: https://github.com/urllib3/urllib3/issues/3020
warnings.warn(
/Users/palfia/jax-metal/lib/python3.9/site-packages/torch/cuda/amp/grad_scaler.py:126: UserWarning: torch.cuda.amp.GradScaler is enabled, but CUDA is not available. Disabling.
warnings.warn(
04/01/2024 17:50:18 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: mps
Mixed precision type: fp16
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
{'dynamic_thresholding_ratio', 'clip_sample_range', 'variance_type', 'rescale_betas_zero_snr', 'sample_max_value', 'thresholding'} was not found in config. Values will be initialized to default values.
04/01/2024 17:50:20 - INFO - __main__ - ***** Running training *****
04/01/2024 17:50:20 - INFO - __main__ - Num examples = 100
04/01/2024 17:50:20 - INFO - __main__ - Num batches each epoch = 100
04/01/2024 17:50:20 - INFO - __main__ - Num Epochs = 8
04/01/2024 17:50:20 - INFO - __main__ - Instantaneous batch size per device = 1
04/01/2024 17:50:20 - INFO - __main__ - Total train batch size (w. parallel, distributed & accumulation) = 1
04/01/2024 17:50:20 - INFO - __main__ - Gradient Accumulation steps = 1
04/01/2024 17:50:20 - INFO - __main__ - Total optimization steps = 800
Steps: 0%| | 0/800 [00:00<?, ?it/s]loc("mps_add"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/ce725a5f-c761-11ee-a4ec-b6ef2fd8d87b/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":233:0)): error: input types 'tensor<2x1280xf16>' and 'tensor<1280xf32>' are not broadcast compatible
LLVM ERROR: Failed to infer result type(s).
Traceback (most recent call last):
File "/Users/palfia/jax-metal/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/Users/palfia/jax-metal/lib/python3.9/site-packages/accelerate/commands/accelerate_cli.py", line 46, in main
args.func(args)
File "/Users/palfia/jax-metal/lib/python3.9/site-packages/accelerate/commands/launch.py", line 1057, in launch_command
/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
simple_launcher(args)
File "/Users/palfia/jax-metal/lib/python3.9/site-packages/accelerate/commands/launch.py", line 673, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/Users/palfia/jax-metal/bin/python', 'train_dreambooth.py', '--pretrained_model_name_or_path=/Users/palfia/fun/converted_dreamshaper_v8', '--instance_data_dir=/Users/palfia/fun/train_db/J/instance_images/prepared', '--class_data_dir=/Users/palfia/fun/train_db/J/class_images', '--output_dir=/Users/palfia/fun/dreambooth_models', '--with_prior_preservation', '--prior_loss_weight=1.0', '--instance_prompt=a sks dog', '--class_prompt=a dog', '--mixed_precision=fp16', '--resolution=512', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--learning_rate=2e-6', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--num_class_images=100', '--max_train_steps=800']' died with <Signals.SIGABRT: 6>.
Hi @bghira, I also see the same error as @akospalfi
i'm able to reproduce this one locally, but it's not clear why it's happening. the text encoder hidden states are fp16, the noisy inputs are fp16.
i can train locally on SimpleTuner, which handles dtypes differently, but it's not clear which difference is causing this problem.
Hi @bghira @sayakpaul Just following up on this one on how we could go about it.
it's been complicated to do in a non-invasive way for the diffusers project.
for now, i've been running dreambooth via simpletuner for the last few days successfully, introducing single subjects via these config values on pytorch 2.4 nightly.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
not stale, just waiting on some pytorch improvements
Can we close this now that pytorch supports mps?
Have you verified if it runs successfully?
no I haven't verified it. Will verify and let you know.
well, no. it's not even in a release yet :-)
and it was now reverted out of pytorch/main due to regressions :[
Describe the bug
I am trying to run DreamBooth Stable Diffusion on an M3 Max. However, whenever I try to generate the class images for the concepts, it fails.
Reproduction
To reproduce the error, try to set up the DreamBooth extension on an M3 Max (Apple silicon), then try to generate class images. It will fail.
As per this issue, someone suggested that we open an issue in this repository.
Please help us. Thank you.
Logs
System Info
Apple M3 Max 30 CPU 40 GPU, 16 inch, 48 GB of RAM.
Python version - 3.10.14
diffusers - 0.27.2
transformers - 4.30.2
torch - 2.1.0
Who can help?
@sayakp