How to fine-tune a .safetensors checkpoint

EnricoBeltramo commented 12 months ago

Describe the bug

I tried to fine-tune a model starting from a checkpoint (e.g. https://civitai.com/models/119202/talmendoxl-sdxl-uncensored-full-model). I converted the checkpoint to diffusers format using this library: https://github.com/waifu-diffusion/sdxl-ckpt-converter/

The converted model works fine for inference. The training script also works fine when I start from a standard base model such as "stabilityai/stable-diffusion-xl-base-1.0". However, I get an error when I start from the converted model.

Reproduction

1. Download the checkpoint: https://civitai.com/models/119202/talmendoxl-sdxl-uncensored-full-model
2. Convert it using: https://github.com/waifu-diffusion/sdxl-ckpt-converter/
3. Start training with:

!accelerate launch train_text_to_image_lora_sdxl.py \
  --pretrained_model_name_or_path="/content/drive/MyDrive/talmendoxlSDXL_v11Beta" \
  --pretrained_vae_model_name_or_path="madebyollin/sdxl-vae-fp16-fix" \
  --dataset_name="$INSTANCE_DIR_PARSED" \
  --caption_column="text" \
  --resolution=1024 \
  --train_batch_size=1 \
  --num_train_epochs=$TRAIN_EPOCHS \
  --checkpointing_steps=1000000 \
  --learning_rate=$LEARNING_RATE \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --seed=42 \
  --output_dir="$OUTPUT_DIR" \
  --enable_xformers_memory_efficient_attention \
  --gradient_checkpointing \
  --mixed_precision="fp16" \
  --use_8bit_adam

Logs

You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
{'clip_sample_range', 'dynamic_thresholding_ratio', 'variance_type', 'thresholding'} was not found in config. Values will be initialized to default values.
Traceback (most recent call last):
  File "/content/diffusers/examples/text_to_image/train_text_to_image_lora_sdxl.py", line 1271, in <module>
    main(args)
  File "/content/diffusers/examples/text_to_image/train_text_to_image_lora_sdxl.py", line 554, in main
    text_encoder_one = text_encoder_cls_one.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 2740, in from_pretrained
    raise EnvironmentError(
OSError: Error no file named pytorch_model.bin, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory /content/drive/MyDrive/talmendoxlSDXL_v11Beta.
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/accelerate_cli.py", line 45, in main
    args.func(args)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 979, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 628, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', 'train_text_to_image_lora_sdxl.py', '--pretrained_model_name_or_path=/content/drive/MyDrive/talmendoxlSDXL_v11Beta', '--pretrained_vae_model_name_or_path=madebyollin/sdxl-vae-fp16-fix', '--dataset_name=/content/instancefolder_parsed', '--caption_column=text', '--resolution=1024', '--train_batch_size=1', '--num_train_epochs=1', '--checkpointing_steps=1000000', '--learning_rate=2e-05', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--seed=42', '--output_dir=/content/lora-trained-xl-colab', '--enable_xformers_memory_efficient_attention', '--gradient_checkpointing', '--mixed_precision=fp16', '--use_8bit_adam']' returned non-zero exit status 1.
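The traceback shows `text_encoder_cls_one.from_pretrained(` failing because transformers found no weight file it recognizes. One plausible cause is that the converted folder does not contain text encoder weights under a name transformers looks for. The sketch below is a stdlib-only check of a local folder. It assumes the expected layout is that of "stabilityai/stable-diffusion-xl-base-1.0": `model_index.json` plus the `unet`, `vae`, `text_encoder`, `text_encoder_2`, `tokenizer`, `tokenizer_2` and `scheduler` subfolders. It also assumes the standard unsharded or sharded, non-variant weight filenames. `check_sdxl_layout` is a hypothetical helper, not part of diffusers.

```python
from pathlib import Path

# Assumption: component subfolders as in the stabilityai/stable-diffusion-xl-base-1.0 repo.
SDXL_COMPONENTS = ["unet", "vae", "text_encoder", "text_encoder_2",
                   "tokenizer", "tokenizer_2", "scheduler"]

# Assumption: standard non-variant weight filenames (single file or sharded index).
TRANSFORMERS_WEIGHTS = {"model.safetensors", "pytorch_model.bin",
                        "model.safetensors.index.json", "pytorch_model.bin.index.json"}
DIFFUSERS_WEIGHTS = {"diffusion_pytorch_model.safetensors", "diffusion_pytorch_model.bin",
                     "diffusion_pytorch_model.safetensors.index.json",
                     "diffusion_pytorch_model.bin.index.json"}

# Only these components carry model weights; tokenizers and scheduler are skipped.
WEIGHT_FILES = {"unet": DIFFUSERS_WEIGHTS, "vae": DIFFUSERS_WEIGHTS,
                "text_encoder": TRANSFORMERS_WEIGHTS, "text_encoder_2": TRANSFORMERS_WEIGHTS}


def check_sdxl_layout(root):
    """Return a list of problems found; an empty list means the layout looks complete."""
    base = Path(root)
    problems = []
    if not (base / "model_index.json").is_file():
        problems.append("missing model_index.json")
    for name in SDXL_COMPONENTS:
        sub = base / name
        if not sub.is_dir():
            problems.append(f"missing subfolder {name}/")
            continue
        expected = WEIGHT_FILES.get(name)
        if expected is None:
            continue  # tokenizer*/ and scheduler/ hold no model weights
        files = {p.name for p in sub.iterdir()}
        if "config.json" not in files:
            problems.append(f"{name}/ has no config.json")
        if not files & expected:
            problems.append(f"{name}/ has no weight file (found: {sorted(files)})")
    return problems
```

For example, `print(check_sdxl_layout("/content/drive/MyDrive/talmendoxlSDXL_v11Beta"))` would list any missing pieces. If only `text_encoder/` or `text_encoder_2/` is reported, the training script's `from_pretrained` call is likely failing on those folders.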

System Info

diffusers version: 0.21.0.dev0
Platform: Linux-5.15.120+-x86_64-with-glibc2.35
Python version: 3.10.12
PyTorch version (GPU?): 2.0.1+cu118 (True)
Huggingface_hub version: 0.17.2
Transformers version: 4.33.2
Accelerate version: 0.21.0
xFormers version: 0.0.21
Using GPU in script?:
Using distributed or parallel set-up in script?:

Who can help?

@williamberman, @patrickvonplaten, @sayakpaul

yuxu915 commented 11 months ago

Hi @EnricoBeltramo, I'm running into the same problem. Did you manage to solve it?

sayakpaul commented 11 months ago

Have you generated the pipeline in the diffusers format from the single checkpoint? If so, could you provide a link to the serialized pipeline?
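One way to produce the diffusers-format pipeline without the third-party converter is diffusers' own `StableDiffusionXLPipeline.from_single_file`, followed by `save_pretrained`. The sketch below assumes that approach; nobody in the thread confirmed that it fixes the error. `convert_single_file` is a hypothetical helper name, and the paths are placeholders. `from_single_file` may also download reference config or tokenizer files from the Hub.

```python
from pathlib import Path


def convert_single_file(checkpoint_path, output_dir):
    """Sketch: convert a single-file SDXL checkpoint into a diffusers-format folder."""
    ckpt = Path(checkpoint_path)
    if not ckpt.is_file():
        raise FileNotFoundError(f"checkpoint not found: {ckpt}")
    # Imported lazily so the path check above runs even without diffusers installed.
    from diffusers import StableDiffusionXLPipeline

    # Loads the UNet, VAE and both text encoders from the one checkpoint file;
    # tokenizer/scheduler configs may be fetched from the Hub (assumption).
    pipe = StableDiffusionXLPipeline.from_single_file(str(ckpt))
    # save_pretrained writes model_index.json plus one subfolder per component,
    # the layout that loads with subfolder="text_encoder" etc. expect.
    out = Path(output_dir)
    pipe.save_pretrained(out)
    return out
```

For example, `convert_single_file("talmendoxlSDXL_v11Beta.safetensors", "/content/drive/MyDrive/talmendoxlSDXL_v11Beta_diffusers")` would create the folder. That folder would then be passed as `--pretrained_model_name_or_path`. The checkpoint filename here is a placeholder.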

github-actions[bot] commented 10 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

huggingface / diffusers