huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

Dreambooth / transformers cannot execute, missing "MODEL_TYPE" #1540

Closed · DrewWalkup closed this issue 1 year ago

DrewWalkup commented 1 year ago

Describe the bug

When attempting to run DreamBooth with any version of transformers > 4.21, I run into the issue shown in the logs below, using vanilla SD v1.5.

Reproduction

accelerate launch --mixed_precision='fp16' train_dreambooth.py \
  --train_text_encoder \
  --save_steps=500 \
  --pretrained_model_name_or_path=YOUR_VAR \
  --instance_data_dir=YOUR_VAR \
  --class_data_dir=YOUR_VAR \
  --output_dir=YOUR_VAR \
  --instance_prompt=YOUR_VAR \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --seed=YOUR_VAR \
  --resolution=YOUR_VAR \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --use_8bit_adam \
  --learning_rate=YOUR_VAR \
  --lr_scheduler="polynomial" \
  --lr_warmup_steps=10 \
  --max_train_steps=YOUR_VAR

Logs

Traceback (most recent call last):
  File "/workspace/content/diffusers/examples/dreambooth/train_dreambooth.py", line 713, in <module>
    main(args)
  File "/workspace/content/diffusers/examples/dreambooth/train_dreambooth.py", line 436, in main
    tokenizer = AutoTokenizer.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py", line 564, in from_pretrained
    config = AutoConfig.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/configuration_auto.py", line 775, in from_pretrained
    raise ValueError(
ValueError: Unrecognized model in /workspace/content/stable-diffusion-v1-5. Should have a `model_type` key in its config.json, or contain one of the following strings in its name: albert, bart, beit, bert, bert-generation, big_bird, bigbird_pegasus, blenderbot, blenderbot-small, bloom, camembert, canine, clip, codegen, convbert, convnext, ctrl, cvt, data2vec-audio, data2vec-text, data2vec-vision, deberta, deberta-v2, decision_transformer, deit, detr, distilbert, donut-swin, dpr, dpt, electra, encoder-decoder, ernie, flaubert, flava, fnet, fsmt, funnel, glpn, gpt2, gpt_neo, gpt_neox, gptj, groupvit, hubert, ibert, imagegpt, layoutlm, layoutlmv2, layoutlmv3, led, levit, longformer, longt5, luke, lxmert, m2m_100, marian, maskformer, mbart, mctct, megatron-bert, mobilebert, mobilevit, mpnet, mt5, mvp, nezha, nystromformer, openai-gpt, opt, owlvit, pegasus, pegasus_x, perceiver, plbart, poolformer, prophetnet, qdqbert, rag, realm, reformer, regnet, rembert, resnet, retribert, roberta, roformer, segformer, sew, sew-d, speech-encoder-decoder, speech_to_text, speech_to_text_2, splinter, squeezebert, swin, swinv2, t5, tapas, trajectory_transformer, transfo-xl, trocr, unispeech, unispeech-sat, van, videomae, vilt, vision-encoder-decoder, vision-text-dual-encoder, visual_bert, vit, vit_mae, wav2vec2, wav2vec2-conformer, wavlm, xclip, xglm, xlm, xlm-prophetnet, xlm-roberta, xlm-roberta-xl, xlnet, yolos, yoso

System Info

diffusers version: 0.9.0
Platform: Linux-5.15.65+-x86_64-with-debian-bullseye-sid
Python version: 3.7.12
PyTorch version (GPU?): 1.12.0+cu113 (True)
Huggingface_hub version: 0.11.1
Transformers version: 4.22.1
Using GPU in script?: P100
Using distributed or parallel set-up in script?: no
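If it helps, here is a minimal check (paths copied from the traceback above; the subfolder name assumes the standard diffusers checkpoint layout) that isolates the tokenizer-loading step outside the training script:

import os
from transformers import CLIPTokenizer

# Standalone check, not part of train_dreambooth.py: load the CLIP tokenizer
# directly from the tokenizer/ subfolder of the local checkpoint the traceback
# points at. If this succeeds, the checkpoint itself is intact; if it fails,
# the local copy of stable-diffusion-v1-5 is incomplete or laid out differently.
ckpt = "/workspace/content/stable-diffusion-v1-5"  # path from the traceback
tokenizer = CLIPTokenizer.from_pretrained(os.path.join(ckpt, "tokenizer"))
print(tokenizer)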

patrickvonplaten commented 1 year ago

Hey @dnwalkup,

We don't know what YOUR_VAR is - could you please specify this?

DrewWalkup commented 1 year ago

YOUR_VAR is a placeholder for your own value. Would it help if I defined the output dir for tests on your end?

Some examples below:

accelerate launch --mixed_precision='fp16' train_dreambooth.py \
  --train_text_encoder \
  --save_steps=500 \
  --pretrained_model_name_or_path=path/to/official/stable/diffusion/model/ \
  --instance_data_dir=path/to/your/instance/images/ \
  --class_data_dir=path/to/your/class/images/ \
  --output_dir=path/for/your/new/model/ \
  --instance_prompt="your instance prompt for testing purposes" \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --seed=156516168661 \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --use_8bit_adam \
  --learning_rate=1.0e-6 \
  --lr_scheduler="polynomial" \
  --lr_warmup_steps=10 \
  --max_train_steps=750

(Use --resolution=512 or 768 depending on which model you're testing with; it should be tested on v1.5, v2-512-base, and v2-768.)
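A quick sanity check along the same lines (sketch only; the path placeholder matches the command above) is to load the full pipeline before launching training, which confirms the checkpoint passed as --pretrained_model_name_or_path is a complete diffusers-format model:

from diffusers import StableDiffusionPipeline

# Sketch: if the full pipeline loads, the checkpoint directory contains the
# expected model_index.json plus tokenizer/, text_encoder/, unet/, vae/ and
# scheduler/ subfolders that train_dreambooth.py reads from.
pipe = StableDiffusionPipeline.from_pretrained("path/to/official/stable/diffusion/model/")
print(type(pipe.tokenizer))  # expected: a transformers CLIPTokenizer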

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.