huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

Unable to locate model_index.json and config.json while running the text_to_image/train_text_to_image.py finetuning pokemon example. #3694

Closed benjaminwfriedman closed 1 year ago

benjaminwfriedman commented 1 year ago

Describe the bug

Using a Google Colab notebook, I ran the steps of the text_to_image fine-tuning example with the provided pokemon data. I successfully fine-tuned the model for 500 steps and can see the checkpoint-500 output in my directory. When I follow the README and run:

import torch
from diffusers import StableDiffusionPipeline

model_path = "diffusers/examples/text_to_image/sd-pokemon-model2/checkpoint-500"

pipe = StableDiffusionPipeline.from_pretrained(model_path, torch_dtype=torch.float16)
pipe.to("cuda")

image = pipe(prompt="yoda").images[0]
image.save("yoda-pokemon.png")

I receive an OSError: Error no file named model_index.json found in directory

After some research, it seemed that the error was caused by the checkpoint missing the model_index.json file that is in the CompVis/stable-diffusion-v1-4 model repository. After downloading that model_index.json and adding it to my checkpoint, I no longer received that error, but I received another one: OSError: /content/diffusers/examples/text_to_image/sd-pokemon-model2/checkpoint-500 does not appear to have a file named config.json.

I am not sure how to proceed. Given the documentation in the README, it seems to me that both of these files should be present in the checkpoint so that the StableDiffusionPipeline can be loaded from the checkpoint directory without error.
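For context, the two errors come from two different loaders: the pipeline loader in the traceback below looks for model_index.json, while a single model class such as UNet2DConditionModel looks for config.json (see the later traceback in this thread). A minimal stdlib-only diagnostic sketch, using a hypothetical helper named `describe_checkpoint_dir`, that reports which kind of directory you have before calling from_pretrained. The classification is an assumption based only on the file names that appear in this thread.

```python
import os


def describe_checkpoint_dir(path: str) -> str:
    """Hypothetical helper: guess which loader a local directory can serve.

    Based only on the files the loaders in this thread look for:
    model_index.json (pipelines) and config.json (single models such as the UNet).
    """

    def has(*parts: str) -> bool:
        return os.path.isfile(os.path.join(path, *parts))

    if has("model_index.json"):
        return "pipeline"  # StableDiffusionPipeline.from_pretrained(path)
    if has("config.json"):
        return "model"  # e.g. UNet2DConditionModel.from_pretrained(path)
    if has("unet", "config.json"):
        return "unet-subfolder"  # UNet2DConditionModel.from_pretrained(path, subfolder="unet")
    if has("optimizer.bin") or has("pytorch_model.bin"):
        return "training-state"  # accelerate state only; from_pretrained will fail here
    return "unknown"


if __name__ == "__main__":
    print(describe_checkpoint_dir("sd-pokemon-model2/checkpoint-500"))
```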

Any guidance or education would be much appreciated.

Thanks!

Reproduction

git clone https://github.com/huggingface/diffusers.git
cd diffusers
pip install .

cd examples/text_to_image
pip install -r requirements.txt

huggingface-cli login

accelerate launch --mixed_precision="fp16" train_text_to_image.py --pretrained_model_name_or_path="CompVis/stable-diffusion-v1-4" --dataset_name="lambdalabs/pokemon-blip-captions" --use_ema --resolution=512 --center_crop --random_flip --train_batch_size=1 --gradient_accumulation_steps=4 --gradient_checkpointing --max_train_steps=500 --learning_rate=1e-05 --max_grad_norm=1 --lr_scheduler="constant" --lr_warmup_steps=0 --output_dir="sd-pokemon-model2"

import torch
from diffusers import StableDiffusionPipeline

model_path = "{path_to_checkpoints}/checkpoint-500"
pipe = StableDiffusionPipeline.from_pretrained(model_path, torch_dtype=torch.float16)
pipe.to("cuda")

image = pipe(prompt="yoda").images[0]
image.save("yoda-pokemon.png")

Logs

in <cell line: 4>:4

/usr/local/lib/python3.10/dist-packages/diffusers/pipelines/pipeline_utils.py:927 in from_pretrained

   924         else:
   925             cached_folder = pretrained_model_name_or_path
   926
❱  927         config_dict = cls.load_config(cached_folder)
   928
   929         # pop out "_ignore_files" as it is only needed for download
   930         config_dict.pop("_ignore_files", None)

/usr/local/lib/python3.10/dist-packages/diffusers/configuration_utils.py:352 in load_config

   349             ):
   350                 config_file = os.path.join(pretrained_model_name_or_path, subfolder, cls
   351             else:
❱  352                 raise EnvironmentError(
   353                     f"Error no file named {cls.config_name} found in directory {pretrain
   354                 )
   355         else:

OSError: Error no file named model_index.json found in directory
/content/diffusers/examples/text_to_image/sd-pokemon-model2/checkpoint-500.

System Info

patrickvonplaten commented 1 year ago

Gentle ping @sayakpaul here

sayakpaul commented 1 year ago

If possible, could you upload the checkpoints to the Hub so that I can take a look?

Meanwhile, here are some pointers:

Assuming you only fine-tuned the UNet, could you try doing the following:

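from diffusers import StableDiffusionPipeline, UNet2DConditionModel

# Base pipeline the UNet was fine-tuned from
pipeline = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
unet = UNet2DConditionModel.from_pretrained("your-checkpoint-path")
pipeline.unet = unet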

@patrickvonplaten could this be an issue with how we create the checkpoints with accelerate?

miladnoori1996 commented 1 year ago

@sayakpaul I'm having the same issue. Here is the save path directory hierarchy:

(diffusers) ➜  textual_inversion_cat git:(main) ✗ ls *
learned_embeds-steps-1000.bin  learned_embeds-steps-1500.bin  learned_embeds-steps-500.bin

checkpoint-1000:
optimizer.bin  pytorch_model.bin  random_states_0.pkl  scheduler.bin

checkpoint-1500:
optimizer.bin  pytorch_model.bin  random_states_0.pkl  scheduler.bin

checkpoint-500:
optimizer.bin  pytorch_model.bin  random_states_0.pkl  scheduler.bin

logs:
textual_inversion
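With several checkpoint-N folders in one output directory, as in the listing above, a plain string sort ranks checkpoint-500 after checkpoint-1500. A minimal sketch, using a hypothetical helper named `latest_checkpoint`, that picks the highest step numerically. It only locates the folder: whether that folder can be loaded with from_pretrained still depends on its contents, and the folders listed above hold only optimizer.bin, pytorch_model.bin, random_states_0.pkl and scheduler.bin.

```python
import os
import re


def latest_checkpoint(output_dir: str):
    """Hypothetical helper: return the checkpoint-N folder with the largest N, or None."""
    steps = []
    for name in os.listdir(output_dir):
        match = re.fullmatch(r"checkpoint-(\d+)", name)
        if match and os.path.isdir(os.path.join(output_dir, name)):
            steps.append((int(match.group(1)), name))
    if not steps:
        return None
    # Compare step numbers as integers; as strings, "checkpoint-500" > "checkpoint-1500".
    return os.path.join(output_dir, max(steps)[1])
```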

When trying to load only the UNet, I still get the following error:

(diffusers) ➜  textual_inversion_cat git:(main) ✗ ipython
Python 3.8.5 (default, Sep  4 2020, 07:30:14) 
Type 'copyright', 'credits' or 'license' for more information
IPython 8.12.2 -- An enhanced Interactive Python. Type '?' for help.

In [1]: ls
checkpoint-1000/  checkpoint-1500/  checkpoint-2000/  checkpoint-500/  learned_embeds-steps-1000.bin  learned_embeds-steps-1500.bin  learned_embeds-steps-2000.bin  learned_embeds-steps-500.bin  logs/

In [2]: from diffusers import UNet2DConditionModel

In [3]: from diffusers import StableDiffusionPipeline

In [4]: import torch

In [5]: pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", revision="fp16", torch_dtype=torch.float16)
safety_checker/model.safetensors not found
/home/analog/anaconda3/envs/diffusers/lib/python3.8/site-packages/transformers/models/clip/feature_extraction_clip.py:28: FutureWarning: The class CLIPFeatureExtractor is deprecated and will be removed in version 5 of Transformers. Please use CLIPImageProcessor instead.
  warnings.warn(
`text_config_dict` is provided which will be used to initialize `CLIPTextConfig`. The value `text_config["id2label"]` will be overriden.

In [6]: unet = UNet2DConditionModel.from_pretrained("./checkpoint-1500")
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
Cell In[6], line 1
----> 1 unet = UNet2DConditionModel.from_pretrained("./checkpoint-1500")

File ~/anaconda3/envs/diffusers/lib/python3.8/site-packages/diffusers/models/modeling_utils.py:514, in ModelMixin.from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
    507 user_agent = {
    508     "diffusers": __version__,
    509     "file_type": "model",
    510     "framework": "pytorch",
    511 }
    513 # load config
--> 514 config, unused_kwargs, commit_hash = cls.load_config(
    515     config_path,
    516     cache_dir=cache_dir,
    517     return_unused_kwargs=True,
    518     return_commit_hash=True,
    519     force_download=force_download,
    520     resume_download=resume_download,
    521     proxies=proxies,
    522     local_files_only=local_files_only,
    523     use_auth_token=use_auth_token,
    524     revision=revision,
    525     subfolder=subfolder,
    526     device_map=device_map,
    527     max_memory=max_memory,
    528     offload_folder=offload_folder,
    529     offload_state_dict=offload_state_dict,
    530     user_agent=user_agent,
    531     **kwargs,
    532 )
    534 # load model
    535 model_file = None

File ~/anaconda3/envs/diffusers/lib/python3.8/site-packages/diffusers/configuration_utils.py:352, in ConfigMixin.load_config(cls, pretrained_model_name_or_path, return_unused_kwargs, return_commit_hash, **kwargs)
    350         config_file = os.path.join(pretrained_model_name_or_path, subfolder, cls.config_name)
    351     else:
--> 352         raise EnvironmentError(
    353             f"Error no file named {cls.config_name} found in directory {pretrained_model_name_or_path}."
    354         )
    355 else:
    356     try:
    357         # Load from URL or cache if already cached

OSError: Error no file named config.json found in directory ./checkpoint-1500.

Then, looking at the model that I downloaded from Hugging Face:

(diffusers) ➜  textual_inversion_cat git:(main) ✗ find /home/analog/.cache/huggingface/hub/models--runwayml--stable-diffusion-v1-5/snapshots/ded79e214aa69e42c24d3f5ac14b76d568679cc2/* -name "config.json"
/home/analog/.cache/huggingface/hub/models--runwayml--stable-diffusion-v1-5/snapshots/ded79e214aa69e42c24d3f5ac14b76d568679cc2/safety_checker/config.json
/home/analog/.cache/huggingface/hub/models--runwayml--stable-diffusion-v1-5/snapshots/ded79e214aa69e42c24d3f5ac14b76d568679cc2/text_encoder/config.json
/home/analog/.cache/huggingface/hub/models--runwayml--stable-diffusion-v1-5/snapshots/ded79e214aa69e42c24d3f5ac14b76d568679cc2/unet/config.json
/home/analog/.cache/huggingface/hub/models--runwayml--stable-diffusion-v1-5/snapshots/ded79e214aa69e42c24d3f5ac14b76d568679cc2/vae/config.json

Is there something I'm doing wrong?

sayakpaul commented 1 year ago

@williamberman do you have any insights to share here?

Also, @miladnoori1996 @benjaminwfriedman, could you make sure you're using the latest versions of the example scripts? We introduced a few modifications to those scripts as far as checkpointing is concerned.

miladnoori1996 commented 1 year ago

@sayakpaul I'm only facing this issue when loading the model from a checkpoint; it works fine once fine-tuning is complete. I'm guessing this is related to the other thread: https://github.com/huggingface/diffusers/issues/2716 (btw, I did pull yesterday)

I guess the question is how to load the pytorch_model.bin model into UNet2DConditionModel:

In [40]: pipe.unet.load_attn_procs(model_id, subfolder="checkpoint-1000", weight_name="pytorch_model.bin")
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[40], line 1
----> 1 pipe.unet.load_attn_procs(model_id, subfolder="checkpoint-1000", weight_name="pytorch_model.bin")

File ~/anaconda3/envs/diffusers/lib/python3.8/site-packages/diffusers/loaders.py:332, in UNet2DConditionLoadersMixin.load_attn_procs(self, pretrained_model_name_or_path_or_dict, **kwargs)
    330             attn_processors[key].load_state_dict(value_dict)
    331 else:
--> 332     raise ValueError(
    333         f"{model_file} does not seem to be in the correct format expected by LoRA or Custom Diffusion training."
    334     )
    336 # set correct dtype & device
    337 attn_processors = {k: v.to(device=self.device, dtype=self.dtype) for k, v in attn_processors.items()}

ValueError: /home/analog/Desktop/fastAI/diffusers/examples/textual_inversion/textual_inversion_nude_woman_posing/checkpoint-1000/pytorch_model.bin does not seem to be in the correct format expected by LoRA or Custom Diffusion training.
sayakpaul commented 1 year ago

I thought you were using the train_text_to_image.py script and not the LoRA script, but your logs here suggest you are using LoRA.

williamberman commented 1 year ago

The problem with the snippet is that at the end of the training run the whole pipeline is serialized, but in the checkpoints only the UNet is serialized, so

import torch
from diffusers import StableDiffusionPipeline

model_path = "diffusers/examples/text_to_image/sd-pokemon-model2/checkpoint-500"

pipe = StableDiffusionPipeline.from_pretrained(model_path, torch_dtype=torch.float16)
pipe.to("cuda")

image = pipe(prompt="yoda").images[0]
image.save("yoda-pokemon.png")

should really be

import torch
from diffusers import StableDiffusionPipeline, UNet2DConditionModel

model_path = "diffusers/examples/text_to_image/sd-pokemon-model2/checkpoint-500"

# Load only the fine-tuned UNet from the checkpoint, then take every other component from the base model
unet = UNet2DConditionModel.from_pretrained(model_path + "/unet")
pipe = StableDiffusionPipeline.from_pretrained("<initial model>", unet=unet, torch_dtype=torch.float16)
pipe.to("cuda")

image = pipe(prompt="yoda").images[0]
image.save("yoda-pokemon.png")
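The fix above can be wrapped in a small function that fails early with a clearer message when the unet/ subfolder is missing. This is a sketch under assumptions: the helper name `load_finetuned_pipeline` is hypothetical, it expects the checkpoint to contain unet/config.json as described in the comment above, and `base_model` should presumably be the same model passed as --pretrained_model_name_or_path during training. It uses the `subfolder` argument of from_pretrained instead of string concatenation.

```python
import os


def load_finetuned_pipeline(base_model: str, checkpoint_dir: str, device: str = "cuda"):
    """Hypothetical helper: rebuild a pipeline from the base model plus a checkpoint's UNet.

    Assumes the UNet was saved under <checkpoint_dir>/unet (config.json + weights).
    """
    unet_config = os.path.join(checkpoint_dir, "unet", "config.json")
    if not os.path.isfile(unet_config):
        raise FileNotFoundError(
            f"{unet_config} not found; this checkpoint may only contain accelerate training state."
        )
    # Imported lazily so the path check above works without torch/diffusers installed.
    import torch
    from diffusers import StableDiffusionPipeline, UNet2DConditionModel

    unet = UNet2DConditionModel.from_pretrained(
        checkpoint_dir, subfolder="unet", torch_dtype=torch.float16
    )
    pipe = StableDiffusionPipeline.from_pretrained(
        base_model, unet=unet, torch_dtype=torch.float16
    )
    return pipe.to(device)
```

For the pokemon example this would be called as `load_finetuned_pipeline("CompVis/stable-diffusion-v1-4", "sd-pokemon-model2/checkpoint-500")`, after which `pipe(prompt="yoda").images[0]` works as in the snippet above.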
sayakpaul commented 1 year ago

Hi @benjaminwfriedman, I think @williamberman's comment above fixes the issue. Additionally, with https://github.com/huggingface/diffusers/pull/3806, it should also be clear from the docs now.

So, I am closing the issue now. Feel free to reopen if you face any problems.

BurgerAndreas commented 8 months ago

Note that if you are using LoRA, you need to load a checkpoint like this:

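import torch
from diffusers import StableDiffusionPipeline

model_path = 'sd-pokemon-model-lora/checkpoint-500'
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16)
# Load the LoRA attention processors from the checkpoint into the base UNet
pipe.unet.load_attn_procs(model_path)
pipe.to("cuda")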