Closed benjaminwfriedman closed 1 year ago
Gentle ping @sayakpaul here
If possible could you upload the checkpoints on Hub so that I can take a look?
Meanwhile, here are some pointers:
Assuming you only fine-tuned the UNet, could you try doing the following:
from diffusers import UNet2DConditionModel
unet = UNet2DConditionModel.from_pretrained("your-checkpoint-path")
pipeline.unet = unet
@patrickvonplaten could this be an issue with how we create the checkpoints with accelerate?
@sayakpaul I'm having the same issue. Here is the save path directory hierarchy:
(diffusers) ➜ textual_inversion_cat git:(main) ✗ ls *
learned_embeds-steps-1000.bin learned_embeds-steps-1500.bin learned_embeds-steps-500.bin
checkpoint-1000:
optimizer.bin pytorch_model.bin random_states_0.pkl scheduler.bin
checkpoint-1500:
optimizer.bin pytorch_model.bin random_states_0.pkl scheduler.bin
checkpoint-500:
optimizer.bin pytorch_model.bin random_states_0.pkl scheduler.bin
logs:
textual_inversion
When trying to load only the UNet, I still get the following error:
(diffusers) ➜ textual_inversion_cat git:(main) ✗ ipython
Python 3.8.5 (default, Sep 4 2020, 07:30:14)
Type 'copyright', 'credits' or 'license' for more information
IPython 8.12.2 -- An enhanced Interactive Python. Type '?' for help.
In [1]: ls
checkpoint-1000/ checkpoint-1500/ checkpoint-2000/ checkpoint-500/ learned_embeds-steps-1000.bin learned_embeds-steps-1500.bin learned_embeds-steps-2000.bin learned_embeds-steps-500.bin logs/
In [2]: from diffusers import UNet2DConditionModel
In [3]: from diffusers import StableDiffusionPipeline
In [4]: import torch
In [5]: pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", revision="fp16", torch_dtype=torch.float16)
safety_checker/model.safetensors not found
/home/analog/anaconda3/envs/diffusers/lib/python3.8/site-packages/transformers/models/clip/feature_extraction_clip.py:28: FutureWarning: The class CLIPFeatureExtractor is deprecated and will be removed in version 5 of Transformers. Please use CLIPImageProcessor instead.
warnings.warn(
`text_config_dict` is provided which will be used to initialize `CLIPTextConfig`. The value `text_config["id2label"]` will be overriden.
In [6]: unet = UNet2DConditionModel.from_pretrained("./checkpoint-1500")
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
Cell In[6], line 1
----> 1 unet = UNet2DConditionModel.from_pretrained("./checkpoint-1500")
File ~/anaconda3/envs/diffusers/lib/python3.8/site-packages/diffusers/models/modeling_utils.py:514, in ModelMixin.from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
507 user_agent = {
508 "diffusers": __version__,
509 "file_type": "model",
510 "framework": "pytorch",
511 }
513 # load config
--> 514 config, unused_kwargs, commit_hash = cls.load_config(
515 config_path,
516 cache_dir=cache_dir,
517 return_unused_kwargs=True,
518 return_commit_hash=True,
519 force_download=force_download,
520 resume_download=resume_download,
521 proxies=proxies,
522 local_files_only=local_files_only,
523 use_auth_token=use_auth_token,
524 revision=revision,
525 subfolder=subfolder,
526 device_map=device_map,
527 max_memory=max_memory,
528 offload_folder=offload_folder,
529 offload_state_dict=offload_state_dict,
530 user_agent=user_agent,
531 **kwargs,
532 )
534 # load model
535 model_file = None
File ~/anaconda3/envs/diffusers/lib/python3.8/site-packages/diffusers/configuration_utils.py:352, in ConfigMixin.load_config(cls, pretrained_model_name_or_path, return_unused_kwargs, return_commit_hash, **kwargs)
350 config_file = os.path.join(pretrained_model_name_or_path, subfolder, cls.config_name)
351 else:
--> 352 raise EnvironmentError(
353 f"Error no file named {cls.config_name} found in directory {pretrained_model_name_or_path}."
354 )
355 else:
356 try:
357 # Load from URL or cache if already cached
OSError: Error no file named config.json found in directory ./checkpoint-1500.
Then, looking at the model that I downloaded from Hugging Face:
(diffusers) ➜ textual_inversion_cat git:(main) ✗ find /home/analog/.cache/huggingface/hub/models--runwayml--stable-diffusion-v1-5/snapshots/ded79e214aa69e42c24d3f5ac14b76d568679cc2/* -name "config.json"
/home/analog/.cache/huggingface/hub/models--runwayml--stable-diffusion-v1-5/snapshots/ded79e214aa69e42c24d3f5ac14b76d568679cc2/safety_checker/config.json
/home/analog/.cache/huggingface/hub/models--runwayml--stable-diffusion-v1-5/snapshots/ded79e214aa69e42c24d3f5ac14b76d568679cc2/text_encoder/config.json
/home/analog/.cache/huggingface/hub/models--runwayml--stable-diffusion-v1-5/snapshots/ded79e214aa69e42c24d3f5ac14b76d568679cc2/unet/config.json
/home/analog/.cache/huggingface/hub/models--runwayml--stable-diffusion-v1-5/snapshots/ded79e214aa69e42c24d3f5ac14b76d568679cc2/vae/config.json
Is there something I'm doing wrong?
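A quick way to see why `from_pretrained` rejects a directory is to check which marker files it contains. The sketch below is only a diagnostic under assumptions: the function name and the returned labels are made up for illustration, and the layouts it recognizes are the ones that show up in this thread (a full pipeline with model_index.json, a `unet` subfolder with config.json, a single model with config.json, or raw accelerate state such as pytorch_model.bin plus optimizer.bin).

```python
from pathlib import Path


def describe_checkpoint_dir(path):
    """Guess what kind of directory `path` is from the files inside it.

    Hypothetical helper: the labels returned here are illustrative only.
    """
    root = Path(path)
    if (root / "model_index.json").is_file():
        # A whole pipeline was saved here; the pipeline loader looks for this file.
        return "pipeline"
    if (root / "unet" / "config.json").is_file():
        # The UNet was saved in its own subfolder together with its config.
        return "unet-subfolder"
    if (root / "config.json").is_file():
        # A single model was saved directly into this directory.
        return "single-model"
    if (root / "pytorch_model.bin").is_file() and (root / "optimizer.bin").is_file():
        # Raw training state only: weights and optimizer, but no config.json.
        return "accelerate-state"
    return "unknown"
```

For the listing above, `describe_checkpoint_dir("./checkpoint-1500")` would be expected to return "accelerate-state", which matches the missing config.json error.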
@williamberman do you have any insights to share here?
Also, @miladnoori1996 @benjaminwfriedman is it possible to ensure you're using the latest versions of the example scripts? Because we introduced a few modifications to those scripts as far as checkpointing is concerned.
@sayakpaul I'm only facing this issue when loading the model from a checkpoint; it works fine once fine-tuning is complete. I'm guessing this is related to the other thread: https://github.com/huggingface/diffusers/issues/2716 (by the way, I did pull yesterday).
I guess the question is how to load the PyTorch pytorch_model.bin model into UNet2DConditionModel.
In [40]: pipe.unet.load_attn_procs(model_id, subfolder="checkpoint-1000", weight_name="pytorch_model.bin")
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[40], line 1
----> 1 pipe.unet.load_attn_procs(model_id, subfolder="checkpoint-1000", weight_name="pytorch_model.bin")
File ~/anaconda3/envs/diffusers/lib/python3.8/site-packages/diffusers/loaders.py:332, in UNet2DConditionLoadersMixin.load_attn_procs(self, pretrained_model_name_or_path_or_dict, **kwargs)
330 attn_processors[key].load_state_dict(value_dict)
331 else:
--> 332 raise ValueError(
333 f"{model_file} does not seem to be in the correct format expected by LoRA or Custom Diffusion training."
334 )
336 # set correct dtype & device
337 attn_processors = {k: v.to(device=self.device, dtype=self.dtype) for k, v in attn_processors.items()}
ValueError: /home/analog/Desktop/fastAI/diffusers/examples/textual_inversion/textual_inversion_nude_woman_posing/checkpoint-1000/pytorch_model.bin does not seem to be in the correct format expected by LoRA or Custom Diffusion training.
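Before picking a loader for a bare pytorch_model.bin, it can help to look at the parameter names it contains. The sketch below is a heuristic, not part of diffusers: `guess_state_dict_kind` and `keys_of` are hypothetical names, the "lora" and "custom_diffusion" substring checks mirror the kind of test that produces the error above, and the `text_model.` and `down_blocks.`/`up_blocks.`/`mid_block.` prefixes are assumptions about how the CLIP text encoder and the UNet name their parameters. Which model actually ends up in pytorch_model.bin depends on what the training script handed to accelerate.

```python
def guess_state_dict_kind(keys):
    """Heuristically guess which model a state dict belongs to from its keys."""
    keys = list(keys)
    if not keys:
        return "empty"
    if all("lora" in k for k in keys):
        # Only LoRA attention-processor weights.
        return "lora-attn-procs"
    if any("custom_diffusion" in k for k in keys):
        return "custom-diffusion"
    if any(k.startswith("text_model.") for k in keys):
        # Assumed prefix for CLIP text encoder parameters.
        return "text-encoder"
    if any(k.startswith(("down_blocks.", "up_blocks.", "mid_block.")) for k in keys):
        # Assumed prefixes for UNet2DConditionModel parameters.
        return "unet"
    return "unknown"


def keys_of(path):
    """Read the parameter names from a checkpoint file on the CPU."""
    import torch  # imported lazily so the helper above stays dependency-free

    return list(torch.load(path, map_location="cpu").keys())
```

Something like `guess_state_dict_kind(keys_of("checkpoint-1000/pytorch_model.bin"))` would then hint at whether `load_attn_procs` is the right call at all.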
I thought you were using the train_text_to_image.py script rather than the LoRA script; I ask because your logs here suggest you are using LoRA.
The problem with the snippet is that the whole pipeline is serialized only at the end of the training run, whereas the intermediate checkpoints contain only the UNet, so
import torch
from diffusers import StableDiffusionPipeline
model_path = "diffusers/examples/text_to_image/sd-pokemon-model2/checkpoint-500"
pipe = StableDiffusionPipeline.from_pretrained(model_path, torch_dtype=torch.float16)
pipe.to("cuda")
image = pipe(prompt="yoda").images[0]
image.save("yoda-pokemon.png")
should really be
import torch
from diffusers import StableDiffusionPipeline, UNet2DConditionModel
model_path = "diffusers/examples/text_to_image/sd-pokemon-model2/checkpoint-500"
# Load only the UNet saved in the checkpoint, in the same dtype as the pipeline
unet = UNet2DConditionModel.from_pretrained(model_path, subfolder="unet", torch_dtype=torch.float16)
# Take every other component from the model that training started from
pipe = StableDiffusionPipeline.from_pretrained("<initial model>", unet=unet, torch_dtype=torch.float16)
pipe.to("cuda")
image = pipe(prompt="yoda").images[0]
image.save("yoda-pokemon.png")
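The same idea can be wrapped in a small helper that fails early with a readable message when the checkpoint has no `unet/config.json`. This is a sketch under assumptions, not an API from diffusers: both function names are hypothetical, and it assumes the checkpoint stores the UNet in a `unet` subfolder as in the snippet above.

```python
from pathlib import Path


def resolve_unet_dir(checkpoint_dir):
    """Return <checkpoint_dir>/unet, or raise if it holds no config.json."""
    unet_dir = Path(checkpoint_dir) / "unet"
    if not (unet_dir / "config.json").is_file():
        raise FileNotFoundError(
            f"{unet_dir} has no config.json; this checkpoint may only contain "
            "raw training state (pytorch_model.bin, optimizer.bin, ...)."
        )
    return unet_dir


def load_pipeline_with_checkpoint_unet(base_model, checkpoint_dir, dtype=None):
    """Build a pipeline from `base_model`, swapping in the checkpoint's UNet."""
    # Heavy imports are deferred so resolve_unet_dir() works without them.
    from diffusers import StableDiffusionPipeline, UNet2DConditionModel

    unet = UNet2DConditionModel.from_pretrained(
        str(resolve_unet_dir(checkpoint_dir)), torch_dtype=dtype
    )
    # Every other component comes from the model that training started from.
    return StableDiffusionPipeline.from_pretrained(
        base_model, unet=unet, torch_dtype=dtype
    )
```

Passing the same `dtype` to both loaders keeps the swapped-in UNet consistent with the rest of the pipeline.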
Hi @benjaminwfriedman, I think @williamberman's comment above fixes the issue. Additionally, with https://github.com/huggingface/diffusers/pull/3806, it should also be pretty clear from the docs now.
So, I am closing the issue now. Feel free to reopen if you face any problems.
Note that if you are using LoRA you need to load a checkpoint like this:
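import torch
from diffusers import StableDiffusionPipeline
model_path = 'sd-pokemon-model-lora/checkpoint-500'
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16)
# Load the LoRA attention weights from the checkpoint into the base UNet
pipe.unet.load_attn_procs(model_path)
pipe.to("cuda")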
Describe the bug
Using a Google Colab notebook, I ran the steps of the text_to_image fine-tuning example using the Pokemon data provided. I successfully fine-tuned the model for 500 steps and see the checkpoint-500 output in my directory. When I attempt to follow the README and load that checkpoint with StableDiffusionPipeline.from_pretrained (see Reproduction), I receive an OSError:
OSError: Error no file named model_index.json found in directory
After some research, it seemed like that error was caused by not getting the model_index.json that is in the CompVis/stable-diffusion-v1-4 model card. After downloading that model_index.json and adding it to my checkpoint, I no longer received the error, but received another OSError:
OSError: /content/diffusers/examples/text_to_image/sd-pokemon-model2/checkpoint-500 does not appear to have a file named config.json.
I am not sure how to proceed on this one. It seems to me, given the documentation in the README, that both of these files should be present in the checkpoint so that it can be read back into the StableDiffusionPipeline from the checkpoint directory without error. Any guidance or education would be much appreciated.
Thanks!
Reproduction
git clone https://github.com/huggingface/diffusers.git
cd diffusers
pip install .
cd examples/text_to_image
pip install -r requirements.txt
huggingface-cli login
accelerate launch --mixed_precision="fp16" train_text_to_image.py --pretrained_model_name_or_path="CompVis/stable-diffusion-v1-4" --dataset_name="lambdalabs/pokemon-blip-captions" --use_ema --resolution=512 --center_crop --random_flip --train_batch_size=1 --gradient_accumulation_steps=4 --gradient_checkpointing --max_train_steps=500 --learning_rate=1e-05 --max_grad_norm=1 --lr_scheduler="constant" --lr_warmup_steps=0 --output_dir="sd-pokemon-model2"
import torch
from diffusers import StableDiffusionPipeline
model_path = "{path_to_checkpoints}/checkpoint-500"
pipe = StableDiffusionPipeline.from_pretrained(model_path, torch_dtype=torch.float16)
pipe.to("cuda")
image = pipe(prompt="yoda").images[0]
image.save("yoda-pokemon.png")
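When the output directory holds several `checkpoint-N` folders, picking the most recent one by name is easy to get wrong, because a plain string sort puts checkpoint-500 after checkpoint-1500. A minimal sketch that compares the step numbers numerically follows; the helper name is hypothetical and it assumes the folders are named `checkpoint-<step>` as in this thread.

```python
import re
from pathlib import Path

_CHECKPOINT_RE = re.compile(r"checkpoint-(\d+)")


def latest_checkpoint(output_dir):
    """Return the checkpoint-<step> subdirectory with the highest step, or None."""
    best_step, best_dir = -1, None
    for child in Path(output_dir).iterdir():
        match = _CHECKPOINT_RE.fullmatch(child.name)
        if child.is_dir() and match and int(match.group(1)) > best_step:
            best_step, best_dir = int(match.group(1)), child
    return best_dir
```

The returned path only identifies the directory; whether it can be loaded directly still depends on what the training script saved into it.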
Logs
System Info
diffusers version: 0.17.0.dev0