huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

Loading from_pretrained from subfolder with a path to directory raises ValueError: is not a folder containing a `.index.json` file or a pytorch_model.bin or a model.safetensors file #8898

Closed christopher5106 closed 2 months ago

christopher5106 commented 2 months ago

Describe the bug

Loading a pretrained pipeline component such as the UNet from a subfolder of a local path does not work.

Most of your examples use the subfolder formulation (for example here or here), and it is supposed to work with a local path as well.

Reproduction

Save "stabilityai/stable-diffusion-xl-base-1.0" locally, for example in a "pretrained" folder, and run

unet = UNet2DConditionModel.from_pretrained(
    "pretrained", subfolder="unet",
)

raises ValueError: pretrained is not a folder containing a `.index.json` file or a pytorch_model.bin or a model.safetensors file

whereas it works with the remote formulation (Hub repo ID) for hosted models.

A workaround is to use

unet = UNet2DConditionModel.from_pretrained(
    "pretrained/unet",
)

That works. If this is the only way to load component weights from a local directory, then none of the examples can be run from a local path, and the variable args.pretrained_model_name_or_path should be named args.pretrained_model_name instead.
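A minimal sketch of how that workaround could be wrapped so a single call site handles both local directories and Hub repo IDs. The helper `component_load_kwargs` is hypothetical (not part of Diffusers); it only joins the subfolder onto the path when that joined path exists locally, and otherwise keeps the `subfolder` argument for the Hub.

```python
import os


def component_load_kwargs(root: str, subfolder: str) -> dict:
    """Hypothetical helper: build from_pretrained() keyword arguments for a component.

    If `root/subfolder` is a local directory, pass the joined path directly
    (the workaround described above). Otherwise assume `root` is a Hub repo ID
    and keep the `subfolder` argument.
    """
    local_path = os.path.join(root, subfolder)
    if os.path.isdir(local_path):
        return {"pretrained_model_name_or_path": local_path}
    return {"pretrained_model_name_or_path": root, "subfolder": subfolder}


# Usage (requires diffusers and a locally saved pipeline):
# unet = UNet2DConditionModel.from_pretrained(**component_load_kwargs("pretrained", "unet"))
```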

Logs

No response

System Info

Python==3.10.14
Conda env without accelerate
torch==2.3.1
Diffusers installed from source at commit a785992c1d6fcb1ff66f8a0d68d09a0a81b909e8

Who can help?

@DN6 @yiyixuxu @sayakpaul

yiyixuxu commented 2 months ago

Hi @christopher5106,

from_pretrained works with a local path as well as a remote repo name.

I'm not exactly sure what the problem is, but I can see that in this code you should use subfolder="unet" instead of folder="unet":

unet = UNet2DConditionModel.from_pretrained(
    "pretrained", folder="unet",
)

sayakpaul commented 2 months ago

It should be subfolder and not folder.

christopher5106 commented 2 months ago

Ah, sorry.

christopher5106 commented 2 months ago

It's still not working when using "subfolder" instead of "folder". Please reproduce and re-open @sayakpaul @yiyixuxu

christopher5106 commented 2 months ago

The correct error message was:

ValueError: is not a folder containing a `.index.json` file or a pytorch_model.bin or a model.safetensors file

christopher5106 commented 2 months ago

#8659

christopher5106 commented 2 months ago

#4933

christopher5106 commented 2 months ago

#3694

christopher5106 commented 2 months ago

The subfolder bug has been reported in many issues, and they were always closed. Some users were able to work around it by appending "/unet" to the path instead of using the subfolder argument.

sayakpaul commented 2 months ago

Really not sure how this is so confusing.

In this Colab Notebook, I show all the different possibilities I could think of when loading a UNet from some checkpoint, be it local or be it from the Hub. What am I missing here?

asomoza commented 2 months ago

FYI, the issues you're linking aren't related to the one you have:

#8659 was caused by the sharded checkpoint.

#4933 is an almost year-old issue where the user was trying to load a LoRA from a full Diffusers model.

#3694 is a more than year-old issue where the user needed to load the UNet separately; it had nothing to do with the subfolder.

All of those issues were closed because they were resolved.

Also, maybe you downloaded the "fp16" variant? If that's the case, you have to load it like this:

unet = UNet2DConditionModel.from_pretrained("pretrained", subfolder="unet", variant="fp16")
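For context, the `variant` argument changes which weight filename is looked up: with variant="fp16" the loader searches for names such as diffusion_pytorch_model.fp16.safetensors, which is the name that appears in the error messages later in this thread. A rough sketch of that naming rule, assuming the variant is inserted just before the final extension; `add_variant` here is a hypothetical stand-alone helper, not an imported Diffusers function.

```python
from typing import Optional


def add_variant(weights_name: str, variant: Optional[str] = None) -> str:
    """Hypothetical helper: insert `variant` before the last extension of a weights filename."""
    if not variant:
        return weights_name
    stem, sep, ext = weights_name.rpartition(".")
    if not sep:
        # No extension at all: append the variant as a suffix.
        return f"{weights_name}.{variant}"
    return f"{stem}.{variant}.{ext}"


# add_variant("diffusion_pytorch_model.safetensors", "fp16")
# -> "diffusion_pytorch_model.fp16.safetensors"
```

So a pipeline saved without the fp16 variant has no *.fp16.* weight files, and passing variant="fp16" will not find them.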

christopher5106 commented 2 months ago

Sorry, this environment is too slow for me, and I'm also not sure about the usage, the errors I get, etc. Here is my setup:

conda create -n testsubfolder python=3.10.12 --yes
conda activate testsubfolder
git clone https://github.com/huggingface/diffusers.git
cd diffusers && pip install .
pip install torch==2.3.1 transformers==4.42.4 accelerate==0.32.1

Then, in Python:

from diffusers import UNet2DConditionModel, DiffusionPipeline
pipeline = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0")
pipeline.save_pretrained("my-pipeline")

Then, here is what I get:

unet = UNet2DConditionModel.from_pretrained("my-pipeline/unet") 
pipeline = DiffusionPipeline.from_pretrained("my-pipeline")
del unet, pipeline

works.

unet = UNet2DConditionModel.from_pretrained("my-pipeline", subfolder="unet")  
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ubuntu/anaconda3/envs/testsubfolder/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/testsubfolder/lib/python3.10/site-packages/diffusers/models/modeling_utils.py", line 774, in from_pretrained
    accelerate.load_checkpoint_and_dispatch(
  File "/home/ubuntu/anaconda3/envs/testsubfolder/lib/python3.10/site-packages/accelerate/big_modeling.py", line 608, in load_checkpoint_and_dispatch
    load_checkpoint_in_model(
  File "/home/ubuntu/anaconda3/envs/testsubfolder/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 1698, in load_checkpoint_in_model
    raise ValueError(
ValueError: my-pipeline is not a folder containing a `.index.json` file or a pytorch_model.bin or a model.safetensors file

as well as

unet = UNet2DConditionModel.from_pretrained("my-pipeline", subfolder="unet", variant="fp16")
An error occurred while trying to fetch my-pipeline: Error no file named diffusion_pytorch_model.fp16.safetensors found in directory my-pipeline.
Defaulting to unsafe serialization. Pass `allow_pickle=False` to raise an error instead.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ubuntu/anaconda3/envs/testsubfolder/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/testsubfolder/lib/python3.10/site-packages/diffusers/models/modeling_utils.py", line 713, in from_pretrained
    model_file = _get_model_file(
  File "/home/ubuntu/anaconda3/envs/testsubfolder/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/testsubfolder/lib/python3.10/site-packages/diffusers/utils/hub_utils.py", line 310, in _get_model_file
    raise EnvironmentError(
OSError: Error no file named diffusion_pytorch_model.fp16.bin found in directory my-pipeline.

I need to understand how my setup differs from yours.

christopher5106 commented 2 months ago

Ah, I found a first clue: when I use "runwayml/stable-diffusion-v1-5" instead of "stabilityai/stable-diffusion-xl-base-1.0", the error is gone.

Frankly, it's still not obvious to me what I did wrong.

christopher5106 commented 2 months ago

There is a unet folder in my-pipeline, saved from stabilityai/stable-diffusion-xl-base-1.0, and it contains a UNet2DConditionModel.

christopher5106 commented 2 months ago

It should work...

asomoza commented 2 months ago

I can reproduce your issue now with that information.

This is the same as the sharded-checkpoint issue you linked: since you're not loading the model in fp16, it gets sharded when you save it, and then loading it fails. This code reproduces the error:

from diffusers import DiffusionPipeline, UNet2DConditionModel

pipeline = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0")
pipeline.save_pretrained("my-pipeline")

unet = UNet2DConditionModel.from_pretrained("my-pipeline", subfolder="unet")

So the problem is not the subfolder arg but the sharded checkpoint.
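A quick way to tell whether a saved component was sharded is to look for the index file next to the weights. A minimal sketch, assuming the index filename diffusion_pytorch_model.safetensors.index.json mentioned in this thread and the usual Hugging Face shard-index layout, where `weight_map` maps each parameter name to a shard filename; the helper `sharded_files` is hypothetical.

```python
import json
import os

# Assumed index filename for a sharded safetensors component checkpoint.
INDEX_NAME = "diffusion_pytorch_model.safetensors.index.json"


def sharded_files(component_dir: str) -> list:
    """Hypothetical check: shard filenames listed in the component's index, or [] if none."""
    index_path = os.path.join(component_dir, INDEX_NAME)
    if not os.path.isfile(index_path):
        return []
    with open(index_path, encoding="utf-8") as f:
        index = json.load(f)
    # Several parameters live in the same shard, so deduplicate before returning.
    return sorted(set(index["weight_map"].values()))


# Usage: print(sharded_files("my-pipeline/unet"))
```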

Also, while doing some more tests, I found that this also doesn't work if a diffusion_pytorch_model.safetensors.index.json file is left over from a previous save: the directory isn't cleaned, so the loader thinks the model is still sharded.

import torch

from diffusers import DiffusionPipeline, UNet2DConditionModel

pipeline = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16)
pipeline.save_pretrained("my-pipeline")

unet = UNet2DConditionModel.from_pretrained("my-pipeline", subfolder="unet", torch_dtype=torch.float16)

The second snippet works if you clean the unet directory first.
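A minimal sketch for spotting a leftover index before loading, assuming the filenames diffusion_pytorch_model.safetensors.index.json and diffusion_pytorch_model.safetensors and a `weight_map` in the index that maps parameter names to shard files. The helper `is_index_stale` is hypothetical; it treats the index as stale when a single-file checkpoint sits next to it or when any shard it references is missing.

```python
import json
import os

INDEX_NAME = "diffusion_pytorch_model.safetensors.index.json"  # assumed index filename
SINGLE_NAME = "diffusion_pytorch_model.safetensors"  # assumed single-file weights name


def is_index_stale(component_dir: str) -> bool:
    """Hypothetical check: True if an index file exists but does not match the weights on disk."""
    index_path = os.path.join(component_dir, INDEX_NAME)
    if not os.path.isfile(index_path):
        return False  # no index, nothing to be stale
    if os.path.isfile(os.path.join(component_dir, SINGLE_NAME)):
        return True  # a non-sharded save landed next to an old index
    with open(index_path, encoding="utf-8") as f:
        shards = set(json.load(f)["weight_map"].values())
    # Stale if any referenced shard file is gone.
    return any(not os.path.isfile(os.path.join(component_dir, name)) for name in shards)
```

If it returns True, removing the unet folder (for example with shutil.rmtree) and saving again matches the clean-directory fix described above.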

cc: @sayakpaul

sayakpaul commented 2 months ago

Yeah, that is the way to do it. Asking @Wauplin for his opinion here as well: should we attempt to clean the directory?

yiyixuxu commented 2 months ago

@sayakpaul we need to be able to load sharded checkpoints from a local path. I'm fixing it here: https://github.com/huggingface/diffusers/pull/8913
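The tracebacks earlier in this thread report the root directory (my-pipeline) rather than my-pipeline/unet, which suggests the shard lookup did not take the subfolder into account. As a rough illustration of the idea only, not the code from the PR, here is a sketch that resolves shard paths relative to root/subfolder. The function name is hypothetical and the index filename is the one mentioned in this thread.

```python
import json
import os
from typing import Optional

INDEX_NAME = "diffusion_pytorch_model.safetensors.index.json"  # assumed index filename


def resolve_local_shards(root: str, subfolder: Optional[str] = None) -> list:
    """Hypothetical sketch: absolute paths of the shards of a locally saved component."""
    # Look inside root/subfolder, not root itself.
    component_dir = os.path.join(root, subfolder) if subfolder else root
    with open(os.path.join(component_dir, INDEX_NAME), encoding="utf-8") as f:
        weight_map = json.load(f)["weight_map"]
    return [os.path.join(component_dir, name) for name in sorted(set(weight_map.values()))]
```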

Lanjiong-Li commented 1 month ago

It seems like I'm still facing the same problem. Here is the error:

Traceback (most recent call last):
  File "/data/data/llj/3D_diffusion/inference.py", line 446, in <module>
    main()
  File "/data/data/llj/3D_diffusion/inference.py", line 238, in main
    unet = UNet2DConditionModel.from_pretrained(
  File "/root/anaconda3/envs/idm/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "/root/anaconda3/envs/idm/lib/python3.10/site-packages/diffusers/models/modeling_utils.py", line 784, in from_pretrained
    accelerate.load_checkpoint_and_dispatch(
  File "/root/anaconda3/envs/idm/lib/python3.10/site-packages/accelerate/big_modeling.py", line 613, in load_checkpoint_and_dispatch
    load_checkpoint_in_model(
  File "/root/anaconda3/envs/idm/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 1779, in load_checkpoint_in_model
    raise ValueError(
ValueError: /laniiongli/ckpt/checkpoint-35360/ is not a folder containing a `.index.json` file or a pytorch_model.bin or a model.safetensors file

I'm using accelerate==0.34.0 and diffusers==0.29.2. I checked src/diffusers/utils/hub_utils.py, and I think the sharded-checkpoint problem has already been fixed in diffusers==0.29.2? So I'm not sure what the problem is here. Here is my code:

    unet = UNet2DConditionModel.from_pretrained(
        args.pretrained_model_name_or_path,
        subfolder="unet",
        torch_dtype=torch.float16,
    )

Here is my unet folder (/laniiongli/ckpt/checkpoint-35360/unet/): (screenshot of the folder contents not preserved)

asomoza commented 1 month ago

Hi, to use the PR that fixed this, you'll need v0.30.

You can search for #8913 here if you want to check it.
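Since the fix needs Diffusers v0.30 or later, a quick guard before loading can save some debugging. A minimal sketch, assuming plain numeric major.minor prefixes in the version string; `importlib.metadata.version` is standard library, while `supports_local_sharded_subfolder` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def supports_local_sharded_subfolder(diffusers_version: str) -> bool:
    """Hypothetical check: True if the version is at least 0.30, the release pointed to above."""
    # Only the numeric major.minor prefix is compared, so "0.30.0.dev0" counts as 0.30.
    major, minor = (int(part) for part in diffusers_version.split(".")[:2])
    return (major, minor) >= (0, 30)


if __name__ == "__main__":
    try:
        installed = version("diffusers")
        print(installed, supports_local_sharded_subfolder(installed))
    except PackageNotFoundError:
        print("diffusers is not installed")
```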