Open · emreaniloguz opened this issue 1 year ago
Hi @emreaniloguz, thanks for reporting the issue. Can you provide the URL of the finetuned model's repo on the Hub, please? I would like to investigate it myself. If you can't make the model public for privacy reasons, would it be possible to create an org, add the model to it (as private), and add my account to the org so that I can have access to it? Also, for completeness, can you paste the full code you use to instantiate the model? Thank you in advance.
Hi @Wauplin, I didn't exactly understand what you mean by "Can you provide the url of the repo of the finetuned model on the Hub please". If I understand correctly, you want me to share my final fine-tuned model, but there isn't one because of the error. You can access the pre-trained model's Hub URL from here. Please correct me if I'm missing something.
Oh ok, I misunderstood the original issue then. So basically you try to download weights from https://huggingface.co/CompVis/stable-diffusion-v1-4 and you get this error? Just to be sure, could you:

1. Delete the cached repo: run `huggingface-cli delete-cache` and select `"Model CompVis/stable-diffusion-v1-4"`. For a better CLI UI, it's best to install `huggingface_hub[cli]` first.
2. Upgrade deps: `pip install huggingface_hub==0.16.4`. We released a fix last week in the HTTP session we use. I doubt it will fix your issue but it's worth trying.
3. Retry the download.

I'm sorry in advance if you have a limited connection, but this should cross out some possible reasons for your bug and I'd like to try it before investigating further.
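For reference, steps 1 and 2 can also be done non-interactively from Python; here is a minimal sketch using `huggingface_hub`'s public cache utilities (assuming a reasonably recent `huggingface_hub`; the repo id is the one from this thread):

```python
import huggingface_hub
from huggingface_hub import scan_cache_dir

# Step 1, non-interactive: delete every cached revision of the repo.
cache_info = scan_cache_dir()
revisions = [
    rev.commit_hash
    for repo in cache_info.repos
    if repo.repo_id == "CompVis/stable-diffusion-v1-4"
    for rev in repo.revisions
]
if revisions:
    strategy = cache_info.delete_revisions(*revisions)
    print(f"Will free {strategy.expected_freed_size_str}")
    strategy.execute()

# Step 2 sanity check: confirm the upgrade took effect in this environment.
print(huggingface_hub.__version__)  # expect "0.16.4"
```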
Wow actually the issue is very intriguing :exploding_head: It seems that for some reason the `safety_checker/model.safetensors` and the `text_encoder/model.safetensors` files have been mixed.

Here are the actual sizes of the files on S3:

```
➜ ~ curl --head https://huggingface.co/CompVis/stable-diffusion-v1-4/resolve/main/safety_checker/model.safetensors | grep size
x-linked-size: 1215981830
➜ ~ curl --head https://huggingface.co/CompVis/stable-diffusion-v1-4/resolve/main/text_encoder/model.safetensors | grep size
x-linked-size: 492265879
```

Given the error message you got (`OSError: Consistency check failed: file should be of size 1215981833 but has size 492265879 (model.safetensors).`), this cannot be a coincidence.
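(The same `x-linked-size` header can be queried from Python without downloading anything; a small sketch using `huggingface_hub`'s `hf_hub_url` and `get_hf_file_metadata` helpers:)

```python
from huggingface_hub import get_hf_file_metadata, hf_hub_url

# Ask the server for each file's metadata; `size` is read from the same
# x-linked-size header as the curl calls above.
for subfolder in ("safety_checker", "text_encoder"):
    url = hf_hub_url(
        "CompVis/stable-diffusion-v1-4",
        filename="model.safetensors",
        subfolder=subfolder,
    )
    meta = get_hf_file_metadata(url)
    print(f"{subfolder}: {meta.size} bytes")
```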
I've done everything that you mentioned and started fine-tuning, but the result is the same OSError.
This is interesting :)
Ok thanks for confirming. That's so weird :grimacing: I'll try to reproduce myself and let you know.
Just to be sure, what happens if you delete your cache and run the following?

```python
from diffusers import StableDiffusionImg2ImgPipeline

model = StableDiffusionImg2ImgPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
```
Here is my output:

```
[2023-07-10 14:01:43,910] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Downloading (…)ain/model_index.json: 100%|██████████| 541/541 [00:00<00:00, 38.6kB/s]
Downloading (…)69ce/vae/config.json: 100%|██████████| 551/551 [00:00<00:00, 127kB/s]
Downloading model.safetensors: 100%|██████████| 1.22G/1.22G [00:13<00:00, 88.1MB/s]
Fetching 16 files: 100%|██████████| 16/16 [00:14<00:00, 1.09it/s]
`text_config_dict` is provided which will be used to initialize `CLIPTextConfig`. The value `text_config["id2label"]` will be overridden.
```

I think it's the correct `model.safetensors`, right?
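(One way to double-check: compare the on-disk sizes of the cached files against the `x-linked-size` values from the curl output earlier in this thread. A sketch; `hf_hub_download` only downloads if the file isn't already cached:)

```python
import os

from huggingface_hub import hf_hub_download

# Expected byte counts taken from the HEAD requests above.
expected = {"safety_checker": 1215981830, "text_encoder": 492265879}
for subfolder, size in expected.items():
    path = hf_hub_download(
        "CompVis/stable-diffusion-v1-4",
        filename="model.safetensors",
        subfolder=subfolder,
    )
    print(subfolder, os.path.getsize(path) == size)
```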
Hmmm, so no errors at all when using the one from `diffusers`... But I wouldn't say it is because of the DreamPose implementation either, since the failing part is really an internal consistency check within `huggingface_hub` :thinking:

(Though now that you successfully cached the repo locally, you should be able to continue with your training. It's not fixing the actual issue but at least it unblocks you, right?)
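For context, the check that fails here is conceptually simple. Below is an illustrative sketch (not the actual `huggingface_hub` source) of what such a post-download consistency check does: compare the size the server advertised against what actually landed on disk.

```python
import os

def check_download_consistency(tmp_path: str, expected_size: int) -> None:
    """Illustrative sketch of a post-download consistency check."""
    actual_size = os.path.getsize(tmp_path)
    if actual_size != expected_size:
        # Mirrors the wording of the error reported in this issue.
        raise OSError(
            f"Consistency check failed: file should be of size "
            f"{expected_size} but has size {actual_size} ({tmp_path})."
        )
```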
I'll share the result in 5 min.
The error is the same, but I think it's related to the `force_download` parameter that I've hardcoded into the `huggingface_hub` library. The code tries to download the `text_encoder` `model.safetensors` file. I'll change the library back to its default version and give it a try, and I'll write here whether it works.
I first ran this script while the safetensors were okay. Then I reverted `huggingface_hub` to its default version, with the `force_download` parameter unchanged. Alas, the error remains.
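(As a side note, hardcoding inside the library shouldn't be necessary just to force a re-download of one file: `hf_hub_download` accepts `force_download` at the call site. A sketch, targeting the file from this thread:)

```python
from huggingface_hub import hf_hub_download

# Re-fetch the text_encoder weights even if a (possibly corrupted) copy
# is already in the local cache.
path = hf_hub_download(
    "CompVis/stable-diffusion-v1-4",
    filename="model.safetensors",
    subfolder="text_encoder",
    force_download=True,
)
print(path)
```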
@emreaniloguz Just to be sure, the error now is `'text_config_dict' is provided which will be used to initialize 'CLIPTextConfig'. The value 'text_config["id2label"]' will be overridden.`, right? So it's not related to the initial consistency check failure? If that's the case, it'd be best to open an issue on the diffusers or DreamPose repository to get some more help.
Btw, our conversation made me realize that `force_download` was not correctly taken into account in diffusers, hence the hardcoded value that you needed to set. I've made a PR (https://github.com/huggingface/diffusers/pull/4036), so it should be fixed in the next release, or if you install from the git source.
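Once that PR lands, forcing a clean re-download should also work from the pipeline level, without touching library internals (a sketch; the actual behavior depends on the diffusers release you have installed):

```python
from diffusers import StableDiffusionImg2ImgPipeline

# With the fix, force_download passed here is forwarded to huggingface_hub,
# so every file of the pipeline is re-fetched instead of read from cache.
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    force_download=True,
)
```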
To update the issue: I've deleted the `revision` argument from everywhere and could get past the problem, but the results were not what I expected. Someone else running into this could also try it elsewhere.
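(For anyone trying the same workaround, the change amounts to dropping the pinned revision from the `from_pretrained` calls. A hypothetical sketch; the actual calls in the DreamPose code may look different:)

```python
from diffusers import StableDiffusionImg2ImgPipeline

# Before (hypothetical): a pinned revision was passed through everywhere.
# pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
#     "CompVis/stable-diffusion-v1-4", revision=args.revision
# )

# After: let from_pretrained resolve the repo's default branch instead.
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4"
)
```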
Describe the bug
I'm trying to run the DreamPose repository. When I finished fine-tuning the UNet, the code tried to save the fine-tuned network with this code snippet, but it failed due to: `OSError: Consistency check failed: file should be of size 1215981833 but has size 492265879 (model.safetensors).` (You can find the full output in the Logs section.)
Reproduction
No response
Logs
System info