huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0
26.47k stars, 5.45k forks

why 2.55 GB ckpt becomes 3.44 GB unet #1501

Closed camenduru closed 1 year ago

camenduru commented 2 years ago

Is it possible to reduce the size?

patrickvonplaten commented 2 years ago

Could you please give a bit more context @camenduru ? :-)

camenduru commented 2 years ago

Hi @patrickvonplaten, OK, an example: mdjrny-v4.ckpt is 2.13 GB (https://huggingface.co/prompthero/openjourney/tree/main), while diffusion_pytorch_model.bin is 3.44 GB (https://huggingface.co/prompthero/openjourney/tree/main/unet). What are the benefits of splitting the ckpt into unet and vae? Are people downloading extra empty data?

WASasquatch commented 2 years ago

Well, simply, a ckpt file is an archive, and probably employs compression. If you ever fail to download a .pt or .ckpt file, and it is corrupted, you'll notice the error you get is that the ZIP module can't decompress it. ;)

Diffusers uses raw bin files, etc., and thus will be larger. Honestly I don't like the system. It's slower on an IO basis, loading in all the files, and also harder to manage. For example, Google Drive will break a diffusers model by packaging only some of it up as a ZIP, then renaming the unet BIN and downloading it separately, due to its size I am assuming.
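The compression guess above can be checked directly, since checkpoints written by recent PyTorch versions are zip archives. What follows is a minimal sketch using only the standard library. `zip_compression_summary` is a hypothetical helper, and the small in-memory archive is a stand-in for a real .ckpt file. If the stored and uncompressed totals match, compression is not what explains a size difference.

```python
import io
import zipfile


def zip_compression_summary(file_obj):
    """Return (stored_bytes, uncompressed_bytes, methods) for a zip archive."""
    with zipfile.ZipFile(file_obj) as archive:
        infos = archive.infolist()
        stored = sum(info.compress_size for info in infos)
        raw = sum(info.file_size for info in infos)
        methods = {info.compress_type for info in infos}
    return stored, raw, methods


# Build a small in-memory archive with uncompressed (ZIP_STORED) entries
# so the example runs without downloading a real checkpoint.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_STORED) as archive:
    archive.writestr("archive/data/0", bytes(1024))

buffer.seek(0)
stored, raw, methods = zip_compression_summary(buffer)
print(stored, raw, methods == {zipfile.ZIP_STORED})
```

For a real file, pass its path instead of `buffer`. This only works for checkpoints saved in the zip-based format.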

camenduru commented 2 years ago

thanks @WASasquatch ❤ @patrickvonplaten why are we uncompressing and then sharing the data 🤪

patrickvonplaten commented 2 years ago

@camenduru note that the difference here is simply because mdjrny-v4.ckpt is in fp16 format and the diffusers weights are in fp32 format:

If you sum up all the bytes of diffusers (text encoder, unet, vae), you arrive at: 492 + 335 + 3440 = 4267 MB, which is almost exactly twice the size of mdjrny-v4.ckpt (2.13 GB).
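The same arithmetic as a quick check. The figures are the rounded sizes quoted in this thread, in MB, so the ratio is approximate.

```python
# Sizes in MB as quoted in this thread (rounded values from the model repo).
component_mb = {"text_encoder": 492, "vae": 335, "unet": 3440}
ckpt_mb = 2130  # mdjrny-v4.ckpt, 2.13 GB

total_mb = sum(component_mb.values())
ratio = total_mb / ckpt_mb

print(total_mb)         # 4267
print(round(ratio, 2))  # 2.0
```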

The author there decided to upload the diffusers weights in fp32, but one could easily convert them to fp16 on disk.

The reason why we have a folder structure in diffusers is because it makes it easier to train certain parts of the pipeline e.g. https://github.com/huggingface/diffusers/tree/main/examples/dreambooth and to switch out certain parts of the pipeline, e.g. using: https://huggingface.co/stabilityai/sd-vae-ft-mse

camenduru commented 2 years ago

thanks @patrickvonplaten ❤ If we convert fp16 to fp32 we are just adding zeros, right? That means people are downloading extra empty data? No benefit, we are just doubling the size, right?

patrickvonplaten commented 2 years ago

It depends on how the model was trained: if it was trained in fp32, then we're removing precision when converting to fp16. It's really up to the author here!
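Both directions can be illustrated on a single number with the standard library's `struct` module, which supports half precision through the `e` format. This is only a sketch of the numeric behaviour, not of how any conversion script works. The helper names and the sample value are made up for illustration.

```python
import struct


def to_fp16_and_back(value: float) -> float:
    """Round a Python float to half precision and return it as a float."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def to_fp32_and_back(value: float) -> float:
    """Round a Python float to single precision and return it as a float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


weight = to_fp32_and_back(0.1234567)  # a value as stored in an fp32 checkpoint
half = to_fp16_and_back(weight)       # fp32 -> fp16: rounds, information is lost
upcast = to_fp32_and_back(half)       # fp16 -> fp32: exact, no new information

print(half != weight)   # True: the fp16 value differs from the fp32 one
print(upcast == half)   # True: upcasting changes nothing but the storage size
print(struct.calcsize("<e"), struct.calcsize("<f"))  # 2 4
```

So a checkpoint that was fp16 to begin with gains nothing from being stored as fp32, while a checkpoint trained in fp32 does lose low-order bits when halved. Whether that loss is visible in generated images is a separate question.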

Feel free to open an issue directly on the model repo: https://huggingface.co/prompthero/openjourney/

WASasquatch commented 2 years ago

> removing precision when converting to fp16

Has this actually been verified for 8-bit image output? I've compared hundreds of seeds in full float and half float and found no difference at all: the difference image is solid black, a single tone. Which means there is no difference besides the storage and resources needed to obtain the image.

Could it be that full float vs. half float only actually pays off in a totally different use case, outside of art image synthesis?

camenduru commented 2 years ago

thanks @patrickvonplaten ❤ It's not just one repo, lots of repos are the same. I think people have downloaded zettabytes (dramatic effect) of empty data from 🤗. Can we add a safety check to convert_original_stable_diffusion_to_diffusers.py, like: if the model is fp16, only convert to fp16 diffusers?

camenduru commented 2 years ago

hehe [image: Screenshot 2022-12-04 050513]

WASasquatch commented 2 years ago

> hehe [image: Screenshot 2022-12-04 050513]

If the dtype is available from the model itself through named_parameters(), shouldn't this already be used to set the correct dtype for conversion, fine-tuning, or whatever? I'm surprised that isn't being utilized. Unless it isn't always true to the model?

patrickvonplaten commented 1 year ago

> thanks @patrickvonplaten ❤ not just one repo lots of repos same I think people downloaded zettabytes (dramatic effect) of empty data from 🤗 can we add safety check like if model fp16 only convert to fp16 diffusers in convert_original_stable_diffusion_to_diffusers.py

That's a very good idea to add a check there! I agree that we should try to avoid uploading weights that are just bloated but identical to fp16. Would you maybe like to open such a PR? :-)
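One possible shape for such a check, as a sketch only. `suggest_export_dtype` is a hypothetical helper, not something that exists in convert_original_stable_diffusion_to_diffusers.py. It assumes tensors expose a `dtype` whose string form looks like `torch.float16`, which is the case for PyTorch tensors. Stand-in objects are used so the example runs without PyTorch.

```python
from types import SimpleNamespace


def suggest_export_dtype(named_tensors):
    """Suggest "float16" only if every floating-point tensor is already fp16.

    `named_tensors` is any iterable of (name, tensor) pairs whose tensors
    expose a `dtype`, e.g. `model.named_parameters()` or `state_dict.items()`.
    """
    float_dtypes = {
        str(tensor.dtype)
        for _, tensor in named_tensors
        if "float" in str(tensor.dtype)
    }
    # Mixed or fp32 weights: keep full precision rather than silently downcast.
    if float_dtypes == {"torch.float16"}:
        return "float16"
    return "float32"


# Stand-in objects so the sketch runs without PyTorch installed.
fp16_ckpt = [("unet.w", SimpleNamespace(dtype="torch.float16")),
             ("step", SimpleNamespace(dtype="torch.int64"))]
fp32_ckpt = [("unet.w", SimpleNamespace(dtype="torch.float32"))]

print(suggest_export_dtype(fp16_ckpt))  # float16
print(suggest_export_dtype(fp32_ckpt))  # float32
```

The rule is deliberately conservative: anything other than a purely fp16 checkpoint stays in fp32, so the check can only shrink files that carry no extra information.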

camenduru commented 1 year ago

@patrickvonplaten when I make this stable enough I will 😄

WASasquatch commented 1 year ago

> @patrickvonplaten when I make this stable enough I will 😄

Oh hey, that's cool, Automatics on HF!?

May I suggest maybe removing the PNGInfo stuff? It's not really what PNG tags are for, and I feel it's dangerous. I've already had to warn people that they were selling something I could just remake, because their preview image brings up their wildcard prompt, so I can make as many variations of it as I want without buying it, lol. And a lot of image editors respect metadata and pass it along.

camenduru commented 1 year ago

@WASasquatch yes auto111 🎉 I just changed the anime model to sd v2.1 and the community got angry 😡 If I remove PNGInfo the community will be like peepo-riot-peepo

WASasquatch commented 1 year ago

> @WASasquatch yes auto111 🎉 I just changed anime model to sd v2.1 community got angry 😡 If I remove PNGInfo community like peepo-riot-peepo

I'd say protecting less knowledgeable people is pretty important. I'd at least make it an opt-in sort of thing. Having that sort of automatic functionality on HF, with its privacy policies and stuff, doesn't seem to go hand-in-hand.

If anything, maybe explain clearly that a user would need to strip this information for commercial use.

camenduru commented 1 year ago

Software engineers cannot decide what is right or wrong for users. Congress creates and passes bills. The president may then sign those bills into law. Federal courts may review the laws to see if they agree with the constitution. 😄 I think freedom is the most important thing. For example, if we look at the nsfw filter, I think: who are we to protect users? We are not lawmakers, and software companies are not lawmakers, otherwise we all become dictators 😋

I opened this issue because people are downloading GBs of empty data. I think "this is the way", if you want to call it protecting users.

spoiler hehe https://user-images.githubusercontent.com/54370274/206836154-f6f3d513-be33-457f-a0a1-b3403900dcc3.mp4

camenduru commented 1 year ago

@WASasquatch I also agree with adding a PNGInfo on/off option

camenduru commented 1 year ago

@WASasquatch I found that there is already an option 🤣 If you want to disable PNGInfo you can edit config.json like this: "enable_pnginfo": false,
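The same edit can be scripted. This is a minimal sketch, assuming config.json is a plain JSON object with an "enable_pnginfo" key as quoted above. `set_pnginfo` is a hypothetical helper and the second key in the demo file is a made-up placeholder.

```python
import json
import os
import tempfile


def set_pnginfo(config_path: str, enabled: bool) -> None:
    """Set the "enable_pnginfo" key in a JSON config file, keeping other keys."""
    with open(config_path, "r", encoding="utf-8") as handle:
        config = json.load(handle)
    config["enable_pnginfo"] = enabled
    with open(config_path, "w", encoding="utf-8") as handle:
        json.dump(config, handle, indent=4)


# Demonstrate on a throwaway file; the other key is a made-up placeholder.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "config.json")
    with open(path, "w", encoding="utf-8") as handle:
        json.dump({"enable_pnginfo": True, "some_other_setting": 1}, handle)
    set_pnginfo(path, False)
    with open(path, encoding="utf-8") as handle:
        result = json.load(handle)

print(result)
```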