liuliu / swift-diffusion

BSD 3-Clause "New" or "Revised" License

Why a separate UNetv2? #38

Closed by ghost 1 year ago

ghost commented 1 year ago

I thought the UNet was the same for SD2, and that the only difference was the text encoder, which produces 1024 channels.

ghost commented 1 year ago

It looks like you are changing the number of attention heads.
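
To make the head-count difference concrete, here is a minimal sketch of how the number of heads per UNet level could change between the two versions. The numbers are assumptions taken from the public Stable Diffusion configs rather than from this repository: a base width of 320, channel multipliers (1, 2, 4, 4), 8 fixed heads for SD1, and a fixed 64 channels per head for SD2. The function names are hypothetical.

```python
# Hypothetical sketch: per-level attention head layout in the UNet.
# Assumed values (not stated in this thread): base width 320, channel
# multipliers (1, 2, 4, 4), 8 fixed heads for SD1, 64 channels per head for SD2.

MODEL_CHANNELS = 320
CHANNEL_MULT = (1, 2, 4, 4)


def heads_fixed_count(num_heads=8):
    """SD1-style assumption: same head count at every level; head width varies."""
    return [(num_heads, MODEL_CHANNELS * m // num_heads) for m in CHANNEL_MULT]


def heads_fixed_width(head_channels=64):
    """SD2-style assumption: same head width at every level; head count varies."""
    return [(MODEL_CHANNELS * m // head_channels, head_channels) for m in CHANNEL_MULT]


sd1 = heads_fixed_count()
sd2 = heads_fixed_width()
for level, (a, b) in enumerate(zip(sd1, sd2)):
    print(f"level {level}: SD1-style {a[0]} heads x {a[1]} ch, "
          f"SD2-style {b[0]} heads x {b[1]} ch")
```

Under these assumptions the SD1-style layout keeps 8 heads everywhere, while the SD2-style layout uses 5, 10, 20, and 20 heads, which would explain why the attention blocks cannot be shared as-is.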

liuliu commented 1 year ago

https://www.reddit.com/r/StableDiffusion/comments/z42yph/some_notes_on_porting_sd2_over_to_iphone_or_other/

ghost commented 1 year ago

Thanks

ghost commented 1 year ago

I was inspecting sd_v2.1_f16.ckpt (from your website) and saw that the weights do not match the torch version at https://huggingface.co/stabilityai/stable-diffusion-2-1/blob/main/v2-1_768-ema-pruned.ckpt.

Which torch file did you convert it from?
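
For reference, a rough sketch of the kind of check I mean. It assumes both checkpoints have already been read into plain dicts mapping tensor name to a flat list of floats (the loading step is omitted, and the dict layout and function names are hypothetical). Because the converted file is f16, the reference values are rounded to half precision before comparing, so that only differences larger than rounding are reported.

```python
import struct


def to_f16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def mismatched_tensors(reference, converted, rel_tol=1e-3, abs_tol=1e-4):
    """Return names whose values differ by more than f16 rounding would explain.

    Both arguments are hypothetical dicts: tensor name -> flat list of floats.
    """
    bad = []
    for name, ref_values in reference.items():
        values = converted.get(name)
        # A missing tensor or a different element count is always a mismatch.
        if values is None or len(values) != len(ref_values):
            bad.append(name)
            continue
        for r, v in zip(ref_values, values):
            if abs(to_f16(r) - v) > abs_tol + rel_tol * abs(r):
                bad.append(name)
                break
    return bad


# Toy data standing in for real checkpoints.
reference = {"w": [0.1234567, -1.5, 3.0], "b": [0.25, 0.5]}
same_source = {k: [to_f16(x) for x in v] for k, v in reference.items()}
other_source = {"w": [0.9, -1.5, 3.0], "b": [0.25, 0.5]}

print(mismatched_tensors(reference, same_source))   # []
print(mismatched_tensors(reference, other_source))  # ['w']
```

If nearly every tensor shows up as mismatched, the two files most likely come from different source checkpoints rather than differing only by precision.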

ghost commented 1 year ago

I think you are using this one: https://huggingface.co/stabilityai/stable-diffusion-2-1-base/resolve/main/v2-1_512-ema-pruned.ckpt

liuliu commented 1 year ago

Yeah, the 768-v files are labelled sd_v2.1_768_v_f16.ckpt.