layerdiffusion / LayerDiffuse

Transparent Image Layer Diffusion using Latent Transparency
Apache License 2.0

latent visualization result #29

Open sculmh opened 2 months ago

sculmh commented 2 months ago

Thanks for your great work. I am curious about your image processing pipeline. I downloaded the apple PNG image from the README, normalized it to the range [0, 1], and encoded it using the provided sd15_vae_transparent_encoder. However, when I visualized the latent representation, the result differed significantly from the visualizations presented in your paper.
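For readers comparing latent visualizations: how a latent is mapped to pixels can change how it looks. The sketch below shows one common approach, per-channel min-max scaling of a (C, H, W) array. It is an assumption, not the method used for the paper's figures; `latent_to_uint8` is a hypothetical helper, and the latent here is random stand-in data rather than real encoder output.

```python
import numpy as np

def latent_to_uint8(latent: np.ndarray) -> np.ndarray:
    """Min-max scale each channel of a (C, H, W) latent to 0..255 for viewing."""
    latent = latent.astype(np.float32)
    lo = latent.min(axis=(1, 2), keepdims=True)
    hi = latent.max(axis=(1, 2), keepdims=True)
    # Use a divisor of 1 for flat channels to avoid division by zero.
    scale = np.where(hi > lo, hi - lo, 1.0)
    return np.round((latent - lo) / scale * 255.0).astype(np.uint8)

# Stand-in latent: random values, NOT output of the real encoder.
rng = np.random.default_rng(0)
fake_latent = rng.normal(size=(4, 64, 64))
viz = latent_to_uint8(fake_latent)
```

Scaling all channels with one global min and max instead of per channel would give a visibly different picture from the same latent, so it helps to fix the scaling method before comparing against published figures.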

sculmh commented 2 months ago

input png: apple

my latent_transparency viz: viz_latent_alpha

viz in your paper: [image]

Did you perform any additional preprocessing on the PNG images during training?

ariannaliu commented 1 month ago

Hi! I ran into the same issue. May I ask whether you have solved this problem? Thanks a lot!

sculmh commented 1 month ago

Sorry, I haven't. I just skipped the problem.

fkcptlst commented 1 month ago

Hi, are you getting correct reconstruction results from the decoder, i.e. when you decode the encoded latent?

I'm struggling to reproduce the reconstruction result, and I wonder if something is missing.

What is the correct input format for the transparent VAE? Is it alpha-RGB in the range [0, 1] or [-1, 1]?

sculmh commented 1 month ago

Using the provided pretrained alpha encoder and decoder, the decoding results are completely inconsistent with the input. I have tried different combinations of the alpha channel and RGB channels, with value ranges [-1, 1] and [0, 1].
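A systematic version of this trial-and-error might look like the sketch below. It enumerates range and channel-order variants of an RGBA image and scores each by round-trip error. Everything here is an assumption for illustration: `make_variants` and `roundtrip_error` are hypothetical helpers, and `encode` / `decode` stand for user-written wrappers around the pretrained models, whose real call signatures are not shown in this thread. Identity functions are used as stand-ins so the snippet runs on its own.

```python
import itertools
import numpy as np

def make_variants(rgba_u8):
    """Yield (name, array) pairs for each range / channel-order combination."""
    x01 = rgba_u8.astype(np.float32) / 255.0  # values in [0, 1]
    ranges = {"[0,1]": x01, "[-1,1]": x01 * 2.0 - 1.0}
    orders = {"RGBA": [0, 1, 2, 3], "ARGB": [3, 0, 1, 2]}
    for (rname, arr), (oname, idx) in itertools.product(ranges.items(), orders.items()):
        yield f"{oname} {rname}", arr[..., idx]

def roundtrip_error(x, encode, decode):
    """Mean absolute error between x and decode(encode(x))."""
    return float(np.mean(np.abs(decode(encode(x)) - x)))

# Hypothetical stand-ins: replace with wrappers around the real encoder/decoder.
identity = lambda t: t

# Random RGBA test image (H, W, 4), used only to exercise the helpers.
rgba = np.random.default_rng(0).integers(0, 256, size=(8, 8, 4), dtype=np.uint8)
errors = {name: roundtrip_error(v, identity, identity)
          for name, v in make_variants(rgba)}
```

With real model wrappers in place of `identity`, the variant with the lowest error would be the most plausible input format, though a uniformly high error would suggest the problem lies elsewhere in the pipeline.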

layerdiffusion commented 4 weeks ago

See also https://github.com/layerdiffusion/sd-forge-layerdiffuse/issues/90#issuecomment-2171173708