showlab / RingID


Some questions about black images #4

Open Ashlars opened 1 month ago

Ashlars commented 1 month ago

Hello,

I'm really excited about this interesting work, but I encountered some issues while trying to reproduce it and I hope to get your help.

I ported part of the code from this work, but I noticed that when generating 1024*1024 images, the diffusion model produces completely black images (i.e., all pixel values are 0).

I would like to ask if you have encountered such a problem during your experiments and if possible, I hope you can help me troubleshoot the issue.

I checked the no_watermark_latents, and the images generated from it appear normal. The values in Fourier_watermark_latents look normal as well, but the images generated from it are completely black.

Additionally, when generating 512*512 images, the entire generation and validation process works fine without any issues.

I look forward to hearing from you and would really appreciate your assistance. Thank you very much.

Ashlars commented 1 month ago

Sorry, I found inf values in Fourier_watermark_latents, which may be the cause of the black images. I think some of my parameters are not set correctly. Do I need to adjust any parameters to generate 1024*1024 images? I look forward to your reply.

Ashlars commented 1 month ago

Just add

init_latents[init_latents == float("Inf")] = 4
init_latents[init_latents == float("-Inf")] = -4

in the same way as Tree-Ring does.
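For anyone applying the same workaround, here is a minimal sketch of it as a reusable helper that also reports how many values were replaced. It assumes init_latents is a floating-point PyTorch tensor. The function name clamp_infinite_latents and the bound parameter are hypothetical and not part of RingID or Tree-Ring. The default of 4 is simply the value used in the two lines above.

```python
import torch


def clamp_infinite_latents(latents: torch.Tensor, bound: float = 4.0) -> torch.Tensor:
    """Return a copy of `latents` with +inf set to +bound and -inf set to -bound.

    Hypothetical helper: the name and the `bound` parameter are illustrative only.
    NaN values are left untouched, matching the two-line workaround.
    """
    out = latents.clone()
    out[out == float("inf")] = bound
    out[out == float("-inf")] = -bound
    return out


# Small demonstration on a toy tensor standing in for the watermarked latents.
init_latents = torch.tensor([[1.0, float("inf")], [float("-inf"), -2.0]])
num_inf = torch.isinf(init_latents).sum().item()  # how many entries are +/-inf
fixed = clamp_infinite_latents(init_latents)
print(f"replaced {num_inf} infinite values")
print(fixed)
```

Note that this only masks the symptom. If the inf values come from a size mismatch in the watermark setup, the parameters still need to be adjusted.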

Embracing commented 1 month ago

> Sorry, I found inf value in the Fourier_watermark_latents, which may be the cause of the situation. I think I have some parameters that are not set correctly and would like to ask if I need to adjust those parameters to generate 1024*1024 pictures. Look forward to your reply.

Yes. Some parameters must be adjusted to generate a 1024 x 1024 image (assuming its latent size is 128). Currently, the size of the initial latent is hard-coded to 64. You can modify utils.py, changing every 64 to 128 and every 65 to 129, to generate a watermarked 128 x 128 latent.

We are working on making the code adapt to arbitrary latent sizes.
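In the meantime, the two constants could be derived from the image size rather than edited by hand. The sketch below only mirrors the mapping described above (64 to 128, 65 to 129). It assumes a downsampling factor of 8 between image and latent, which matches 512 to 64 and 1024 to 128. The names latent_dims and VAE_SCALE_FACTOR are hypothetical and do not come from utils.py, and what the second value (65 or 129) is used for there is not covered in this thread.

```python
from typing import Tuple

# Assumed image-to-latent downsampling factor: 512 -> 64 and 1024 -> 128.
VAE_SCALE_FACTOR = 8


def latent_dims(image_size: int, scale_factor: int = VAE_SCALE_FACTOR) -> Tuple[int, int]:
    """Return (latent_size, latent_size + 1) for a square image.

    Hypothetical helper: it reproduces the 64/65 and 128/129 pairs mentioned
    in this thread and is not taken from RingID's utils.py.
    """
    if image_size <= 0 or image_size % scale_factor != 0:
        raise ValueError(
            f"image_size must be a positive multiple of {scale_factor}, got {image_size}"
        )
    latent_size = image_size // scale_factor
    return latent_size, latent_size + 1


print(latent_dims(512))   # (64, 65)
print(latent_dims(1024))  # (128, 129)
```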