pq-yang / PGDiff

[NeurIPS 2023] PGDiff: Guiding Diffusion Models for Versatile Face Restoration via Partial Guidance

loss (smooth semantics): NAN #9

Status: Closed (aalallalalal closed this issue 5 months ago)

aalallalalal commented 5 months ago

Hi! Thanks for your work! I get NaN losses during inference.

Setup:
1. Command: python inference_pgdiff.py --task restoration --in_dir ... --out_dir ... --guidance_scale 0.05
2. I use the restorer model and diffusion model provided in your code, with the pretrained .pth files from Google Drive.
3. I have tried both CelebA-512x512 inputs and the images in ./testdata.
4. I made no changes to the code or to any other hyperparameters.

Bug:
1. At the first step the loss (smooth semantics) is 386.76678466796875; after that it is NaN.
2. The final output image is black.

PS: I also tried replacing the restorer with CodeFormer, but the result is the same as described above.

Do you have any idea about this? Thanks for your time. I'm looking forward to your reply.

aalallalalal commented 5 months ago

By the way, the log for --task colorization is:

    [t=999] [14:26:12.796621] loss (lightness): 1798684.25
    [t=998] [14:26:12.855295] loss (lightness): nan
    [t=997] [14:26:12.909878] loss (lightness): nan
    ....

So the NaN bug may not be caused by the restorer.
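The pattern in this log (a very large first loss, then nan from the next step on) can be spotted mechanically. The sketch below is an illustrative helper, not part of PGDiff: it assumes log lines look exactly like the ones quoted above, and the function name `first_nonfinite_step` is hypothetical. It returns the first timestep whose reported loss is NaN or infinite.

```python
import math
import re

# Matches lines like "[t=998] [14:26:12.855295] loss (lightness): nan".
# Assumption: the format is inferred only from the log lines quoted in this thread.
LOG_LINE = re.compile(r"\[t=(\d+)\].*?loss \(([^)]+)\): (\S+)")


def first_nonfinite_step(lines):
    """Return (t, loss_name, value) for the first NaN/inf loss, or None if all are finite."""
    for line in lines:
        m = LOG_LINE.search(line)
        if m is None:
            continue  # skip lines that are not loss reports
        t, name = int(m.group(1)), m.group(2)
        value = float(m.group(3).rstrip(";"))  # float() accepts "nan" and "inf"
        if not math.isfinite(value):
            return t, name, value
    return None


log = [
    "[t=999] [14:26:12.796621] loss (lightness): 1798684.25",
    "[t=998] [14:26:12.855295] loss (lightness): nan",
    "[t=997] [14:26:12.909878] loss (lightness): nan",
]
print(first_nonfinite_step(log))  # prints (998, 'lightness', nan)
```

Reporting the first bad step makes it easy to tell whether the NaN appears immediately (here at t=998, one step after the first loss) or only late in sampling.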

aalallalalal commented 5 months ago

Ohh, I fixed this bug: the .pth downloaded from BaiduDisk works. So the .pth on Google Drive may not be right.
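Two downloads of the same checkpoint can have the same size and still differ in content, so a byte-level checksum is a quick way to confirm whether they match. A minimal sketch using only the Python standard library; the helper names `sha256sum` and `same_file_contents` are hypothetical, and the paths are whatever you saved the files as.

```python
import hashlib
from pathlib import Path


def sha256sum(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so a large checkpoint need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_file_contents(path_a, path_b):
    """True only if both files have identical bytes (equal size alone is not enough)."""
    if Path(path_a).stat().st_size != Path(path_b).stat().st_size:
        return False  # different sizes cannot be identical
    return sha256sum(path_a) == sha256sum(path_b)
```

For example, calling `same_file_contents` on the Baidu and Google Drive copies of iddpm_ffhq512_ema500000.pth returns False if even a single byte differs.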

pq-yang commented 5 months ago

Hi @aalallalalal, thanks for your attention!

May I know which checkpoint you are referring to?

I have checked the one for the diffusion model and the one for the restorer in Google Drive, and they both work fine from my side.

aalallalalal commented 5 months ago

Thanks for your reply and your time. I tested it again. The .pth for the restorer is fine. Using iddpm_ffhq512_ema500000.pth downloaded from Google Drive gives NaN; the one from Baidu is fine. The two files (iddpm_ffhq512_ema500000.pth) are both 624377 KB. But I tested them with this code:

    s = dist_util.load_state_dict(args.model_path, map_location="cpu")  # from Baidu
    s1 = dist_util.load_state_dict("models/iddpm_ffhq512_ema500000.pth", map_location="cpu")  # from Google
    breakpoint()
    print(s.items() & s1.items())  # empty
    print(s.items() ^ s1.items())  # there are a lot of outputs

If this code is right, the two files are not the same. If you try it again and it works fine, it might be some unnoticed bug on my side. Thanks!
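One caveat on the check above: dict item views behave like sets, so `s.items() & s1.items()` hashes each (key, tensor) pair, and `torch.Tensor` hashes by object identity. Two separately loaded tensors therefore never match, and the intersection comes out empty even for identical files, so this test cannot by itself show that the checkpoints differ. A key-by-key comparison avoids the problem. The sketch below is hypothetical (the name `diff_state_dicts` is not from PGDiff); it takes the equality test as a parameter so the demo runs without PyTorch. With the real `s` and `s1`, passing `torch.equal` as `equal` would compare each tensor's shape and values.

```python
import operator


def diff_state_dicts(a, b, equal):
    """Compare two state dicts key by key.

    `equal(x, y)` decides whether two values match; for PyTorch tensors,
    torch.equal is the usual choice. Returns a tuple of sorted lists:
    (keys only in a, keys only in b, shared keys whose values differ).
    """
    only_a = sorted(a.keys() - b.keys())
    only_b = sorted(b.keys() - a.keys())
    mismatched = sorted(k for k in a.keys() & b.keys() if not equal(a[k], b[k]))
    return only_a, only_b, mismatched


# Demo with plain lists standing in for tensors (hypothetical parameter names).
demo_a = {"conv.weight": [0.1, 0.2], "conv.bias": [0.0]}
demo_b = {"conv.weight": [0.1, float("nan")], "conv.bias": [0.0]}
print(diff_state_dicts(demo_a, demo_b, operator.eq))  # prints ([], [], ['conv.weight'])
```

An empty result for all three lists means the two checkpoints hold the same keys and values; any entry in the third list points at a parameter that actually differs.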