Arbitrary NAN for very low MS-SSIM comparisons

jorge-pessoa commented 6 years ago

Currently, the MS-SSIM calculation may return NaN when comparing two images with a very low MS-SSIM score, which breaks the training process unless it is accounted for.

This is easy to reproduce and can be worked around, but the root cause still needs to be identified so that it can be fixed properly.

PK15946 commented 6 years ago

Hi @jorge-pessoa, thanks a lot for your code! So for now, do we have to train with another loss first and then switch to MS-SSIM to avoid the bug?

serkansulun commented 6 years ago

The SSIM value can be smaller than zero, as discussed here.

Raising a negative value to a fractional power yields NaN.

I avoid this by normalizing the values to the range 0 to 1:

ret = (ret + 1) / 2
cs = (cs + 1) / 2

Let me know if it's a bad idea.
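To make the failure mode concrete, here is a minimal plain-Python sketch (not the repository's code). The helper names `real_pow` and `normalize` are hypothetical: `real_pow` imitates how real-valued floating-point tensors report a fractional power of a negative base as NaN, and the exponent 0.3 is an arbitrary illustrative value.

```python
import math


def real_pow(base: float, exponent: float) -> float:
    """Real-valued power: NaN for a negative base with a non-integer exponent.

    Hypothetical helper imitating real floating-point tensor semantics
    (plain Python would return a complex number here instead).
    """
    if base < 0 and not float(exponent).is_integer():
        return float("nan")
    return base ** exponent


def normalize(value: float) -> float:
    """Map a value from [-1, 1] to [0, 1], as in the two lines above."""
    return (value + 1) / 2


cs = -0.2      # a contrast-structure term that came out negative
weight = 0.3   # illustrative fractional exponent (assumption)

raw = real_pow(cs, weight)               # negative base: no real result
fixed = real_pow(normalize(cs), weight)  # base shifted into [0, 1] first

print(math.isnan(raw))  # True
print(fixed)
```

One thing to keep in mind with this workaround: the shift changes the numeric scale of the score (an unnormalized value of 0 becomes 0.5), so normalized and unnormalized results are not directly comparable.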

h4nwei commented 6 years ago

Hello @jorge-pessoa, thanks for your excellent code! I am interested in the file max_ssim.py. However, I ran into a problem: when I replace 'pytorch_msssim.SSIM()' with 'pytorch_msssim.MSSSIM()', it is hard to converge and cannot reach the maximum quality score. I also tried some other optimizers and learning rates, but they all failed. Do you have any idea about it? I am looking forward to your reply.

serkansulun commented 6 years ago

@KimiHanWei On the dev branch, it takes 78 iterations with SSIM and 135 iterations with MSSSIM to reach a value of 0.95. Also, in both trials, the values increase steadily.

h4nwei commented 6 years ago

@ssulun16 Thanks a lot. When I run with SSIM, it takes 80 iterations to reach 0.95, but with MSSSIM it gets stuck at a value of 0.25. I just commented out lines 120 to 121 in '__init__.py', since they raise the error 'zero-dimensional tensor (at position 0) cannot be concatenated'. Have you made any changes to the source code? I hope to hear from you.

serkansulun commented 6 years ago

Which version of PyTorch are you using? And are you using the master or dev branch of this code?

h4nwei commented 6 years ago

I finally realized what was wrong with my code: I was using the master branch with PyTorch 0.4. I have switched to the dev branch and now get the same result as yours. I am just a rookie in PyTorch and GitHub. Thanks for your patience :)


jorge-pessoa commented 6 years ago

@PK15946 If you are using PyTorch 0.4, you can also skip any batch that yields a NaN result by checking the result of the msssim function with torch.isnan().
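The control flow of that guard can be sketched as follows. This is a plain-Python stand-in, not the repository's training code: `math.isnan` plays the role of `torch.isnan`, and `run_epoch` and `apply_update` are hypothetical names for the loop and for whatever performs the backward pass and optimizer step.

```python
import math


def run_epoch(batch_losses, apply_update):
    """Apply an update for every non-NaN loss and count the skipped batches."""
    skipped = 0
    for loss in batch_losses:
        if math.isnan(loss):
            # Skip this batch entirely: no backward pass, no optimizer step.
            skipped += 1
            continue
        apply_update(loss)
    return skipped


applied = []
skipped = run_epoch([0.8, float("nan"), 0.6], applied.append)
print(skipped, applied)
```

Counting the skipped batches is worth doing, since a large count suggests the NaN is systematic rather than occasional.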

@ssulun16 Thank you for your observation; you are indeed correct, and I definitely missed that point. I will check whether the normalization you proposed changes the results in a meaningful way. If it does not, I will integrate your suggestion into the repository (you are welcome to submit a pull request if you would like to contribute!).

I also noted a few cases where the current code in the dev branch is calculating a slightly lower MS-SSIM than expected, which I will take a look at in the following days.

serkansulun commented 6 years ago

@jorge-pessoa I'm using this in my research, so I have made plenty of changes, but I have opened a pull request (I hope; I'm a newbie on GitHub).

Also, this is unrelated, but I couldn't find your email, so I wanted to try my luck here. I'm interested in pursuing a PhD in Lisbon, so if you send me an email (the address is in my profile), I would be very happy to ask you a few questions.

SalzingerJules commented 6 years ago

Hi @jorge-pessoa. Thanks for this code, it is very useful. Just a few questions: 1) Was the bug in this issue fixed? 2) Can we use the SSIM implemented in the dev branch with torch 0.4, or only MSSSIM? Thanks a lot! :)

jorge-pessoa commented 6 years ago

Hello @SalzingerJules, you can check the pull request submitted by @ssulun at https://github.com/jorge-pessoa/pytorch-msssim/pull/3, which solves this specific problem (note that the normalization is commented out). However, there are a few things I would like to change in the pull request before integrating it into the dev branch, and I haven't had time to do so yet. Hopefully in the next few days I can merge everything and switch the main branch to the 0.4 version.

Both SSIM and MS-SSIM should work properly with PyTorch 0.4 on the dev branch. Cheers

jorge-pessoa commented 6 years ago

The current version on the dev branch now accepts the parameter normalize=<True|False>. It can be set to true for the first few iterations of training or for the whole training run, which avoids NaN results. It should be useful for training unstable models with the MS-SSIM metric. The changes are still being tested, but they should fix the problem reported in this issue.

jorge-pessoa commented 6 years ago

This issue is now resolved on the master branch with commit 63788552ac97ad5505b94a3ae2cb2b2eaf71143d, which introduces the normalization flag.

zkk0911 commented 5 years ago

@jorge-pessoa, setting the normalize parameter really works. Thanks!

ALLinLLM commented 4 years ago

Thanks @jorge-pessoa for the great code, and thanks @PK15946 and @serkansulun for the different solutions. I would like to discuss the underlying cause of the NaN problem.

When reproducing the problem, I noticed that the NaN first appears in the backward pass, at a very early iteration. So something goes wrong in the autograd of MS-SSIM, and I checked the formula.

And there it is. The terms cs**a and ssim**a, where a is a given parameter, have derivatives of the form a * cs**(a-1) in the backward pass, which for a < 1 is proportional to 1/cs**(1-a). The value of cs or ssim may be very close to zero, in which case the backward value is +inf.

To avoid negative, zero, or very small positive values, using normalize is fine, and switching to msssim after enough iterations also works. Personally, I use cs.clamp(min=0.00001).
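The gradient blow-up described above can be checked numerically. The sketch below is plain Python rather than autograd code; `grad_pow` and `clamp_min` are hypothetical helpers, and the exponent 0.3 is an illustrative value, not one taken from the repository.

```python
def grad_pow(x: float, a: float) -> float:
    """Analytic derivative of x**a with respect to x, valid for x > 0."""
    return a * x ** (a - 1)


def clamp_min(x: float, minimum: float) -> float:
    """Scalar stand-in for tensor.clamp(min=minimum)."""
    return max(x, minimum)


a = 0.3  # illustrative exponent below 1 (assumption)

# The derivative grows without bound as cs approaches zero from above.
for cs in (1e-1, 1e-6, 1e-12):
    print(cs, grad_pow(cs, a))

# Clamping the base at 1e-5 caps the derivative at a finite value.
unbounded = grad_pow(1e-12, a)
bounded = grad_pow(clamp_min(1e-12, 1e-5), a)
print(unbounded, bounded)
```

The clamp also covers negative inputs, so it guards the forward pass against NaN as well as the backward pass against infinite gradients.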
