Closed jorge-pessoa closed 6 years ago
Hi @jorge-pessoa, thanks a lot for your code! So do we have to train with another loss first and then switch to MS-SSIM to avoid the bug for now?
The SSIM value can be smaller than zero, as discussed here.
Raising a negative value to a fractional power yields NaNs.
I avoid it by normalizing the values to be between 0 and 1:

```python
ret = (ret + 1) / 2
cs = (cs + 1) / 2
```
Let me know if it's a bad idea.
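To make the failure mode concrete, here is a minimal stdlib-Python sketch of the numerics (the weight value is just one of the standard MS-SSIM level weights, not taken from this repo, and `fractional_power` is a stand-in for the tensor power in the real code):

```python
import math

def fractional_power(x, a):
    """Mimic cs ** a: undefined (NaN) for negative x with fractional a."""
    try:
        return math.pow(x, a)
    except ValueError:          # math.pow raises instead of returning NaN
        return float("nan")

cs = -0.1          # SSIM/CS can dip below zero for very dissimilar patches
a = 0.1333         # a fractional MS-SSIM level weight

print(math.isnan(fractional_power(cs, a)))              # True: NaN propagates

cs_normalized = (cs + 1) / 2                            # map [-1, 1] -> [0, 1]
print(math.isnan(fractional_power(cs_normalized, a)))   # False: well-defined
```

In PyTorch the same thing happens silently: a negative tensor raised to a fractional power produces NaN, which then poisons the product over scales.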
Hello @jorge-pessoa, thanks for your excellent code! I am interested in the max_ssim.py file. However, I ran into a problem: when I replace 'pytorch_msssim.SSIM()' with 'pytorch_msssim.MSSSIM()', it is hard to converge and cannot reach the maximum quality score. I also tried several other optimizers and learning rates, but they all failed. Do you have any idea why? I am looking forward to your reply.
@KimiHanWei On the dev branch, SSIM takes 78 iterations and MSSSIM takes 135 to reach a value of 0.95 for me. In both trials the values increase nicely.
@ssulun16 Thanks a lot. When I run with SSIM it takes 80 iterations to reach 0.95, but with MSSSIM it gets stuck at a value of 0.25. I commented out lines 120 to 121 in '__init__.py', since they report the error 'zero-dimensional tensor (at position 0) cannot be concatenated'. Have you made any changes to the source code? Hoping for your answer.
Which version of pytorch are you using? And are you using the master or dev branch for this code?
I finally realized what was wrong with my code: I was using the master branch with PyTorch 0.4. I have changed to the dev branch and get the same result as you. I am just a rookie in PyTorch and GitHub. Thanks for your patience :)
@PK15946 If you are using PyTorch 0.4 you can also skip any batch that yields a NaN result, by checking whether the result from the msssim function is NaN with the torch.isnan() method.
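A minimal sketch of that skip-batch pattern, with the loss values hard-coded as a stand-in for `1 - msssim(output, target)` (in a real PyTorch loop you would test `torch.isnan(loss).any()` instead of `math.isnan`):

```python
import math

# Toy per-batch loss values; in practice each would be 1 - msssim(output, target)
losses = [0.42, float("nan"), 0.31, float("nan"), 0.18]

processed = 0
for loss in losses:
    if math.isnan(loss):        # with tensors: torch.isnan(loss).any().item()
        continue                # skip backward()/step() for this batch
    processed += 1              # loss.backward(); optimizer.step() would go here

print(processed)                # 3 of the 5 batches survive
```

Note this only sidesteps the symptom: batches of very dissimilar images simply contribute no gradient.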
@ssulun16 Thank you for your observation, you are indeed correct; I definitely missed that point. I will check whether the normalization you proposed changes the results in a meaningful way; if not, I will integrate your suggestion into the repository (you are welcome to submit a pull request if you would like to contribute!).
I also noticed a few cases where the current code in the dev branch calculates a slightly lower MS-SSIM than expected, which I will look into in the coming days.
@jorge-pessoa I'm using this in my research so I have made plenty of changes, but I made a pull request (I hope; I'm a newbie on GitHub).
Also, unrelated, but I couldn't find your email, so I wanted to try my chances here. I'm interested in pursuing a PhD in Lisbon, so if you send me an email (address in my profile), I would be very happy to ask you a few questions.
Hi @jorge-pessoa, thanks for this code, very useful. Just some questions: 1) was the bug in this issue corrected? 2) can we use the SSIM implemented in the dev branch with torch 0.4, or only MSSSIM? Thanks a lot! :)
Hello @SalzingerJules, you can check the pull request submitted by @ssulun at https://github.com/jorge-pessoa/pytorch-msssim/pull/3, which solves this specific problem (note that the normalization is commented out). However, there are a few things I would like to change in the pull request before integrating it into the dev branch, and I haven't had time yet. Hopefully in the next few days I can merge everything and switch the main branch to the 0.4 version.
Both SSIM and MS-SSIM should work properly in PyTorch 0.4 on the dev branch. Cheers
The current version on the dev branch can now receive the parameter normalize=<True|False>. This can be set to true for the first few iterations of training, or for the whole training, to avoid NaN results. It should be useful for training unstable models with the MS-SSIM metric. The changes are still being tested, but they should fix the problem reported in this issue.
This issue is now resolved on the master branch with the commit 63788552ac97ad5505b94a3ae2cb2b2eaf71143d , by introducing the normalization flag.
@jorge-pessoa Setting the normalize parameter really works, thanks!
Thanks @jorge-pessoa for the great code, and thanks @PK15946 @serkansulun for the different solutions. I would like to discuss the underlying reason for the NaN problem.
When reproducing the problem, I noticed the NaN first appears during backward, in a very early iteration, so something goes wrong in the autograd of MS-SSIM. So I checked the formula, and there it is: the terms cs**a and ssim**a, where a is a given weight, produce a factor of the form a * cs**(a-1), i.e. proportional to 1/cs**(1-a), in the backward pass. The value of cs or ssim may be very close to zero, and then the backward value is +inf.
To avoid negative or very small positive values, using normalize works, and switching to MS-SSIM after enough iterations also works. For me, I use cs.clamp(min=0.00001).
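A quick stdlib sketch of why the gradient blows up and why the clamp helps (`grad_pow` is just the analytic derivative of x**a; the weight and the 1e-5 epsilon are illustrative values, the epsilon matching the clamp above):

```python
import math

a = 0.1333                          # a fractional MS-SSIM level weight

def grad_pow(x, a):
    """d/dx of x**a is a * x**(a-1); it diverges as x -> 0+ when a < 1."""
    return a * math.pow(x, a - 1)

# For cs extremely close to zero, the backward value is astronomically large:
print(grad_pow(1e-300, a) > 1e200)          # True

# Clamping cs away from zero (torch: cs = cs.clamp(min=1e-5)) bounds it:
cs = max(1e-5, 0.0)
print(math.isfinite(grad_pow(cs, a)))       # True: gradient stays finite
```

The choice of epsilon is a trade-off: large enough to bound the gradient, small enough not to distort scores for ordinary image pairs.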
Currently the MS-SSIM calculation may return NaN when comparing two images with very low MS-SSIM scores, breaking the training process unless it is accounted for.
This is easy to reproduce and can be worked around, but the root cause should be identified so it can be fixed.