med-air / EndoNeRF

Neural Rendering for Stereo 3D Reconstruction of Deformable Tissues in Robotic Surgery
https://med-air.github.io/EndoNeRF/

PSNR metric for masked images #11

Closed Ruyi-Zha closed 1 year ago

Ruyi-Zha commented 1 year ago

Hi,

Thanks for your great work. I have a question about evaluation metrics for masked images.

As we know, PSNR is derived from MSE, the mean of the squared differences: $\mathrm{MSE} = \frac{1}{N}\sum (gt - pred)^2$.

In eval_rgb.py, lines 198-199, images are masked with imgs = torch.Tensor(imgs).to(device) * masks before PSNR is computed. This sets the pixels outside the mask to 0. While those pixels do not contribute to the sum of squared differences $\sum (gt - pred)^2$, they still count toward $N$, which lowers the MSE and hence inflates the PSNR.

I notice that the provided masks filter out around 30% of the image area. Such masking therefore yields a roughly 30% lower MSE, and hence a PSNR inflated by about 1.5-2 dB.

I suppose a more reasonable way of masking is to drop the masked pixels directly with imgs = imgs[masks]. That way, pixels outside the mask contribute neither to $\sum (gt - pred)^2$ nor to $N$.
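The gap between the two schemes is easy to reproduce. A minimal sketch (NumPy only, with a synthetic image pair and a hypothetical mask that keeps about 70% of the pixels) contrasting zero-padded masking against boolean indexing:

```python
import numpy as np

rng = np.random.default_rng(0)
gt = rng.random((64, 64))  # hypothetical ground-truth image in [0, 1]
pred = np.clip(gt + 0.05 * rng.standard_normal((64, 64)), 0.0, 1.0)

mask = np.zeros((64, 64), dtype=bool)
mask[:, :45] = True  # keep ~70% of the pixels, mask out ~30%

def psnr(mse):
    # PSNR for images normalized to [0, 1] (peak value 1)
    return -10.0 * np.log10(mse)

# Scheme 1: zero out masked pixels but keep them in N
mse_zeroed = np.mean((gt * mask - pred * mask) ** 2)

# Scheme 2: drop masked pixels from both the sum and N
mse_dropped = np.mean((gt[mask] - pred[mask]) ** 2)

# Same sum of squared errors, different N, so the gap is exactly
# 10 * log10(N_total / N_valid) ≈ 1.53 dB for this mask.
print(psnr(mse_zeroed) - psnr(mse_dropped))
```

Since the numerator is identical in both schemes, the inflation depends only on the mask size, not on the images themselves.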

Please correct me if I am wrong. Thanks.

yuehaowang commented 1 year ago

Thanks for raising this good question. I fully agree with you that only evaluating unmasked pixels is more reasonable.

In my implementation, I also evaluated those black pixels in order to stay consistent with how the other metrics, such as SSIM, are computed. Since SSIM is calculated over image windows, it is inconvenient to leave out the masked areas. Thus, I simply use the masked images for evaluation in all metrics.

I think both your approach and mine can evaluate rendering quality, as long as all comparisons are evaluated in the same way. Nevertheless, I may prefer your approach in future work, as it provides a more accurate PSNR.

Any further discussion is welcome.

Ruyi-Zha commented 1 year ago

Thanks for your prompt reply.

Looks like magick supports SSIM with masked images. Can't find a masked version of LPIPS, though... Hope this helps.

smoreira00 commented 1 year ago


So you're saying that by doing this:

masks = torch.Tensor(1.0 - masks).to(device).unsqueeze(-1)
gts = torch.Tensor(gts).to(device) * masks
imgs = torch.Tensor(imgs).to(device) * masks

mse = img2mse(imgs, gts)
psnr = mse2psnr(mse)
ssim_ = ssim(imgs, gts, format='NHWC')
lpips_ = lpips(imgs, gts, format='NHWC')

you're evaluating the black pixels?

Judging by how good my results are, my PSNR doesn't seem right, and I'm just using eval_rgb.py...

yuehaowang commented 1 year ago


Yes, in my experiments the black pixels are also evaluated. However, I believe this evaluation scheme could be improved: if there are too many black pixels, the PSNR may fail to measure the true performance.
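To put a number on that last point: with zero-padded masking, the measured MSE is $(1 - f)$ times the MSE over the valid pixels, where $f$ is the masked fraction, so the PSNR is inflated by $-10\log_{10}(1 - f)$ dB regardless of image content. A quick sketch tabulating this for a few mask sizes:

```python
import numpy as np

# PSNR inflation from keeping zeroed pixels in N, as a function of
# the masked fraction f: MSE_zeroed = (1 - f) * MSE_valid, so the
# reported PSNR gains -10 * log10(1 - f) dB "for free".
for f in (0.1, 0.3, 0.5, 0.7):
    print(f"masked {f:.0%}: +{-10 * np.log10(1 - f):.2f} dB")
# e.g. masked 30% -> +1.55 dB; masked 50% -> +3.01 dB
```

This matches the roughly 1.5-2 dB inflation estimated above for masks covering about 30% of the image.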