The different results in "Deep Burst SR" v1 and v2

tonyzzzt commented 3 years ago

Hi, I wonder why the results in version2 of the paper "Deep Burst SR" on arXiv are much higher than those in version1. Were there any changes to the method, the metrics, or the training details? The visual results seem to be the same. Thanks!

goutamgmb commented 3 years ago

Hi,

Sorry for the confusion. We used the same model trained for version1 to obtain the scores in version2. As mentioned here, in version1 we computed the metrics after applying white-balancing and scaling of intensities (in the case of the BurstSR dataset). To be consistent with the evaluation methodology employed in the burst SR challenge, we have updated the metrics in Deep Burst SR version2 with the scores computed in the sensor space, before white-balancing. This results in higher scores compared to version1.

tonyzzzt commented 3 years ago

But it is confusing why you computed the PSNR after applying white-balancing to the model outputs in version1. Normally, as in the code you provided, the trained model's output is not white-balanced, and it is straightforward to compare this output with the GT directly. Also, is the GT of the val data in Track1 before or after white-balancing? And why are the results before white-balancing so much higher than those after white-balancing?

Askiry commented 3 years ago

I am also very confused about this issue. Why is the difference between the results before and after white-balancing so large? In Track 1, should we submit the results before applying white-balancing, or after it? Could you provide the exact code used to compute the PSNR?

goutamgmb commented 3 years ago

@tonyzzzt 1) The GT of the val/test data in Track1 is before applying white-balancing. 2) In version1, we computed the results after white-balancing, as that can give a more perceptually relevant metric. However, for the burstSR challenge, we decided that it is best to compute the results directly in raw sensor space, before white-balancing. To be consistent with the challenge protocol, we have updated the results in Deep Burst SR on arXiv. 3) Note that when applying white-balancing, the red and blue channels are multiplied by a scalar (approximately between 1.6 and 2.4). As a result, the L2 error between the prediction and the ground truth is scaled up, leading to a smaller PSNR. Hence the difference in PSNR values between version1 and version2. However, note that this increase in PSNR is just due to the scaling of the input data, and DOES NOT correspond to any actual improvement in image quality.
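To illustrate point 3, here is a minimal NumPy sketch of the scaling effect. It is not the challenge's evaluation code: the function names `psnr` and `apply_white_balance`, the gains (2.0 for red, 1.8 for blue, picked from the 1.6 to 2.4 range mentioned above), the peak value of 1.0, and the synthetic images are all assumptions made for illustration. Clipping after white-balancing is also left out, so the error scales exactly with the gains.

```python
import numpy as np


def psnr(pred, gt, max_value=1.0):
    """PSNR in dB, assuming intensities with a peak value of max_value."""
    mse = np.mean((pred - gt) ** 2)
    return 10.0 * np.log10(max_value ** 2 / mse)


def apply_white_balance(img, red_gain, blue_gain):
    """Multiply the R and B channels of a (3, H, W) image by scalar gains.

    Hypothetical helper: the green channel is left unchanged (gain 1.0).
    """
    gains = np.array([red_gain, 1.0, blue_gain]).reshape(3, 1, 1)
    return img * gains


# Synthetic "sensor space" ground truth and a prediction with small errors.
rng = np.random.default_rng(0)
gt = rng.uniform(0.0, 0.4, size=(3, 64, 64))
pred = gt + rng.normal(0.0, 0.01, size=gt.shape)

# Illustrative gains only, chosen inside the 1.6 to 2.4 range.
red_gain, blue_gain = 2.0, 1.8

# Metric computed directly in sensor space (before white-balancing).
psnr_sensor = psnr(pred, gt)

# Metric computed after applying the same gains to prediction and GT.
psnr_wb = psnr(
    apply_white_balance(pred, red_gain, blue_gain),
    apply_white_balance(gt, red_gain, blue_gain),
)

print(f"PSNR before white-balancing: {psnr_sensor:.2f} dB")
print(f"PSNR after white-balancing:  {psnr_wb:.2f} dB")
```

The prediction is identical in both cases; only the per-channel scaling differs. Because the squared error in the red and blue channels is multiplied by the square of the gain, the PSNR after white-balancing comes out lower, by a few dB for gains in this range.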

@Askiry Please check the above reply to tonyzzzt for an explanation of why the PSNR is different. In Track 1, you should submit results BEFORE applying white-balancing. The PSNR computation used in the challenge Track1, as well as in version2 of Deep Burst SR, is shown here.

Hope this answers the questions.

Regards, Goutam

goutamgmb / NTIRE21_BURSTSR
