goutamgmb / NTIRE21_BURSTSR

Questions about Deep BurstSR paper and NTIRE'21 challenge #23

Closed Magauiya closed 3 years ago

Magauiya commented 3 years ago

Dear organizers and authors,

I would appreciate it if you could clarify the following questions:

  1. About the paper:
  2. About the challenge:

Thank you!

goutamgmb commented 3 years ago

1a. I believe the PSNR gap is due to two factors. First, the images in BurstSR generally have smaller intensity values than those in the synthetic dataset, which leads to higher PSNR values. Secondly, due to spatial and color mis-alignment issues, we align the network prediction to the ground truth when computing PSNR on the BurstSR dataset. This may also increase the PSNR value.
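(To illustrate the first factor: PSNR is measured against a fixed peak, so for the same relative error, an image with smaller intensities has a smaller MSE and hence a higher PSNR. A minimal numpy sketch with synthetic data — the images and the 5% error model are assumptions for illustration, not the actual BurstSR evaluation:)

```python
import numpy as np

def psnr(pred, gt, peak=1.0):
    """PSNR against a fixed peak value (1.0 for normalized images)."""
    mse = np.mean((pred - gt) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(0)
gt_bright = rng.uniform(0.0, 1.0, size=(64, 64))  # full-range image
gt_dark = 0.25 * gt_bright                        # same content, 4x smaller intensities

# Apply the same 5% relative error to both images.
pred_bright = 1.05 * gt_bright
pred_dark = 1.05 * gt_dark

print(psnr(pred_bright, gt_bright))  # lower PSNR
print(psnr(pred_dark, gt_dark))      # higher by 20*log10(4), about 12 dB
```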

b. We have seen that directly applying a network trained only on synthetic data to images from BurstSR leads to poor results, which indicates that there is indeed a significant distribution shift.

c. The model was trained to operate on large bursts (burst size 8). Thus, when using very small bursts (burst size 2), its performance can be poor, since the benefits of merging information from multiple frames can be outweighed by the harm caused by e.g. small mis-alignments. When we trained a network using smaller bursts (burst size 2), we observed that it obtains similar or slightly better results than the single-image baseline when evaluated on bursts with 2 images.

d. Yes, for the user study we converted images to sRGB.

e. No, the network was trained using 56x56x4 patches (the last dimension contains the four RGGB channels).
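(For readers unfamiliar with this packing: a Bayer RAW mosaic of size H x W is typically rearranged into an (H/2) x (W/2) x 4 array, one channel per RGGB site. A minimal sketch, assuming an RGGB pattern with R at the top-left — the shapes below are illustrative, not taken from the training code:)

```python
import numpy as np

def pack_rggb(raw):
    """Pack a H x W Bayer mosaic (RGGB pattern assumed, R at top-left)
    into a (H/2) x (W/2) x 4 array with channels [R, G1, G2, B]."""
    assert raw.shape[0] % 2 == 0 and raw.shape[1] % 2 == 0
    return np.stack([raw[0::2, 0::2],   # R
                     raw[0::2, 1::2],   # G1
                     raw[1::2, 0::2],   # G2
                     raw[1::2, 1::2]],  # B
                    axis=-1)

raw_patch = np.zeros((112, 112))   # a hypothetical 112x112 RAW crop ...
packed = pack_rggb(raw_patch)      # ... becomes a 56x56x4 patch
print(packed.shape)                # (56, 56, 4)
```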

2a. We will post the challenge report on arXiv next week.

b. Yes, we used the same test set.

Regards Goutam

Magauiya commented 3 years ago

Thank you for the detailed answers!

I am a bit curious about the network complexity (# of parameters, MACs), because your proposed model with a single-image input performs an order of magnitude better than our model on 14 frames for Track 2.

goutamgmb commented 3 years ago

You're welcome!

That's strange. Including the parameters of our alignment network (PWCNet), our model has 13,011,237 parameters; excluding the alignment network, it has 3,636,963. The detailed network architecture, including the number of residual blocks, channel dimensions, etc., is provided in our supplementary material.
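(Counting parameters is just a tally over layer weights. A minimal sketch with purely hypothetical conv shapes — these are not the layers of the actual model, whose architecture is given in the supplementary — showing how to exclude an alignment subnetwork from the count:)

```python
def conv2d_params(c_in, c_out, k, bias=True):
    """Parameter count of a single 2-D convolution layer."""
    return c_out * c_in * k * k + (c_out if bias else 0)

# Hypothetical layer list: (name, c_in, c_out, kernel). Illustration only.
layers = [
    ("align.conv1", 4, 32, 3),
    ("align.conv2", 32, 32, 3),
    ("sr.conv1", 4, 64, 3),
    ("sr.resblock1", 64, 64, 3),
]

total = sum(conv2d_params(ci, co, k) for _, ci, co, k in layers)
without_align = sum(conv2d_params(ci, co, k)
                    for name, ci, co, k in layers
                    if not name.startswith("align."))
print(total, without_align)
```

In PyTorch the same tally is `sum(p.numel() for p in model.parameters())`, and the alignment subnetwork can be filtered out by name via `model.named_parameters()`.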

How is the performance on the synthetic set? I would suggest first ensuring that your model works well on the synthetic dataset.

Regards Goutam

Magauiya commented 3 years ago

The PSNR on the synthetic validation set is ~43 dB, and on the test set it is ~46-47 dB. We used 14 frames. The model is huge.

Magauiya commented 3 years ago

Do you have plans to release your implementation?

Sincerely, Magauiya