Closed: mrchizx closed this issue 3 years ago
Our evaluation and sampling pattern is different; it is explained in detail in our paper.
For 8x interpolation, we sample 25 consecutive frames from the video (F1 - F25). Inputs are (F1,F9,F17,F25) and ground truth intermediate frames are (F10-F16). In our GoPro dataset, this amounts to converting 30FPS videos to 240FPS.
Similarly, for 4x interpolation, we sample 13 consecutive frames from any video (F1 - F13). Inputs are (F1,F5,F9,F13) and ground truth intermediate frames are (F6-F8). In our GoPro dataset, this amounts to converting 60FPS videos to 240FPS. Clearly, (60->240) is easier than (30->240), hence the higher PSNR.
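The index pattern described above can be sketched in a few lines of Python. This is only an illustration, not code from the FLAVR repository: `sample_indices` is a hypothetical helper, frame numbers are 1-based to match the F1, F2, ... notation, and the 240FPS source rate is taken from the GoPro description above.

```python
def sample_indices(factor):
    """Return (input_frames, ground_truth_frames) as 1-based frame numbers.

    Hypothetical helper: assumes four input frames spaced `factor` frames
    apart, with ground truth taken between the two middle inputs.
    """
    window = 3 * factor + 1                      # 25 frames for 8x, 13 for 4x
    inputs = list(range(1, window + 1, factor))  # four evenly spaced inputs
    # Ground truth lies strictly between the second and third input frames.
    ground_truth = list(range(inputs[1] + 1, inputs[2]))
    return inputs, ground_truth


SOURCE_FPS = 240  # assumed capture rate of the source video

for factor in (8, 4):
    inputs, ground_truth = sample_indices(factor)
    input_fps = SOURCE_FPS // factor  # effective frame rate of the inputs
    print(f"{factor}x: inputs={inputs}, ground truth={ground_truth}, "
          f"{input_fps}FPS -> {SOURCE_FPS}FPS")
```

With this layout the 4x inputs are only 4 source frames apart while the 8x inputs are 8 apart, so the 4x task involves half the motion between neighbouring inputs.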
Oh, I thought 4x is interpolating from 30fps to 120fps.
In the case of 60fps->240fps, it's easier.
Thanks for answering.
Hi,
I have a question about the evaluation of the 8x and 4x cases in Table 2 and Table 3 regarding the Adobe dataset in the paper. It seems the 4x case has a much higher PSNR than the 8x case.
Let's say the 7 intermediate frames are denoted as t1, t2, t3, t4, t5, t6, t7. To my understanding the PSNR values are normally: (t1 close to t7) > (t2 close to t6) > (t3 close to t5) > t4. At least, this is what I have observed for DAIN, SuperSloMo and QVI. And it is expected that when the temporal distance to the input frame increases, the interpolated quality decreases (lower PSNR).
For 4x, you would only have t2, t4, t6, so the average PSNR values should be expected to be lower than 8x.
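The arithmetic behind this expectation can be checked with a small sketch. It assumes, as this question does, that the 4x targets (t2, t4, t6) sit between the same two input frames as the 8x targets (t1 to t7). `mean_nearest_distance` is a hypothetical helper introduced only for illustration, and distance to the nearest input frame is used as a rough proxy for difficulty, not as a PSNR model.

```python
from fractions import Fraction


def mean_nearest_distance(positions, interval):
    """Mean distance from each target position to its nearest input frame,
    expressed as a fraction of the input interval."""
    dists = [Fraction(min(p, interval - p), interval) for p in positions]
    return sum(dists) / len(dists)


eight_x = mean_nearest_distance(range(1, 8), 8)  # t1 .. t7
four_x = mean_nearest_distance([2, 4, 6], 8)     # t2, t4, t6

print(f"8x mean distance: {eight_x}")
print(f"4x mean distance: {four_x}")
```

Under that assumption the 4x targets are on average farther from an input frame than the 8x targets, which is why a lower average PSNR would be expected for 4x.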
However, for 4x in Table 3, FLAVR is 5.62 dB higher than for 8x in Table 2, and the other methods (DAIN, QVI and SuperSloMo) also show much higher PSNR. To my understanding, 5.62 dB is a huge increase.
The expected trend should be similar to Table 3 in the BMBC paper: https://arxiv.org/pdf/2007.12622.pdf where PSNR(2x) < PSNR(4x) < PSNR(8x).
I am wondering if there is anything I missed for the evaluation that causes my confusion?
Thanks