Closed Hsveh closed 4 years ago
We use the Laplacian loss as defined in our "Context-aware Synthesis for Video Frame Interpolation" paper. In it, we use five pyramid levels. However, your implementation uses six levels. Furthermore, you use L1Loss without any arguments, which defaults to mean reduction. However, our definition uses a sum reduction instead. Note that your mileage may vary and you could also try experimenting with different learning rates.

In "The Laplacian Pyramid as a Compact Image Code" by Burt and Adelson, Gaussian pyramids are constructed using a REDUCE operator, and Laplacian pyramids are constructed from Gaussian pyramids while also using an EXPAND operator. I see a REDUCE operator in your implementation through a Gaussian blur together with average 2D pooling, but I do not see an EXPAND operator.
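A minimal PyTorch sketch of a Laplacian loss built from the REDUCE and EXPAND operators described above (five levels, sum-reduced L1). The reflect padding and the per-channel 5-tap binomial kernel are implementation assumptions, not details confirmed in this thread:

```python
import torch
import torch.nn.functional as F

def gauss_kernel(channels):
    # 5-tap binomial kernel from Burt and Adelson, normalized to sum to one,
    # replicated per channel for a depthwise convolution
    k = torch.tensor([1., 4., 6., 4., 1.])
    k = torch.outer(k, k)
    k = k / k.sum()
    return k.repeat(channels, 1, 1, 1)

def reduce(x, kernel):
    # REDUCE: Gaussian blur followed by subsampling by a factor of two
    x = F.conv2d(F.pad(x, (2, 2, 2, 2), mode='reflect'), kernel, groups=x.shape[1])
    return x[:, :, ::2, ::2]

def expand(x, kernel):
    # EXPAND: upsample by inserting zeros, then blur; the kernel is scaled
    # by four to preserve average brightness
    up = torch.zeros(x.shape[0], x.shape[1], x.shape[2] * 2, x.shape[3] * 2,
                     device=x.device, dtype=x.dtype)
    up[:, :, ::2, ::2] = x
    return F.conv2d(F.pad(up, (2, 2, 2, 2), mode='reflect'), 4.0 * kernel,
                    groups=x.shape[1])

def laplacian_pyramid(x, levels=5):
    # assumes spatial dimensions divisible by 2 ** (levels - 1)
    kernel = gauss_kernel(x.shape[1]).to(x.device)
    pyramid = []
    for _ in range(levels - 1):
        down = reduce(x, kernel)
        pyramid.append(x - expand(down, kernel))  # band-pass residual
        x = down
    pyramid.append(x)  # coarsest Gaussian level
    return pyramid

def laplacian_loss(pred, target, levels=5):
    # sum-reduced L1 over all pyramid levels, per the CtxSyn definition
    return sum(F.l1_loss(a, b, reduction='sum')
               for a, b in zip(laplacian_pyramid(pred, levels),
                               laplacian_pyramid(target, levels)))
```

Note that the residual at each level is taken against the EXPAND of the next-coarser Gaussian level, so summing the expanded levels back up exactly reconstructs the input image.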
Almost, it is also that "Ours - CtxSyn-like" is using pre-trained weights from ResNet that are not modified during training. In comparison "Ours - 1 feature level" uses a random weight initialization for the feature extractor and it is subsequently trained end-to-end. In other words, "Ours - 1 feature level" tunes the feature extractor to extract features that are useful for the synthesis network. This end-to-end training was not possible with CtxSyn since it did not employ a differentiable forward warping operator.
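For illustration, the randomly initialized, end-to-end-trained extractor of "Ours - 1 feature level" could look like the sketch below. The kernel size, padding, and 32-channel width are assumptions, not values confirmed in this thread:

```python
import torch

# "1 feature level" extractor: two convolutions with PReLU activations,
# randomly initialized and trained jointly with the synthesis network
feature_extractor = torch.nn.Sequential(
    torch.nn.Conv2d(3, 32, kernel_size=3, padding=1),
    torch.nn.PReLU(),
    torch.nn.Conv2d(32, 32, kernel_size=3, padding=1),
    torch.nn.PReLU(),
)
```

In contrast, the "Ours - CtxSyn-like" variant would substitute a frozen, pre-trained ResNet layer here, with its parameters excluded from the optimizer.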
Thank you very much for your reply :)
Hi, sniklaus, congrats on your insightful work!
I have implemented your work and got a similar result on the Vimeo testset to Hsveh - 34.94dB. My configs:
- lr - 1e-4
- train 50 epochs
- Adam optimizer
- batch size = 8
- data aug: random crop, hflip, vflip and temporal reversal
- Laplacian loss with REDUCE and EXPAND operators as mentioned above. Layer weights of [1, 2, 4, 8, 16] for 5 layers, with higher weights for smaller spatial sizes.
- other modules like the grid net and pyramid feature extractor are implemented as in the softmax-splatting and CtxSyn papers.
Just serves as a reference for others who want to implement it.
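The augmentations in the config above could be sketched as follows, applied identically to a (frame0, frame1, gt) triplet. The crop size of 256 is an assumption, as is the 0.5 probability for each flip:

```python
import random
import torch

def augment(frame0, frame1, gt, crop=256):
    # frames are (C, H, W) tensors from a Vimeo-90k-style triplet
    # random crop, applied at the same location to all three frames
    _, h, w = frame0.shape
    top = random.randint(0, h - crop)
    left = random.randint(0, w - crop)
    frame0, frame1, gt = (f[:, top:top + crop, left:left + crop]
                          for f in (frame0, frame1, gt))
    # horizontal flip
    if random.random() < 0.5:
        frame0, frame1, gt = (torch.flip(f, dims=[2]) for f in (frame0, frame1, gt))
    # vertical flip
    if random.random() < 0.5:
        frame0, frame1, gt = (torch.flip(f, dims=[1]) for f in (frame0, frame1, gt))
    # temporal reversal: swap the two input frames; the middle ground-truth
    # frame stays the same
    if random.random() < 0.5:
        frame0, frame1 = frame1, frame0
    return frame0, frame1, gt
```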
Thank you for sharing this, Oliver!
Hi oliverxudd,
Could you please share your implementation, since I believe many people are struggling to reproduce it?
Hi Oliver, thank you for sharing this. Could you please let me know which pre-trained PWC-Net you used in your implementation? If you used the one provided by Simon, did you apply instance norm to its input? Or did you train a PWC-Net from scratch with instance norm in place, as mentioned here?
Thank you for your help.
1. I used the same method as #5 and got a similar result (PSNR: 34.95). Following your suggestion, I then added different weights to each layer in the pyramid. The loss function I now use is as follows:

But I got a worse result. Could you please tell me how to change my loss function?

2. In Table 1 of the paper (ablation experiments to quantitatively analyze the effect of the different components of our approach), is the only difference between "Ours - CtxSyn-like" and "Ours - 1 feature level" the feature extractor (ResNet-18 conv1 for "Ours - CtxSyn-like" versus Conv2d -> PReLU -> Conv2d -> PReLU for "Ours - 1 feature level")?
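For reference, the per-level weighting Oliver describes ([1, 2, 4, 8, 16], with higher weights on coarser levels) could be expressed as in the sketch below. The average-pooling/bilinear pyramid here is a simplification standing in for the Burt-Adelson REDUCE/EXPAND operators, so the exact loss values will differ from a faithful implementation:

```python
import torch
import torch.nn.functional as F

def simple_pyramid(x, levels=5):
    # minimal Laplacian-style pyramid: average pooling for downsampling and
    # bilinear interpolation for upsampling (a simplification of the
    # Burt-Adelson operators)
    pyr = []
    for _ in range(levels - 1):
        down = F.avg_pool2d(x, 2)
        up = F.interpolate(down, scale_factor=2, mode='bilinear',
                           align_corners=False)
        pyr.append(x - up)  # band-pass residual
        x = down
    pyr.append(x)  # coarsest level
    return pyr

def weighted_laplacian_loss(pred, target, weights=(1, 2, 4, 8, 16)):
    # weights run fine -> coarse, so the smallest spatial level gets weight 16
    return sum(w * F.l1_loss(a, b, reduction='sum')
               for w, a, b in zip(weights,
                                  simple_pyramid(pred, len(weights)),
                                  simple_pyramid(target, len(weights))))
```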