sniklaus / softmax-splatting

an implementation of softmax splatting for differentiable forward warping using PyTorch

Some questions about the details #14

Closed Hsveh closed 4 years ago

Hsveh commented 4 years ago

1. I used the same method as #5 and got a similar result (PSNR: 34.95). Therefore, according to your suggestion, I added different weights to each layer of the pyramid. Now the loss function I use is as follows:

import torch
import torch.nn as nn
import torch.nn.functional as F

class LaplacianPyramid(nn.Module):
    def __init__(self, max_level=5):
        super(LaplacianPyramid, self).__init__()
        self.gaussian_conv = GaussianConv()  # Gaussian blur module (defined elsewhere)
        self.max_level = max_level

    def forward(self, X):
        t_pyr = []
        current = X
        for level in range(self.max_level):
            # difference between the current level and its Gaussian-blurred version
            t_gauss = self.gaussian_conv(current)
            t_diff = current - t_gauss
            t_pyr.append(t_diff)
            # downsample the blurred image for the next level
            current = F.avg_pool2d(t_gauss, 2)
        t_pyr.append(current)  # low-pass residual as the final level

        return t_pyr


class LaplacianLoss(nn.Module):
    def __init__(self):
        super(LaplacianLoss, self).__init__()

        self.criterion = nn.L1Loss()
        self.lap = LaplacianPyramid()

    def forward(self, x, y):
        # weighted sum of L1 differences between the corresponding pyramid levels
        x_lap, y_lap = self.lap(x), self.lap(y)
        weights = [1, 2, 4, 8, 16, 32]
        return sum(weights[i] * self.criterion(a, b) for i, (a, b) in enumerate(zip(x_lap, y_lap)))

But I got a worse result. Could you please tell me how to change my loss function?

2. In Table 1 of the paper (ablation experiments to quantitatively analyze the effect of the different components of our approach), is the difference between "Ours - CtxSyn-like" and "Ours - 1 feature level" only the difference between the feature extractors (the feature extractor of "Ours - CtxSyn-like" is ResNet-18-conv1, while the one of "Ours - 1 feature level" is conv2d -> PReLU -> conv2d -> PReLU)?
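For context on question 2, here is a minimal sketch of what such a small per-level feature extractor (conv2d -> PReLU -> conv2d -> PReLU) could look like; the kernel sizes and channel counts are illustrative assumptions, not the values from the paper.

import torch
import torch.nn as nn

class LevelFeatures(nn.Module):
    # hypothetical per-level feature extractor: conv2d -> PReLU -> conv2d -> PReLU;
    # kernel sizes and channel counts are assumptions made for illustration only
    def __init__(self, in_channels=3, out_channels=32):
        super(LevelFeatures, self).__init__()
        self.layers = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.PReLU(out_channels),
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
            nn.PReLU(out_channels),
        )

    def forward(self, x):
        return self.layers(x)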

sniklaus commented 4 years ago
  1. We use the Laplacian loss as defined in our "Context-aware Synthesis for Video Frame Interpolation" paper, which uses five pyramid levels; your implementation uses six. Furthermore, you use L1Loss without any arguments, which defaults to a mean reduction, whereas our definition uses a sum reduction instead. Note that your mileage may vary and you could also try experimenting with different learning rates. In "The Laplacian Pyramid as a Compact Image Code" by Burt and Adelson, Gaussian pyramids are constructed using a REDUCE operator, and Laplacian pyramids are constructed from Gaussian pyramids while also using an EXPAND operator. I see a REDUCE operator in your implementation through a Gaussian blur together with 2D average pooling, but I do not see an EXPAND operator (see the sketch after this list).

  2. Almost, it is also that "Ours - CtxSyn-like" is using pre-trained weights from ResNet that are not modified during training. In comparison "Ours - 1 feature level" uses a random weight initialization for the feature extractor and it is subsequently trained end-to-end. In other words, "Ours - 1 feature level" tunes the feature extractor to extract features that are useful for the synthesis network. This end-to-end training was not possible with CtxSyn since it did not employ a differentiable forward warping operator.
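Regarding point 1, here is a minimal sketch of a Laplacian pyramid built with explicit REDUCE and EXPAND operators and a sum-reduced L1 loss over five levels. This is not the code from the paper; the 5x5 binomial kernel, the reflection padding, and the unweighted sum over levels are assumptions made for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F

def gauss_kernel(channels):
    # 5-tap binomial kernel from Burt and Adelson, normalized and replicated per channel
    k = torch.tensor([1.0, 4.0, 6.0, 4.0, 1.0])
    k = torch.outer(k, k)
    k = (k / k.sum()).view(1, 1, 5, 5)
    return k.repeat(channels, 1, 1, 1)

def reduce(x, kernel):
    # REDUCE: Gaussian blur followed by downsampling by a factor of two
    x = F.pad(x, (2, 2, 2, 2), mode='reflect')
    x = F.conv2d(x, kernel, groups=x.shape[1])
    return x[:, :, ::2, ::2]

def expand(x, kernel):
    # EXPAND: upsample by a factor of two via zero insertion, then Gaussian blur;
    # the factor of four compensates for the inserted zeros
    out = x.new_zeros(x.shape[0], x.shape[1], x.shape[2] * 2, x.shape[3] * 2)
    out[:, :, ::2, ::2] = x * 4.0
    out = F.pad(out, (2, 2, 2, 2), mode='reflect')
    return F.conv2d(out, kernel, groups=out.shape[1])

def laplacian_pyramid(x, levels=5):
    # each level is the difference between the current image and the EXPANDed
    # version of its REDUCEd copy; the low-pass residual is appended last
    # (assumes the spatial dimensions are divisible by 2**levels)
    kernel = gauss_kernel(x.shape[1]).to(x.device)
    pyramid = []
    current = x
    for _ in range(levels):
        down = reduce(current, kernel)
        pyramid.append(current - expand(down, kernel))
        current = down
    pyramid.append(current)
    return pyramid

class LaplacianLoss(nn.Module):
    # sum-reduced L1 over the pyramid levels; per-level weights are omitted here
    def __init__(self, levels=5):
        super(LaplacianLoss, self).__init__()
        self.levels = levels

    def forward(self, x, y):
        x_pyr = laplacian_pyramid(x, self.levels)
        y_pyr = laplacian_pyramid(y, self.levels)
        return sum(F.l1_loss(a, b, reduction='sum') for a, b in zip(x_pyr, y_pyr))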

Hsveh commented 4 years ago

Thank you very much for your reply :)

oliverxudd commented 4 years ago

Hi, sniklaus, congrats on your insightful work!

I have implemented your work and got a similar result to Hsveh on the Vimeo test set - 34.94dB. My configs:

* lr - 1e-4
* train 50 epochs
* Adam optimizer
* batch size = 8
* data aug: random crop, hflip, vflip and temporal reversal
* Laplacian loss with the REDUCE and EXPAND operators as mentioned above; layer weights of [1, 2, 4, 8, 16] for the 5 levels, with smaller spatial sizes getting higher weights
* other modules like the GridNet and the pyramid feature extractor are implemented as in the softmax splatting and CtxSyn papers

Just serving as a reference for others who want to implement it.
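To make the configuration above concrete, a hypothetical training loop could look roughly like the following. SoftSplatModel, VimeoTriplets, and LaplacianLoss are placeholder names rather than classes from this repository, and the crop size is an assumption.

import random
import torch
from torch.utils.data import DataLoader

# placeholders, not part of this repository: SoftSplatModel (the interpolation network),
# VimeoTriplets (a dataset yielding frame triplets with random crops), and LaplacianLoss
model = SoftSplatModel().cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loader = DataLoader(VimeoTriplets(crop_size=256), batch_size=8, shuffle=True)
criterion = LaplacianLoss().cuda()

def augment(frame0, frame1, frame2):
    # random horizontal flip, vertical flip, and temporal reversal
    if random.random() < 0.5:
        frame0, frame1, frame2 = frame0.flip(-1), frame1.flip(-1), frame2.flip(-1)
    if random.random() < 0.5:
        frame0, frame1, frame2 = frame0.flip(-2), frame1.flip(-2), frame2.flip(-2)
    if random.random() < 0.5:
        frame0, frame2 = frame2, frame0
    return frame0, frame1, frame2

for epoch in range(50):
    for frame0, frame1, frame2 in loader:
        frame0, frame1, frame2 = augment(frame0.cuda(), frame1.cuda(), frame2.cuda())
        prediction = model(frame0, frame2)  # interpolate the middle frame
        loss = criterion(prediction, frame1)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()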

sniklaus commented 4 years ago

Thank you for sharing this, Oliver!

XiaoyuShi97 commented 3 years ago

> Hi, sniklaus, congrats on your insightful work!
>
> I have implemented your work and got a similar result to Hsveh on the Vimeo test set - 34.94dB. My configs:
>
> * lr - 1e-4
> * train 50 epochs
> * Adam optimizer
> * batch size = 8
> * data aug: random crop, hflip, vflip and temporal reversal
> * Laplacian loss with the REDUCE and EXPAND operators as mentioned above; layer weights of [1, 2, 4, 8, 16] for the 5 levels, with smaller spatial sizes getting higher weights
> * other modules like the GridNet and the pyramid feature extractor are implemented as in the softmax splatting and CtxSyn papers
>
> Just serving as a reference for others who want to implement it.

Hi oliverxudd,

Could you please share your implementation, since I believe many people are struggling to reproduce it.

danier97 commented 3 years ago

> Hi, sniklaus, congrats on your insightful work!
>
> I have implemented your work and got a similar result to Hsveh on the Vimeo test set - 34.94dB. My configs:
>
> * lr - 1e-4
> * train 50 epochs
> * Adam optimizer
> * batch size = 8
> * data aug: random crop, hflip, vflip and temporal reversal
> * Laplacian loss with the REDUCE and EXPAND operators as mentioned above; layer weights of [1, 2, 4, 8, 16] for the 5 levels, with smaller spatial sizes getting higher weights
> * other modules like the GridNet and the pyramid feature extractor are implemented as in the softmax splatting and CtxSyn papers
>
> Just serving as a reference for others who want to implement it.

Hi Oliver, thank you for sharing this. Could you please let me know which pre-trained PWC-Net you used in your implementation? If you used the one provided by Simon, did you apply instance normalization to the input of the PWC-Net? Or did you train a PWC-Net from scratch with instance normalization in place, as mentioned here?

Thank you for your help.