Feature request: Add ensembling

ltkong218 / IFRNet

IFRNet: Intermediate Feature Refine Network for Efficient Frame Interpolation (CVPR 2022)

MIT License


Feature request: Add ensembling #4

Open styler00dollar opened 2 years ago

styler00dollar commented 2 years ago

Ensembling is used in frame interpolation to drastically improve visual quality by computing several predictions and averaging them. Here is a paper talking about ensembling.

RIFE uses it as well, with 2 predictions, which can be seen here. It should not be very hard to add. I would trade some speed for more quality. Thanks.

ltkong218 commented 2 years ago

Based on what you describe, I think you can add the following ensemble inference function to IFRNet.py, IFRNet_L.py and IFRNet_S.py:

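def inference_ensemble(self, img0, img1, embt, scale_factor=1.0):
    # Forward pass: interpolate from img0 to img1 at time embt.
    imgt_pred_1 = self.inference(img0, img1, embt, scale_factor)
    # Reverse pass: swap the inputs and mirror the time step.
    imgt_pred_2 = self.inference(img1, img0, 1 - embt, scale_factor)
    # Average the two predictions of the same intermediate frame.
    imgt_pred = (imgt_pred_1 + imgt_pred_2) / 2.0
    return imgt_pred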
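
As a quick sanity check of what this function buys you, here is a minimal sketch using a toy stand-in for the model. `ToyInterpolator` and its blend formula are hypothetical and are not the real IFRNet network; plain floats stand in for image tensors. It only shows that the averaged result no longer depends on the order in which the two frames are passed.

```python
class ToyInterpolator:
    """Hypothetical stand-in for an IFRNet model; NOT the real network."""

    def inference(self, img0, img1, embt, scale_factor=1.0):
        # Deliberately asymmetric blend: the extra 0.1 * img0 term mimics a
        # model whose output depends on the order of its inputs.
        return (1 - embt) * img0 + embt * img1 + 0.1 * img0

    def inference_ensemble(self, img0, img1, embt, scale_factor=1.0):
        # Same logic as the function suggested above.
        imgt_pred_1 = self.inference(img0, img1, embt, scale_factor)
        imgt_pred_2 = self.inference(img1, img0, 1 - embt, scale_factor)
        return (imgt_pred_1 + imgt_pred_2) / 2.0


model = ToyInterpolator()
a, b, t = 2.0, 4.0, 0.25  # scalar "frames" and time step, for illustration

# A single pass gives different answers depending on input order.
single_fwd = model.inference(a, b, t)
single_rev = model.inference(b, a, 1 - t)

# The ensemble gives the same answer either way.
ens_fwd = model.inference_ensemble(a, b, t)
ens_rev = model.inference_ensemble(b, a, 1 - t)
```

The cost is two forward passes per interpolated frame, which is the speed-for-quality trade mentioned in the request.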

styler00dollar commented 2 years ago

I was thinking of applying ensembling to the flow rather than to the end result. RIFE does this after every block and merges the flow. Still, that is also an interesting approach I could test.

For reference, this is what I had in mind:

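for i in range(4):
    # First block: there is no flow estimate yet.
    if flow is None:
        flow, mask = block[i](
            torch.cat((img0[:, :3], img1[:, :3], timestep), 1),
            None,
            scale=scale_list[i],
        )
        if ensemble:
            # Second pass with the inputs swapped and the timestep mirrored.
            f1, m1 = block[i](
                torch.cat((img1[:, :3], img0[:, :3], 1 - timestep), 1),
                None,
                scale=scale_list[i],
            )
            # Swap the two flow halves (channels 2:4 and 0:2) back before averaging.
            flow = (flow + torch.cat((f1[:, 2:4], f1[:, :2]), 1)) / 2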

I would need to figure out how to implement something like that for IFRNet. Thanks anyway.

ltkong218 commented 2 years ago

I think your suggestion is better than what I proposed above. Directly ensembling the final results causes blurry textures, whereas ensembling the intermediate optical flow does not have this problem. I will add a new function ensemble_inference based on your suggestion later.
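
To illustrate the difference, here is a toy 1D sketch, assuming integer shifts in place of real optical-flow warping. The `shift_right` helper and the numbers are made up for illustration; an actual network uses sub-pixel flows and backward warping. When two motion estimates disagree, averaging the warped outputs smears the edge, while averaging the flows first and warping once keeps it sharp.

```python
def shift_right(signal, shift):
    """Shift a 1D signal right by `shift` samples, repeating the first value."""
    return [signal[max(i - shift, 0)] for i in range(len(signal))]


edge = [0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]  # a sharp step edge

# Two hypothetical motion estimates that disagree by two samples.
flow_a, flow_b = 1, 3

# Output-level ensembling: warp with each flow, then average the results.
pred_a = shift_right(edge, flow_a)
pred_b = shift_right(edge, flow_b)
output_ensemble = [(x + y) / 2 for x, y in zip(pred_a, pred_b)]

# Flow-level ensembling: average the flows, then warp once.
mean_flow = (flow_a + flow_b) // 2
flow_ensemble = shift_right(edge, mean_flow)

print(output_ensemble)  # [0.0, 0.0, 0.0, 0.5, 0.5, 1.0, 1.0, 1.0]
print(flow_ensemble)    # [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
```

The 0.5 values in the first result are the blurred edge; the second result still jumps from 0 to 1 in a single step.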

styler00dollar commented 2 years ago

Thank you. I will wait for it. :)