hzwer / Practical-RIFE

We are developing a more practical frame interpolation approach.
MIT License

Practical RIFE vs ECCV2022 RIFE teacher differences #64

Closed by niqodea 5 months ago

niqodea commented 5 months ago

Hello RIFE authors! Thank you for sharing your training code, very helpful.

I am trying to better understand the training code of Practical RIFE (v4.15), and I am struggling to understand how the teacher has changed compared to the original paper.

From the paper we read:

[...] we design a privileged distillation scheme that employs a teacher model with access to the intermediate frames to guide the student to learn.

In the original repository, this is straightforward to observe in the code (link).
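
As a rough illustration of what "privileged" means here, the sketch below builds the inputs for a student and a teacher from toy arrays. The only difference is that the teacher's input also contains the ground-truth intermediate frame gt. It then computes a simple distillation term that pulls the student's flow toward the teacher's. The array shapes, the helper name `distill_loss`, and the rule of distilling only where the teacher reconstructs gt better are assumptions for illustration, not the repository's actual code. NumPy stands in for PyTorch to keep the example self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)
n, h, w = 2, 8, 8  # toy batch size and resolution (assumed)

img0 = rng.random((n, 3, h, w))
img1 = rng.random((n, 3, h, w))
gt = rng.random((n, 3, h, w))  # intermediate frame, available only during training

# The student sees only the two input frames.
student_input = np.concatenate([img0, img1], axis=1)      # (n, 6, h, w)
# A privileged teacher additionally sees the intermediate frame.
teacher_input = np.concatenate([img0, img1, gt], axis=1)  # (n, 9, h, w)


def distill_loss(flow_student, flow_teacher, merged_student, merged_teacher, gt):
    """Hypothetical helper: masked per-pixel L2 distance between two flows.

    The mask keeps only pixels where the teacher's reconstruction is
    closer to gt than the student's (an assumed rule for this sketch).
    """
    err_student = np.abs(merged_student - gt).mean(axis=1, keepdims=True)
    err_teacher = np.abs(merged_teacher - gt).mean(axis=1, keepdims=True)
    mask = (err_student > err_teacher).astype(flow_student.dtype)
    dist = np.sqrt(((flow_teacher - flow_student) ** 2).mean(axis=1, keepdims=True))
    return float((dist * mask).mean())


# Stand-in predictions; in a real model these would come from the networks,
# and the teacher's flow would be treated as a constant (no gradient).
flow_student = rng.standard_normal((n, 4, h, w))
flow_teacher = rng.standard_normal((n, 4, h, w))
merged_student = rng.random((n, 3, h, w))
merged_teacher = rng.random((n, 3, h, w))

loss = distill_loss(flow_student, flow_teacher, merged_student, merged_teacher, gt)
```

Because the teacher's input includes gt, a teacher of this kind cannot be run at test time, when the intermediate frame is unknown.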

However, the new training code has the following snippet instead, where gt is used only for a shape check and never as an input to the teacher:

if gt.shape[1] == 3:
    flow_teacher = 0
    mask_teacher = 0
    for i in range(4):
        flow_teacher += conf[:, i:i+1] * flow_list[i]
        mask_teacher += conf[:, i:i+1] * mask_list[i]
    warped_img0_teacher = warp(img0, flow_teacher[:, :2])
    warped_img1_teacher = warp(img1, flow_teacher[:, 2:4])
    mask_teacher = torch.sigmoid(mask_teacher)
    merged_teacher = warped_img0_teacher * mask_teacher + warped_img1_teacher * (1 - mask_teacher)
    teacher_list.append(merged_teacher)
    flow_list_teacher.append(flow_teacher)

It seems to me that the teacher no longer has the privilege to access the intermediate frame when computing the flows. Instead, it seems to leverage the conf tensor to refine the flows of the student. The ground truth does not seem to be leveraged when computing this conf tensor.

Am I missing something, or is this teacher no longer privileged as before? Also, what is the meaning of the conf tensor?

hzwer commented 5 months ago

Hi. My concern with privileged distillation is a new problem: flow_teacher ends up modeling part of the noise. IFRNet seems to have a solution that assigns a mask. We further found that as long as we can construct a stronger teacher, so that each earlier student block can be supervised, we get most of the benefit. The teacher in Practical-RIFE is actually the conf-weighted fusion of the results of each student block. The advantage is that the teacher requires no extra computation.
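
The conf-weighted fusion described above can be sketched as follows. The fusion loop mirrors the snippet quoted earlier in this thread; everything around it is an assumption for illustration: the toy shapes, the random stand-ins for flow_list, mask_list and conf, and the softmax normalization of conf across the four blocks (the thread does not say how conf is normalized). NumPy is used instead of PyTorch, and the warp step is omitted, to keep the example self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)
n, h, w, num_blocks = 2, 8, 8, 4  # toy sizes (assumed)

# Stand-ins for the per-block student outputs.
flow_list = [rng.standard_normal((n, 4, h, w)) for _ in range(num_blocks)]
mask_list = [rng.standard_normal((n, 1, h, w)) for _ in range(num_blocks)]

# conf: one self-predicted weight per block and per pixel.
# Assumption: normalized with a softmax over the block axis.
conf_logits = rng.standard_normal((n, num_blocks, h, w))
conf = np.exp(conf_logits - conf_logits.max(axis=1, keepdims=True))
conf /= conf.sum(axis=1, keepdims=True)

flow_teacher = np.zeros((n, 4, h, w))
mask_teacher = np.zeros((n, 1, h, w))
for i in range(num_blocks):
    # conf[:, i:i+1] has shape (n, 1, h, w) and broadcasts over channels.
    flow_teacher += conf[:, i:i+1] * flow_list[i]
    mask_teacher += conf[:, i:i+1] * mask_list[i]

# Sigmoid turns the fused mask logits into blending weights in (0, 1).
mask_teacher = 1.0 / (1.0 + np.exp(-mask_teacher))
```

No ground-truth frame appears anywhere in this computation: the teacher is built entirely from quantities the student already produces, which is why it adds no separate network.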

niqodea commented 5 months ago

I see, thank you for the response. What does conf stand for? Is it configuration?

Moreover, does that mean that the teacher is no longer privileged and can thus be used at test time as well? What would be the reason to not use the teacher at inference time instead of the student at this point?

hzwer commented 5 months ago

1. conf stands for confidence, a self-predicted weight.
2. I want to keep model inference simple.

niqodea commented 5 months ago

Got it. Thanks a lot!