researchmm / TTVSR

[CVPR'22 Oral] TTVSR: Learning Trajectory-Aware Transformer for Video Super-Resolution
MIT License

Question about Base model in Ablation study #1

Closed wdmwhh closed 2 years ago

wdmwhh commented 2 years ago

Hello, thanks for sharing your great work. I carefully read your ttvsrnet.py, and I guess that the Base model is de facto BasicVSR. Is that right?

ChengxuLiu commented 2 years ago

Thanks for your question. The Base model in the ablation study is not BasicVSR: it does not use any feature alignment. In our experiments, we found that dense feature alignment is very useful, so our method also includes dense trajectory alignment between adjacent frames. This part is similar to BasicVSR.

wdmwhh commented 2 years ago

In your paper (Sec. 4.3), it says "integrate the aligned previous tokens and current token as our “Base” model".

So, taken together, is the Base model aligned or not aligned?

ChengxuLiu commented 2 years ago

Not aligned. Thank you for your careful reading; we will fix this mistake and update.

wdmwhh commented 2 years ago

Thanks for your quick reply. I have another question about the ablation study on the number of frames. I think #Frame=33 corresponds to frame_stride=3, as used in the code, and #Frame can be roughly calculated as #Frame = 100 / frame_stride.

  1. But this does not work when #Frame = 45.
  2. Is #Frame=0 the case of the Base model?

ChengxuLiu commented 2 years ago

Thanks for your question.

  1. To maximize the number of frames, we use a maximum of 45 frames for training.
    When #Frame = 45, the frames used are not equally spaced. We manually choose the frames so that the total number used is 45, with an interval of 1 or 2 between the frames used.
  2. They are different: the frames used in the Base model are not aligned by trajectory.
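
The arithmetic behind this exchange can be sketched as follows. This is only an illustration, not the repository's code: the 100-frame sequence length is taken from the formula in the question, `pick_uneven` is a hypothetical helper, and "interval of 1 or 2" is assumed to mean that 1 or 2 frames are skipped between two used frames. The authors chose their 45 frames manually, so the spacing below is just one possible choice.

```python
TOTAL_FRAMES = 100  # assumed sequence length, taken from the formula in the question


def frames_for_stride(stride, total=TOTAL_FRAMES):
    """Rough frame count for a uniform stride, as in the question: total // stride."""
    return total // stride


def pick_uneven(count, total=TOTAL_FRAMES):
    """Hypothetical helper: spread `count` indices over [0, total - 1] with integer math."""
    return [(i * (total - 1)) // (count - 1) for i in range(count)]


# Uniform strides: stride 2 gives 50 frames and stride 3 gives 33.
uniform = {s: frames_for_stride(s) for s in (1, 2, 3, 4)}

# No uniform integer stride yields exactly 45 frames.
no_uniform_45 = all(frames_for_stride(s) != 45 for s in range(1, TOTAL_FRAMES + 1))

# One uneven selection of 45 frames: index steps of 2 or 3,
# which means 1 or 2 frames are skipped between used frames.
idx = pick_uneven(45)
steps = sorted({b - a for a, b in zip(idx, idx[1:])})
skipped = [s - 1 for s in steps]
```

Under these assumptions, 45 frames over 100 need an average index step of 2.25, which is why no single stride fits and a mix of two spacings is required.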

wdmwhh commented 2 years ago

When #Frame=5, most frames use only 2~3 frames in their trajectory, yet the model still achieves a good result. Two more questions (thanks for your kindness and patience):

  1. Could the result of out += anchor_feat be dominated by anchor_feat, that is, feat_prop?
  2. Have you tried using anchor_feat (or feat_prop) only?

ChengxuLiu commented 2 years ago

Thanks for your question.

  1. Yes. It is a kind of dense trajectory alignment, which is an important part and dominates the performance.
  2. That would be similar to BasicVSR, so we did not run this experiment. But I think it would certainly work.

mrluin commented 2 years ago

Sorry to start a discussion under a closed issue.

When I check the image warped by grid_flow in RGB space, I see that it suffers from severe grid artifacts. I also notice that at the end of the LTAM module, anchor_feat, which is a bilinearly warped feature, is added back to the aggregated result. So I also think anchor_feat plays an important role.

Since the grid_flow warped image does not seem robust, could you give me some hints about the effectiveness of the grid_flow warped feature?

Looking forward to your reply! Thanks in advance~

Best regards, TTB.

ChengxuLiu commented 2 years ago

Thanks for your interest in our work, and sorry for not noticing your discussion in time. If we only use the sparse features warped by grid_flow for reconstruction, the trajectory is so sparse that it cannot achieve great performance. As you found in RGB space, using temporally farther frames inevitably leads to sparse features. Dense features also play an important role in VSR, so we add anchor_feat back for the reconstruction. It is a kind of dense feature based on dense trajectory alignment. The detailed implementation can be found in our paper.
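
A toy sketch of the residual addition discussed in this thread may help. It is not the LTAM code from the repository: the 4x4 feature maps, their values, and the stride-2 sparsity pattern are hypothetical, and `add_anchor` only mirrors the `out += anchor_feat` step in plain Python. It shows that wherever the trajectory-aggregated feature is empty, the output equals the dense anchor_feat.

```python
def add_anchor(sparse_feat, anchor_feat):
    """Element-wise residual addition, mirroring `out += anchor_feat`."""
    return [[s + a for s, a in zip(srow, arow)]
            for srow, arow in zip(sparse_feat, anchor_feat)]


H = W = 4  # hypothetical feature-map size

# Hypothetical sparse feature: non-zero only on a stride-2 grid of positions.
sparse_feat = [[1.0 if (y % 2 == 0 and x % 2 == 0) else 0.0 for x in range(W)]
               for y in range(H)]

# Hypothetical dense feature: defined at every position.
anchor_feat = [[0.5 for _ in range(W)] for _ in range(H)]

out = add_anchor(sparse_feat, anchor_feat)

# Positions covered by the sparse feature.
covered = sum(v != 0.0 for row in sparse_feat for v in row)

# Positions where the output is exactly the dense feature.
anchor_only = sum(o == a
                  for orow, arow in zip(out, anchor_feat)
                  for o, a in zip(orow, arow))
```

In this toy setting only 4 of 16 positions receive a sparse contribution, so the remaining 12 are determined by the dense feature alone, which matches the intuition that anchor_feat carries much of the result.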