zfu006 / TAP

Official PyTorch implementation of "Temporal As a Plugin: Unsupervised Video Denoising with Pre-Trained Image Denoisers" in ECCV 2024.

A detail query of the TAP #2

Open Zeudfish opened 2 weeks ago

Zeudfish commented 2 weeks ago

Dear author, thank you for your work, which has inspired me a lot. I find the overall framework very interesting, but one small detail is unclear to me, and it is getting in the way as I follow your paper to modify my own model. Because a fixed image denoising model is used, its input and output should have the same shape, T×H×W×C. However, in the diagram in the paper the input is T×H×W×C but the output is 1×H×W×C. How does this work?

zfu006 commented 2 weeks ago

Hi,

It is a window-based denoiser, and T is the window size: the network uses the central frame and its T-1 adjacent frames to denoise the central frame. The window shifts across the whole video, so in the end you get the restored video.
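To make the window idea concrete, here is a minimal sketch of which frames would fall into each window. The function name `window_indices` is hypothetical, and the mirroring used at the start and end of the video is an assumption; the thread does not say how boundary frames are handled.

```python
def window_indices(center, num_frames, window_size):
    """Return the frame indices of the window centred on `center`.

    Out-of-range indices are mirrored back into the video. This boundary
    rule is an assumption, not something stated in the thread.
    """
    if window_size % 2 != 1:
        raise ValueError("window_size must be odd so a central frame exists")
    half = window_size // 2
    if num_frames <= half:
        raise ValueError("video is too short to mirror at the boundaries")
    indices = []
    for offset in range(-half, half + 1):
        idx = center + offset
        if idx < 0:
            idx = -idx                          # mirror around the first frame
        elif idx >= num_frames:
            idx = 2 * (num_frames - 1) - idx    # mirror around the last frame
        indices.append(idx)
    return indices


# A video with N = 6 frames and a window of T = 5 frames:
# one window per central frame, so 6 windows in total.
windows = [window_indices(i, num_frames=6, window_size=5) for i in range(6)]
for i, w in enumerate(windows):
    print(f"central frame {i}: window {w}")
```

Each window contains T indices, but only the frame in the middle position is restored; shifting the centre from the first frame to the last covers the whole video.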

Zeudfish commented 2 weeks ago

Thanks for your answer, but there is still something I don't understand. When the input is T×H×W×C, the output of an image denoiser should also be T×H×W×C, since that is determined by the shape of the data the image denoiser was trained on. So I was curious why, in Figure 2, a T×H×W×C sequence is fed into the image denoising model and, after the last downsampling, the result is a 1×H×W×C tensor.

I work on image denoising, but when using our denoiser we run into uneven noise intensity in motion scenes, which led us to introduce a temporal denoising model. Your scheme gives me hope of combining the two models, so I am eager to verify your ideas on my own network. This is the only point where I ran into difficulty when modifying my own network.

zfu006 commented 3 days ago

Hi, T in the figure defines the window size: the network takes T frames as input and restores their central frame, so the output size is 1×H×W×C. T is not the total number of frames; the total number of frames in the video is denoted N in the paper. Please check page four.
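The difference between T and N can be checked with a small shape experiment. This is only a sketch under stated assumptions: `denoise_window` is a hypothetical stand-in (a plain temporal mean) rather than the TAP network, the sizes are arbitrary, and the reflect padding at the video boundaries is an assumption.

```python
import numpy as np


def denoise_window(window):
    """Hypothetical stand-in for the network: (T, H, W, C) -> (1, H, W, C).

    A plain temporal mean is used only to make the shapes concrete;
    it is not the method from the paper.
    """
    return window.mean(axis=0, keepdims=True)


# N frames in the whole video, T frames per window (arbitrary toy sizes).
N, T, H, W, C = 8, 5, 4, 4, 3
rng = np.random.default_rng(0)
video = rng.random((N, H, W, C))

# Pad only the time axis so every frame can sit at the centre of a window.
half = T // 2
padded = np.pad(video, ((half, half), (0, 0), (0, 0), (0, 0)), mode="reflect")

outputs = []
for i in range(N):
    window = padded[i:i + T]                 # T x H x W x C, centred on frame i
    assert window.shape == (T, H, W, C)
    outputs.append(denoise_window(window))   # 1 x H x W x C

restored = np.concatenate(outputs, axis=0)   # N x H x W x C
print(restored.shape)  # (8, 4, 4, 3)
```

Each call sees T frames and returns one, and the N calls together give back a video with the same number of frames as the input.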