swing-research / FunkNN

Official repo for FunkNN: Neural Interpolation for Functional Generation

Questions about super-resolution #1

ariel415el opened this issue 1 year ago

ariel415el commented 1 year ago

Hey, thanks for uploading the code for this interesting project.

I have some questions about the super-resolution capabilities of your approach:

AmirEhsan95 commented 1 year ago

Hi,

Thank you for reading our paper and for your questions.

  1. **If I'm only interested in super-resolution, does the patch extraction in FunkNN still need to be differentiable?** Although you can use non-differentiable patch extraction if you don't need the spatial derivatives, we noticed that the differentiable patch extraction implemented by the spatial transformer is very efficient and improves the quality of reconstructions compared to non-continuous patch extraction. Moreover, the differentiable patch extraction allows us to learn the size of the receptive field of the extracted patch, leading to a significant improvement in reconstruction quality (see the adaptive receptive field implemented in FunkNN [here](https://github.com/swing-research/FunkNN/blob/main/funknn_model.py#:~:text=alpha1%20%3D%20torch.zeros(1),alpha2.clone().detach()%2C%20requires_grad%3DFalse))). Accordingly, I recommend using the differentiable patch extraction for super-resolution tasks even if you don't need the spatial derivatives (a minimal sketch follows after this list).

  2. **If not, can we say that your approach uses patch super-resolution to perform image super-resolution?** I don't think so, since we use the patch information around the coordinate $x$ to approximate its intensity alone, not the intensity of all the pixels in the patch. Therefore, as the coordinate $x$ comes from a continuous space, we can reconstruct the image in continuous space (see the second sketch after this list).

  3. **How long did it take to train the super-resolution network on FFHQ-128?** We trained our model over CelebA-HQ with 30,000 training samples at a maximum resolution of $128 \times 128$. Each epoch takes around 200 s on a single Tesla V100 GPU, and the network converges after about 50 epochs (roughly $50 \times 200\,\text{s} \approx 3$ hours), though we let it run for 200 epochs. So it can usually be trained in less than 6 hours.
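
To illustrate point 1: a differentiable crop in the spatial-transformer style can be built on `torch.nn.functional.grid_sample`, which is differentiable with respect to both the sampling locations and the patch scale, so the receptive-field size can be learned jointly with the network. This is a rough sketch under those assumptions, not the repo's actual code; `extract_patch`, `center`, and `alpha` are made-up names (see `funknn_model.py` for the real implementation):

```python
import torch
import torch.nn.functional as F

def extract_patch(image, center, alpha, patch_size=9):
    """Differentiably crop a patch around a continuous coordinate.

    image:  (B, C, H, W) input tensor
    center: (B, 2) query coordinates in [-1, 1] (grid_sample convention)
    alpha:  scalar controlling the receptive-field size (can be learned)
    """
    lin = torch.linspace(-1.0, 1.0, patch_size, device=image.device)
    gy, gx = torch.meshgrid(lin, lin, indexing="ij")
    local = torch.stack((gx, gy), dim=-1)            # (P, P, 2) local grid
    grid = center[:, None, None, :] + alpha * local  # (B, P, P, 2)
    # Bilinear sampling keeps the crop differentiable w.r.t. center and alpha.
    return F.grid_sample(image, grid, mode="bilinear", align_corners=True)

# Making alpha a Parameter lets the receptive field be learned end to end,
# mirroring the adaptive receptive field linked in point 1.
alpha = torch.nn.Parameter(torch.tensor(0.1))
```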
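
Point 2 can be sketched the same way: a small network maps the patch around a coordinate to the intensity at that single coordinate, so super-resolution amounts to querying a denser grid of continuous coordinates. The class, shapes, and hyper-parameters below are illustrative assumptions, not FunkNN's actual architecture; it reuses `extract_patch` from the sketch above:

```python
import torch
import torch.nn as nn

class IntensityHead(nn.Module):
    """Illustrative MLP: patch around coordinate x -> intensity at x alone."""

    def __init__(self, patch_size=9, channels=1, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels * patch_size ** 2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, channels),   # one intensity per channel
        )

    def forward(self, patch):              # patch: (B, C, P, P)
        return self.mlp(patch.flatten(1))  # -> (B, C)

# Hypothetical usage: super-resolve by querying a denser coordinate grid.
model = IntensityHead()
image = torch.rand(1, 1, 128, 128)         # low-resolution input
alpha = torch.tensor(0.1)                  # fixed here for simplicity

hr = 256                                   # target resolution
lin = torch.linspace(-1.0, 1.0, hr)
ys, xs = torch.meshgrid(lin, lin, indexing="ij")
coords = torch.stack((xs, ys), dim=-1).reshape(-1, 2)

preds = []
for c in coords.split(4096):               # batch the continuous queries
    patches = extract_patch(image.expand(len(c), -1, -1, -1), c, alpha)
    preds.append(model(patches))
out = torch.cat(preds).reshape(1, hr, hr)  # image at the new resolution
```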

ariel415el commented 1 year ago

Thank you very much for the elaborate answer!