liyuantsao / BFSR

The official repository of BFSR: "Boosting Flow-based Generative Super-Resolution Models via Learned Prior" [CVPR 2024]

How do you obtain the Fig.2 in the paper #5

Closed wyf0912 closed 5 months ago

wyf0912 commented 5 months ago

Thanks for your great work. I have the following question after reading the paper.

How did you obtain Fig. 2 in the paper? It seems that you want to illustrate that the "best temperature" differs across positions. However, the proposed method does not involve randomness, if I understand correctly. Why not predict the temperature instead, e.g., using a VAE?

liyuantsao commented 5 months ago

Hi @wyf0912,

Thank you for your insightful questions!

1. How did we obtain Fig. 2?

We adapted the derivation of the “Optimal Objective Estimation” in SROOE to generate our “best temperature map”.

Specifically, for each image, we generated a total of 21 outcomes using an LINF model, with sampling temperatures ranging from 0 to 1 at intervals of 0.05. Then, for every pixel in each image, we computed the LPIPS values and selected the temperature that yielded the optimal LPIPS (i.e., argmin) from these 21 images.
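
The selection step described above can be sketched as a per-pixel argmin. This is a minimal sketch, not the authors' code: it assumes the 21 spatial LPIPS maps (one per temperature) have already been computed and stacked into an array of shape (21, H, W). The function name `best_temperature_map` and the input layout are hypothetical.

```python
import numpy as np

# 21 sampling temperatures from 0 to 1 at intervals of 0.05.
TEMPERATURES = np.linspace(0.0, 1.0, 21)


def best_temperature_map(lpips_maps, temperatures=TEMPERATURES):
    """Return, for every pixel, the temperature with the lowest LPIPS.

    lpips_maps: array of shape (T, H, W), assumed to be precomputed
    per-pixel LPIPS values for each of the T temperatures.
    """
    lpips_maps = np.asarray(lpips_maps, dtype=float)
    if lpips_maps.ndim != 3 or lpips_maps.shape[0] != len(temperatures):
        raise ValueError("expected one (H, W) LPIPS map per temperature")
    # Index of the best (lowest-LPIPS) temperature at each pixel.
    best_idx = np.argmin(lpips_maps, axis=0)
    # Map indices back to temperature values, giving an (H, W) array.
    return temperatures[best_idx]
```

Where several temperatures tie at a pixel, `np.argmin` returns the first one, so the lowest tied temperature is selected.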

2. Why didn't we choose to predict the best temperatures in different positions like a VAE and retain randomness in our framework?

The objective of Fig. 2 is to demonstrate that a fixed sampling temperature, as adopted in existing flow-based frameworks, indeed leads to suboptimal results. However, we didn't choose to use a predicted temperature instead because the "grid artifacts" and "exploding inverse" issues mentioned in our paper are primarily triggered by the random sampling scheme, especially when adopting higher sampling temperatures (e.g., 0.9).

Firstly, "grid artifacts" are more evident in images generated by LINF, as it samples each patch independently and then assembles these non-overlapping patches. When adopting a higher temperature, the values sampled for neighboring patches might differ significantly (e.g., one patch samples extreme values while the other samples values all close to the mean), so the generated patches differ in appearance, which leads to discontinuities between them. Therefore, even if we predict the best temperature precisely and enhance the overall quality of an image, this artifact still exists (as seen in the image generated by $\tau^*$ in Fig. 2).
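
The effect of independent per-patch sampling can be illustrated with a toy sketch. This is not LINF itself: it assumes the latents on either side of a patch boundary are drawn independently from a Gaussian scaled by the temperature, and it measures the mismatch in latent space only, ignoring the flow's decoding step. The function name and default sizes are hypothetical.

```python
import numpy as np


def mean_seam_jump(tau, patch=8, trials=2000, seed=0):
    """Mean absolute latent difference across the seam of two patches.

    Toy assumption: each side of the seam is sampled independently
    from N(0, tau^2), mimicking independent per-patch sampling.
    """
    rng = np.random.default_rng(seed)
    # Last column of the left patch and first column of the right patch.
    left_edge = tau * rng.standard_normal((trials, patch))
    right_edge = tau * rng.standard_normal((trials, patch))
    return float(np.mean(np.abs(left_edge - right_edge)))


for tau in (0.0, 0.3, 0.9):
    print(f"tau={tau:.1f}  mean seam jump={mean_seam_jump(tau):.3f}")
```

Under this assumption the expected jump grows linearly with the temperature (about 1.13 times tau), which is consistent with the seams becoming more visible at higher sampling temperatures.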

Secondly, as shown in Table 3, there is a trade-off between visual quality (LPIPS value) and the probability of generating an exploding inverse, controlled by the sampling temperature. If we don't impose constraints on the maximum value of predicted temperatures, the exploding inverse issue might still persist. However, such constraints might compromise performance.

Summary

In this work, we focus on tackling these issues effectively with a concise solution, while also demonstrating that flow-based SR methods have the potential to compete with other generative SR methods. That said, we have discussed this, and I personally agree that a VAE-like architecture is worth developing for our proposed framework, as removing randomness from a generative model might be controversial.

Thank you for your valuable advice!

wyf0912 commented 5 months ago

Thanks a lot for your prompt and detailed explanation!