Open gottfriedm opened 10 months ago
Upscaling latent directly results in a terrible quality degradation. Upscaling latent is akin to scattering noise, and we cannot control the intensity of that noise.
Technologies like NNLatentUpscale have emerged recently to mitigate some quality degradation, but they are not perfect solutions.
The intention behind "IterativeUpscale(latent)" is actually designed for the ease of use within the latent input/output interface, rather than direct upscaling through latent.
However, there has been consideration for developing an Upscaler Provider based on NNLatentUpscale for quite some time.
Upscaling latent directly results in a terrible quality degradation.
Are you sure about that? ComfyUI's "hires fix" does exactly that, and it works just fine. Of course you can't just directly decode the upscaled latent samples, that would indeed not work very nicely i guess. But we are using the upscaled latents just as an input for more sampling, so i don't see why the decode->upscale->encode should be necessary.
Upscaling latent directly results in a terrible quality degradation.
Are you sure about that? ComfyUI's "hires fix" does exactly that, and it works just fine. Of course you can't just directly decode the upscaled latent samples, that would indeed not work very nicely i guess. But we are using the upscaled latents just as an input for more sampling, so i don't see why the decode->upscale->encode should be necessary.
I also understand what the hires-fix method is. And this was specifically designed to address and avoid the flaws of hires-fix accurately.
The hires-fix method introduces strong noise through latent upscale, and enhancing details by significantly transforming the image through strong denoising.
The Iterative Upscale on pixel space approach is designed to minimize damage during upscale for mild denoising purposes. If you want to achieve the desired amount of detail enhancement, you can adds artificial noise as needed through pk_hook.
I see, thanks for the explanation. If i understand correctly, this approach is necessary when working with a very low denoising strength (<0.3) in the upscale. In that case upscaling the latents directly would indeed create too much artefacts which cannot be repaired in the sampling due to the low denoise.
For denoising strength of ~0.4 and more, i don't really see artefacts anymore when upscaling the latents directly.
Maybe it would make sense to rename the node, in order to better reflect what it is actually doing? The current name is a bit misleading because it suggests that the upscaling is done in latent space, when in fact it is done in image space.
I see, thanks for the explanation. If i understand correctly, this approach is necessary when working with a very low denoising strength (<0.3) in the upscale. In that case upscaling the latents directly would indeed create too much artefacts which cannot be repaired in the sampling due to the low denoise.
For denoising strength of ~0.4 and more, i don't really see artefacts anymore when upscaling the latents directly.
Maybe it would make sense to rename the node, in order to better reflect what it is actually doing? The current name is a bit misleading because it suggests that the upscaling is done in latent space, when in fact it is done in image space.
I also feel that the current name might be inappropriate.
How about an additional parameter that is used as a threshold to decide below which denoising value the upsampling is done in pixel space, and otherwise upsampling is done in latent space? It can be set to 1 by default, so the current behavior is retained. But users could decide to set it lower, since a denoise value of 0.5 still produces quite good results (somtetimes subjectively sharper) when upsampling latent directly.
That way the node would implement both functionalities and expose an easy control for the user.
If support for upscaling in the latent space , It will be applied through a different method, such as a hook or an upscaler node.
It seems most folks are just hung up on the name. You could call it "upscale loopback" or something to get the pedants off your back. Are they picking on the other 100 oddball names in Comfy, too? lol
The Iterative Latent Upscale utilizes the function
latent_upscale_on_pixel_space
, which does the following:Why is the scaling not done in latent space? This would eliminate the decode/encode step, and would better mimic what is usually done with the standard comfyUI nodes ("highres-fix"). The standard nodes
LatentUpscale
andLatentUpscaleBy
apply the functiontorch.nn.functional.interpolate()
directly on the samples in latent space, as the node name suggests.