ltdrdata / ComfyUI-Impact-Pack

Custom nodes pack for ComfyUI This custom node helps to conveniently enhance images through Detector, Detailer, Upscaler, Pipe, and more.
GNU General Public License v3.0
1.83k stars 175 forks source link

Iterative Latent Upscale should scale the latent, not the image #390

Open gottfriedm opened 10 months ago

gottfriedm commented 10 months ago

The Iterative Latent Upscale utilizes the function latent_upscale_on_pixel_space, which does the following:

  1. decode the input latent into pixel space
  2. scale the pixels
  3. encode the scaled pixels back into latent space

Why is the scaling not done in latent space? This would eliminate the decode/encode step, and would better mimic what is usually done with the standard comfyUI nodes ("highres-fix"). The standard nodes LatentUpscale and LatentUpscaleBy apply the function torch.nn.functional.interpolate() directly on the samples in latent space, as the node name suggests.

ltdrdata commented 10 months ago

Upscaling latent directly results in a terrible quality degradation. Upscaling latent is akin to scattering noise, and we cannot control the intensity of that noise.

Technologies like NNLatentUpscale have emerged recently to mitigate some quality degradation, but they are not perfect solutions.

The intention behind "IterativeUpscale(latent)" is actually designed for the ease of use within the latent input/output interface, rather than direct upscaling through latent.

However, there has been consideration for developing an Upscaler Provider based on NNLatentUpscale for quite some time.

gottfriedm commented 10 months ago

Upscaling latent directly results in a terrible quality degradation.

Are you sure about that? ComfyUI's "hires fix" does exactly that, and it works just fine. Of course you can't just directly decode the upscaled latent samples, that would indeed not work very nicely i guess. But we are using the upscaled latents just as an input for more sampling, so i don't see why the decode->upscale->encode should be necessary.

ltdrdata commented 10 months ago

Upscaling latent directly results in a terrible quality degradation.

Are you sure about that? ComfyUI's "hires fix" does exactly that, and it works just fine. Of course you can't just directly decode the upscaled latent samples, that would indeed not work very nicely i guess. But we are using the upscaled latents just as an input for more sampling, so i don't see why the decode->upscale->encode should be necessary.

I also understand what the hires-fix method is. And this was specifically designed to address and avoid the flaws of hires-fix accurately.

The hires-fix method introduces strong noise through latent upscale, and enhancing details by significantly transforming the image through strong denoising.

The Iterative Upscale on pixel space approach is designed to minimize damage during upscale for mild denoising purposes. If you want to achieve the desired amount of detail enhancement, you can adds artificial noise as needed through pk_hook.

gottfriedm commented 10 months ago

I see, thanks for the explanation. If i understand correctly, this approach is necessary when working with a very low denoising strength (<0.3) in the upscale. In that case upscaling the latents directly would indeed create too much artefacts which cannot be repaired in the sampling due to the low denoise.

For denoising strength of ~0.4 and more, i don't really see artefacts anymore when upscaling the latents directly.

Maybe it would make sense to rename the node, in order to better reflect what it is actually doing? The current name is a bit misleading because it suggests that the upscaling is done in latent space, when in fact it is done in image space.

ltdrdata commented 10 months ago

I see, thanks for the explanation. If i understand correctly, this approach is necessary when working with a very low denoising strength (<0.3) in the upscale. In that case upscaling the latents directly would indeed create too much artefacts which cannot be repaired in the sampling due to the low denoise.

For denoising strength of ~0.4 and more, i don't really see artefacts anymore when upscaling the latents directly.

Maybe it would make sense to rename the node, in order to better reflect what it is actually doing? The current name is a bit misleading because it suggests that the upscaling is done in latent space, when in fact it is done in image space.

I also feel that the current name might be inappropriate.

gottfriedm commented 10 months ago

How about an additional parameter that is used as a threshold to decide below which denoising value the upsampling is done in pixel space, and otherwise upsampling is done in latent space? It can be set to 1 by default, so the current behavior is retained. But users could decide to set it lower, since a denoise value of 0.5 still produces quite good results (somtetimes subjectively sharper) when upsampling latent directly.

That way the node would implement both functionalities and expose an easy control for the user.

ltdrdata commented 10 months ago

If support for upscaling in the latent space , It will be applied through a different method, such as a hook or an upscaler node.

zaqhack commented 10 months ago

It seems most folks are just hung up on the name. You could call it "upscale loopback" or something to get the pedants off your back. Are they picking on the other 100 oddball names in Comfy, too? lol