parlance-zz closed this issue 2 years ago
This doesn't actually work the way people think it does. There are actually two problems:

A) The output after 24 steps of a 24-step generation is nothing like the output after 24 steps of a 48-step generation.
B) Taking intermediate outputs affects future results.
However there are solutions I've seen floating around for this (specifically a custom GAN to produce previews). I'll add it to the backlog.
I'm not sure why that would be, but does decoding the latent-space representation really cause non-determinism? That's bizarre.
> B) Taking intermediate outputs affects future results
I'm with parlance on this one; that would be very bizarre indeed.
:firecracker: diffusers 0.4.0 did add a callback to the pipeline to expose the latents during the process.
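For reference, the diffusers 0.4.0 callback receives `(step, timestep, latents)`. A minimal sketch of collecting intermediate latents with it — the pipeline usage in the comments is illustrative, and the collector itself is just a plain closure:

```python
# Sketch: collect intermediate latents via the diffusers >= 0.4.0 callback.
# The callback signature (step, timestep, latents) is what diffusers 0.4.0
# passes; the pipeline call shown in the comments below is illustrative.

def make_latent_collector(interval, store):
    """Return a callback that stashes latents every `interval` steps."""
    def on_step(step, timestep, latents):
        if step % interval == 0:
            store.append((step, latents))
    return on_step

# With a real pipeline this would look roughly like:
#   previews = []
#   pipe(prompt, callback=make_latent_collector(8, previews), callback_steps=1)
#   # afterwards, decode the stored latents with the VAE (or a cheap preview GAN)

# Self-contained demo with dummy "latents":
previews = []
cb = make_latent_collector(8, previews)
for step in range(24):
    cb(step, 1000 - step, f"latents@{step}")
print([s for s, _ in previews])  # [0, 8, 16]
```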
I'm going from memory on B; I haven't confirmed it. My guess was that the VAE decode draws from the random generator, affecting the subsequent noise. If it doesn't, then yay :).
As far as implementation goes, it'll need a little capability negotiation so it doesn't break clients that don't expect it, but we can easily return more Answers than the number of samples the Request specified. If we give all the images the same answer_id (or maybe a formatted answer_id like main_id:sub_id), it'll be easy to track them as intermediate outputs.
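One way that main_id:sub_id scheme could look — these helper names are hypothetical, not existing protocol code:

```python
# Hypothetical helpers for tagging intermediate Answers with a composite id.
# Format: "main_id:sub_id", where sub_id could be the sampling step the image
# was taken at; a bare id (legacy clients) is treated as a final answer.

def make_answer_id(main_id, sub_id):
    return f"{main_id}:{sub_id}"

def parse_answer_id(answer_id):
    """Split a composite id into (main_id, sub_id)."""
    main_id, sep, sub_id = answer_id.partition(":")
    return main_id, (sub_id if sep else "final")

# A client can group Answers by main_id and order previews by sub_id:
print(parse_answer_id("req42:8"))   # ('req42', '8')
print(parse_answer_id("req42"))     # ('req42', 'final')
```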
The CompVis code exposed a parameter called sampling interval that would write intermediate (decoded) results during the sampling iteration. I think this could be useful for a variety of reasons, so it might be nice to expose.
It might also be the best way to handle the general idea of getting debug images, which would need to be gathered at some step interval in the sampling process anyway.