hafriedlander / stable-diffusion-grpcserver

An implementation of a server for the Stability AI Stable Diffusion API
Apache License 2.0

debug sampling interval parameter #7

Closed parlance-zz closed 2 years ago

parlance-zz commented 2 years ago

The CompVis code exposed a parameter called sampling interval that would write intermediate (decoded) results during the sampling iteration process. I think this could be useful for a variety of reasons, so it might be nice to expose.

It might also be the best way to handle the general idea of getting debug images, which would need to be gathered at some step interval in the sampling process anyway.
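A minimal sketch of what a sampling-interval option could look like in a generic sampling loop, assuming a step function and a decode function exist. `sample_with_interval`, `step_fn` and `decode_fn` are hypothetical names standing in for one sampler step and the VAE decode; this is not the CompVis implementation.

```python
from typing import Callable, List, Tuple, TypeVar

L = TypeVar("L")  # latent type (a tensor in a real sampler)
D = TypeVar("D")  # decoded output type (an image in a real sampler)


def sample_with_interval(
    latent: L,
    num_steps: int,
    step_fn: Callable[[L, int], L],
    decode_fn: Callable[[L], D],
    sampling_interval: int = 0,
) -> Tuple[D, List[Tuple[int, D]]]:
    """Run num_steps sampler steps, decoding a preview every sampling_interval steps.

    A sampling_interval of 0 (or less) disables intermediate outputs.
    """
    intermediates: List[Tuple[int, D]] = []
    for step in range(num_steps):
        latent = step_fn(latent, step)
        done = step + 1  # number of completed steps
        # The final step is returned separately, so skip it here
        if sampling_interval > 0 and done % sampling_interval == 0 and done < num_steps:
            intermediates.append((done, decode_fn(latent)))
    return decode_fn(latent), intermediates
```

With toy numbers, `sample_with_interval(100.0, 6, lambda x, i: x - 10, lambda x: x, sampling_interval=2)` returns the final value `40.0` plus previews after steps 2 and 4. Debug images would fall out of the same hook, since they also need to be captured at a step interval.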

hafriedlander commented 2 years ago

This doesn't actually work the way people think it does. There are two problems:

A) The output after 24 steps of a 24-step generation is nothing like the output after 24 steps of a 48-step generation.
B) Taking intermediate outputs affects future results.
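Point A can be illustrated with the timestep schedule. The sketch below assumes evenly spaced ("linspace") timesteps over a 1000-step training schedule; real schedulers (DDIM, PLMS, k-diffusion samplers) space steps differently, so the exact numbers are illustrative only, and both function names are hypothetical.

```python
from typing import List


def linspace_timesteps(num_steps: int, train_steps: int = 1000) -> List[int]:
    """Evenly spaced, descending timesteps from train_steps - 1 down to 0."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    return [round((train_steps - 1) * (1 - i / (num_steps - 1))) for i in range(num_steps)]


def last_timestep_after(steps_taken: int, num_steps: int) -> int:
    """Timestep the sampler has just denoised after steps_taken of num_steps steps."""
    return linspace_timesteps(num_steps)[steps_taken - 1]


print(last_timestep_after(24, 24))  # 0: the schedule is finished
print(last_timestep_after(24, 48))  # 510: roughly halfway through the noise schedule
```

Under this assumed schedule, step 24 of 24 ends at timestep 0 (fully denoised), while step 24 of 48 ends around timestep 510, so an intermediate decode is a picture of a still-noisy latent rather than a shorter run's final image.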

However, there are solutions I've seen floating around for this (specifically, a custom GAN to produce previews). I'll add it to the backlog.

parlance-zz commented 2 years ago

I'm not sure why, but does decoding the latent space representation really cause non-determinism? That's bizarre.

keturn commented 2 years ago

B) Taking intermediate outputs affects future results

I'm with parlance on this one; that would be very bizarre indeed.

🧨 diffusers 0.4.0 did add a callback to the pipeline that exposes the latents during sampling.
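A small sketch of a callback that records latents as they are produced. It assumes the pipeline calls `callback(step, timestep, latents)` every `callback_steps` steps, which is how the Stable Diffusion pipeline callback is described for diffusers 0.4.0; check the installed version's signature before relying on it. `LatentRecorder` is a hypothetical helper, not part of diffusers.

```python
from typing import Any, Callable, List, Tuple


class LatentRecorder:
    """Callable that stores (step, timestep, latents) each time a pipeline invokes it."""

    def __init__(self, clone: Callable[[Any], Any] = lambda x: x) -> None:
        self.records: List[Tuple[int, Any, Any]] = []
        self._clone = clone

    def __call__(self, step: int, timestep: Any, latents: Any) -> None:
        # Keep a copy in case the sampler reuses or updates the latents after this returns
        self.records.append((step, timestep, self._clone(latents)))


# Assumed usage with diffusers >= 0.4.0 (not run here):
#   recorder = LatentRecorder(clone=lambda t: t.detach().cpu().clone())
#   image = pipe(prompt, callback=recorder, callback_steps=5).images[0]
#   # each recorded latent could then be decoded with pipe.vae to get a preview
```

Passing a `clone` function keeps the recorder independent of torch, so the same class works with any array-like latent.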

hafriedlander commented 2 years ago

I'm going from memory on B and haven't confirmed it. My guess was that the VAE draws from the generator, affecting the noise. If it doesn't, then yay :).
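The mechanism behind that guess can be shown with a toy sampler that takes all of its noise from one seeded generator. This is an illustration of the hypothesis only, using Python's `random.Random` as a stand-in for the torch generator; `generate` and `preview_uses_rng` are hypothetical names, and whether the real VAE decode consumes RNG is exactly what the thread leaves unconfirmed.

```python
import random
from typing import List


def generate(seed: int, steps: int, preview_every: int = 0,
             preview_uses_rng: bool = False) -> List[float]:
    """Return the per-step noise a toy sampler draws from one shared generator."""
    rng = random.Random(seed)
    noise: List[float] = []
    for step in range(steps):
        # Each sampler step consumes fresh noise from the shared generator
        noise.append(rng.random())
        if preview_every and (step + 1) % preview_every == 0 and preview_uses_rng:
            # Hypothetical preview decoder that also samples from the same generator
            rng.random()
    return noise
```

If previews never touch the generator, `generate(0, 8, preview_every=2)` matches `generate(0, 8)` exactly; with `preview_uses_rng=True`, every draw after the first preview shifts, which is the kind of drift point B describes. A preview path that is a deterministic forward pass, or that uses its own generator, avoids the problem.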

hafriedlander commented 2 years ago

As for how to implement it: it'll need a little capability negotiation so as not to break clients that don't expect it, but we can easily return more Answers than the number of samples the Request specified. If we give all the images the same answer_id (or perhaps a formatted answer_id like main_id:sub_id), it'll be easy to track them as intermediate outputs.
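A sketch of the main_id:sub_id idea, assuming the client has opted in through some negotiated capability. `Answer` here is a hypothetical dataclass, not the real gRPC message, and `client_accepts_intermediates` is a made-up flag standing in for whatever negotiation mechanism is chosen; using the step number as sub_id is also an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Answer:
    """Hypothetical stand-in for the server's Answer message."""
    answer_id: str
    image: bytes


def make_answers(main_id: str, final_image: bytes,
                 intermediates: List[Tuple[int, bytes]],
                 client_accepts_intermediates: bool) -> List[Answer]:
    """Emit intermediate Answers (id main_id:step) only for clients that opted in."""
    answers: List[Answer] = []
    if client_accepts_intermediates:
        for step, image in intermediates:
            answers.append(Answer(f"{main_id}:{step}", image))
    # The final image keeps the plain id, so clients that don't opt in see no change
    answers.append(Answer(main_id, final_image))
    return answers


def parse_answer_id(answer_id: str) -> Tuple[str, Optional[str]]:
    """Split main_id:sub_id; sub_id is None for a final answer.

    Assumes main_id itself never contains ':'.
    """
    main_id, sep, sub_id = answer_id.partition(":")
    return main_id, (sub_id if sep else None)
```

Keeping the final answer's id unchanged means an old client that ignores the capability sees exactly the answers it already expects.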