the way I understand it, disco supports multi-CLIP by picking a different perceptor from its active pool each step -- so instead of throwing them all at the step, you just throw one. or maybe they do a 1/3rd-strength gradient update for each perceptor?
anyway, either of those approaches -- or both -- could be used to incorporate multiple VQGANs as well, or depth models, or whatever.
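something like this is what I'm picturing -- a minimal sketch in PyTorch, assuming a list of (clip_model, text_embed) pairs (each perceptor needs its own text embedding since embedding dims differ across CLIP variants). spherical_dist_loss is the usual one from the CLIP-guidance notebooks; everything else here is illustrative, not disco's actual internals:

```python
import random
import torch
import torch.nn.functional as F

def spherical_dist_loss(x, y):
    # standard spherical distance loss from the CLIP-guidance notebooks
    x, y = F.normalize(x, dim=-1), F.normalize(y, dim=-1)
    return (x - y).norm(dim=-1).div(2).arcsin().pow(2).mul(4).mean()

def loss_one_per_step(perceptors, image):
    # strategy 1: pick a single perceptor from the active pool this step
    model, text_embed = random.choice(perceptors)
    return spherical_dist_loss(model.encode_image(image), text_embed)

def loss_scaled(perceptors, image):
    # strategy 2: use them all, each at 1/N weight, so the combined
    # gradient has roughly the magnitude of a single-perceptor step
    total = 0.0
    for model, text_embed in perceptors:
        total = total + spherical_dist_loss(model.encode_image(image), text_embed)
    return total / len(perceptors)
```

swapping VQGANs or depth models in would just mean swapping what `encode_image` and the loss are computed over; the pool/scale structure stays the same.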
I wonder if I could alternate a diffusion denoising step with a prediction from a VQGAN guided in parallel by the same prompt, etc. I guess the issue there is reduced diversity, huh. which is sorta the issue with VQGAN in general. goddamn it, I really need to add diffusion.
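roughly what I mean by alternating, as a sketch -- `diffusion.p_sample`, `vqgan.decode`, and the cross-talk blend are all stand-ins for whatever models are actually in play (and `opt` is assumed to be an optimizer over `z_vq`); the naive 50:50 blend at the end is exactly the part the per-pixel idea below would replace:

```python
import torch

def alternating_sample(diffusion, vqgan, clip_model, z_vq, x_t, text_embed, opt, timesteps):
    # interleave one diffusion denoising step with one CLIP-guided VQGAN
    # latent update per iteration; both are steered by the same text_embed
    for t in reversed(range(timesteps)):
        # diffusion half-step: ordinary ancestral denoising of x_t
        with torch.no_grad():
            x_t = diffusion.p_sample(x_t, t)

        # VQGAN half-step: nudge the latent so its decode matches the prompt
        opt.zero_grad()
        image = vqgan.decode(z_vq)
        loss = spherical_dist_loss(clip_model.encode_image(image), text_embed)
        loss.backward()
        opt.step()

        # crude cross-talk so the two trajectories don't drift apart: blend
        # the VQGAN render back into the diffusion state (assumes both live
        # at the same resolution and value range)
        x_t = 0.5 * x_t + 0.5 * image.detach()
    return x_t
```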
but yeah. alternate version of multi-CLIP first, then experiment with multi-VQGAN. (aww shit -- maybe instead of keeping it strictly at 50:50 influence, I could add a parameter for each pixel and fit it along with the image, starting with a really tiny learning rate, as a learned lerping weight for that pixel (i.e. learned model responsibilities).)
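the per-pixel responsibility thing could be as simple as a sigmoid-squashed parameter map, initialized at the 50:50 blend and given its own tiny learning rate via an optimizer param group (all names hypothetical again):

```python
import torch

class PixelLerp(torch.nn.Module):
    def __init__(self, height, width):
        super().__init__()
        # logits start at 0 -> sigmoid(0) = 0.5, i.e. the 50:50 blend
        self.logits = torch.nn.Parameter(torch.zeros(1, 1, height, width))

    def forward(self, img_a, img_b):
        w = torch.sigmoid(self.logits)  # per-pixel weight in (0, 1)
        return w * img_a + (1.0 - w) * img_b

# give the blend map a much smaller learning rate than the image params:
# opt = torch.optim.Adam([
#     {"params": image_params, "lr": 1e-1},
#     {"params": lerp.logits, "lr": 1e-4},  # the "really tiny learning rate"
# ])
```

the sigmoid keeps each weight in (0, 1), so neither model's output can be over- or under-shot, and the tiny learning rate stops the responsibility map from collapsing to one model before the image itself has settled.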