Hi,
I was looking at the current implementation and noticed that before every generation you pass all reference images through the VAE as one batch. Beyond a certain number of reference images, that would require a huge amount of VRAM, I believe. Wouldn't it be better to compute the latents for each selected image beforehand, store them either in RAM or temporarily on the drive, and then load them at generation time? That way you avoid a big batch in the VAE, and you compute the latents only once per reference image instead of once per generation.
What do you think?
https://github.com/sd-fabric/fabric/blob/caaa5831bacefb060d46168372b45e3bac84a3ae/fabric/generator.py#L357C1-L373C14
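For concreteness, here is a minimal sketch of the caching idea. It is not fabric's actual API: `LatentCache` and the `encode` callable are illustrative names, and in practice `encode` would wrap something like the VAE's encode call, with latents kept on CPU (or serialized to disk) until generation.

```python
from typing import Callable, Dict, Hashable, Any


class LatentCache:
    """Encode each reference image at most once and reuse the latent afterwards.

    `encode` is a user-supplied callable (an assumption, not fabric's API) that
    maps a single image to its latent, e.g. a wrapper around the VAE encoder.
    Encoding one image at a time also avoids building one large VAE batch.
    """

    def __init__(self, encode: Callable[[Any], Any]) -> None:
        self._encode = encode
        self._cache: Dict[Hashable, Any] = {}

    def get(self, key: Hashable, image: Any) -> Any:
        # Only encode on a cache miss; later generations reuse the stored latent.
        if key not in self._cache:
            self._cache[key] = self._encode(image)
        return self._cache[key]

    def evict(self, key: Hashable) -> None:
        # Drop a latent when its reference image is deselected.
        self._cache.pop(key, None)
```

At generation time you would then collect `[cache.get(path, img) for path, img in selected]` and stack the (already computed) latents, instead of running the whole batch through the VAE again.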