Hi,
I was looking at the current implementation and noticed that before every generation you pass all reference images through the VAE as one batch. Beyond a certain number of reference images, that would require a huge amount of VRAM, I believe. Wouldn't it be better to compute the latents for each selected image beforehand, store them either in RAM or temporarily on the drive, and then load them at generation time? That way you avoid a big batch in the VAE, and you compute the latents only once per reference image instead of once per generation.
What do you think?
https://github.com/sd-fabric/fabric/blob/caaa5831bacefb060d46168372b45e3bac84a3ae/fabric/generator.py#L357C1-L373C14
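For concreteness, here is a minimal sketch of the caching idea. It is not fabric's actual API: `LatentCache` and the `encode` callable are illustrative names, and in practice `encode` would wrap something like the VAE's encode call, with latents kept on CPU (or serialized to disk) until generation.

```python
from typing import Callable, Dict, Hashable, Any


class LatentCache:
    """Encode each reference image at most once and reuse the latent afterwards.

    `encode` is a user-supplied callable (an assumption, not fabric's API) that
    maps a single image to its latent, e.g. a wrapper around the VAE encoder.
    Encoding one image at a time also avoids building one large VAE batch.
    """

    def __init__(self, encode: Callable[[Any], Any]) -> None:
        self._encode = encode
        self._cache: Dict[Hashable, Any] = {}

    def get(self, key: Hashable, image: Any) -> Any:
        # Only encode on a cache miss; later generations reuse the stored latent.
        if key not in self._cache:
            self._cache[key] = self._encode(image)
        return self._cache[key]

    def evict(self, key: Hashable) -> None:
        # Drop a latent when its reference image is deselected.
        self._cache.pop(key, None)
```

At generation time you would then collect `[cache.get(path, img) for path, img in selected]` and stack the (already computed) latents, instead of running the whole batch through the VAE again.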