When using fp16 Flux + fp16 T5-XXL, generation works fine, but when it reaches the VAE decoding stage it allocates around 10 GB of RAM, overflowing into swap on my SSD and making the PC extremely unresponsive, just for the VAE. I have no idea why; the VAE itself is only about 200 MB, so it shouldn't be a problem.
Also, after the VAE finally finishes decoding, all models get unloaded from both RAM and VRAM, so I have to reload them to generate another image.
I don't believe this is a hardware limitation, since I have an RTX 4090 and 32 GB of RAM, and generation speed is perfectly fine.
With the exact same setup, except NF4 Flux instead of fp16, everything works perfectly.
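One note on the "the VAE is only 200 MB" intuition: the decoder's *weights* are small, but memory during decoding is dominated by intermediate *activations*, which scale with the output image resolution, not the weight file size. A rough back-of-the-envelope sketch (illustrative channel counts, not the exact Flux VAE architecture):

```python
def activation_bytes(height: int, width: int, channels: int, dtype_bytes: int = 4) -> int:
    """Bytes needed for one feature map of shape (channels, height, width)."""
    return height * width * channels * dtype_bytes

# Decoding a 1024x1024 image: the last decoder stages run wide feature maps
# at full resolution. A single 128-channel fp32 map at 1024x1024:
one_map = activation_bytes(1024, 1024, 128)
print(one_map / 2**30)  # 0.5 GiB for just one feature map
```

With many such maps alive at once across residual blocks (and more if anything silently upcasts to fp32), multi-gigabyte peaks during decode are plausible even though the checkpoint on disk is tiny. That's also consistent with the NF4 run behaving better: the smaller quantized UNet leaves more headroom, so the decode spike doesn't spill into swap.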