Closed by clavar 4 weeks ago
It wouldn't save VRAM, since the model is cast to fp16 anyway and isn't used during sampling/decoding. It would save disk space though, and I agree the CLIP vision handling could be better. For now it just does what the original code does.
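To make the disk-versus-VRAM point concrete, here is a rough back-of-the-envelope sketch. The parameter count is a hypothetical placeholder, not the measured size of this CLIP vision model, and `weights_size_gib` is a helper made up for illustration. It only shows that an fp32 checkpoint takes about twice the disk space of an fp16 one, while the in-memory footprint after casting to fp16 is the same either way.

```python
def weights_size_gib(num_params: int, bytes_per_param: int) -> float:
    """Approximate size of the raw weights in GiB (ignores file metadata)."""
    return num_params * bytes_per_param / 1024**3


# Hypothetical parameter count, chosen only for illustration.
NUM_PARAMS = 600_000_000

fp32_on_disk = weights_size_gib(NUM_PARAMS, 4)  # fp32 stores 4 bytes per weight
fp16_on_disk = weights_size_gib(NUM_PARAMS, 2)  # fp16 stores 2 bytes per weight

# After loading, both checkpoints are cast to fp16, so the memory used by the
# weights is the same no matter which file they came from.
in_memory_after_cast = weights_size_gib(NUM_PARAMS, 2)

print(f"fp32 checkpoint on disk: {fp32_on_disk:.2f} GiB")
print(f"fp16 checkpoint on disk: {fp16_on_disk:.2f} GiB")
print(f"in memory after fp16 cast: {in_memory_after_cast:.2f} GiB")
```

This covers only the weights themselves. Any peak memory used temporarily while loading depends on the loader.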
Just did a bigger update that, among other things, uses the ComfyUI clip_vision model instead. It did end up saving memory, probably thanks to better memory management with the Comfy loader.
It's used in ipadapter-plus, saving some VRAM.
CLIP-ViT-H-14-laion2B-s32B-b79K.safetensors