Closed by voodoohop 5 months ago
I have just uploaded the file "ViT-L-14-GmP-ft-TE-only-HF-format.safetensors" to huggingface. https://huggingface.co/zer0int/CLIP-GmP-ViT-L-14/tree/main It uses the exact same format (naming, dtype / precision) as the ViT-L TE that is wrapped inside e.g. SDXL. So, it should be possible to now wrap that up with the U-Net and VAE and the 2nd, big CLIP-G text encoder, and fine-tune the whole thing (assuming that's what you intend to do). However, I only tested this as working for inference (generating images with SDXL using the above TE instead of standard ViT-L). If you encounter any freak accidents / unexpected glitches, please let me know!
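For anyone wanting to try the swap in diffusers, here is a minimal sketch of one way it might be done. It is not taken from the post above. It assumes the `.safetensors` file holds a plain Hugging Face `CLIPTextModel` state dict (keys prefixed with `text_model.`), that `openai/clip-vit-large-patch14` supplies a matching config, and that `stabilityai/stable-diffusion-xl-base-1.0` is the base pipeline. The helper names `has_hf_text_model_keys` and `build_sdxl_with_gmp_text_encoder` are hypothetical, and only inference is covered.

```python
from typing import Iterable

REPO_ID = "zer0int/CLIP-GmP-ViT-L-14"
TE_FILE = "ViT-L-14-GmP-ft-TE-only-HF-format.safetensors"
# Assumptions, not stated in the thread: which config and base pipeline to use.
BASE_TE = "openai/clip-vit-large-patch14"
SDXL_BASE = "stabilityai/stable-diffusion-xl-base-1.0"


def has_hf_text_model_keys(keys: Iterable[str]) -> bool:
    """Heuristic check (hypothetical helper): HF-format CLIP text encoder
    weights use the 'text_model.' prefix, while original OpenAI CLIP
    checkpoints use names such as 'transformer.resblocks.'."""
    return any(k.startswith("text_model.") for k in keys)


def build_sdxl_with_gmp_text_encoder():
    """Load SDXL and replace its first (ViT-L) text encoder. Inference only."""
    # Heavy imports stay inside the function so the helper above has no dependencies.
    import torch
    from diffusers import StableDiffusionXLPipeline
    from huggingface_hub import hf_hub_download
    from safetensors.torch import load_file
    from transformers import CLIPTextModel

    state = load_file(hf_hub_download(repo_id=REPO_ID, filename=TE_FILE))
    if not has_hf_text_model_keys(state):
        raise ValueError("Weights do not look like an HF CLIPTextModel state dict")

    # Build a ViT-L text encoder with the stock config, then overwrite its weights.
    text_encoder = CLIPTextModel.from_pretrained(BASE_TE, torch_dtype=torch.float16)
    result = text_encoder.load_state_dict(state, strict=False)
    # Anything listed here other than position-id buffers deserves a closer look.
    print("missing:", result.missing_keys, "unexpected:", result.unexpected_keys)

    # The second, larger CLIP-G encoder (text_encoder_2) is left untouched.
    return StableDiffusionXLPipeline.from_pretrained(
        SDXL_BASE, text_encoder=text_encoder, torch_dtype=torch.float16
    )


if __name__ == "__main__":
    pipe = build_sdxl_with_gmp_text_encoder().to("cuda")
    pipe("a photo of a red fox in the snow").images[0].save("fox.png")
```

`strict=False` is used so that a difference in non-weight buffers between library versions does not abort loading; the printed key lists show whether anything substantive failed to match.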
Actually it's just for inference at the moment
As the question states: is it possible to drop this in to any Stable Diffusion diffusers pipeline?