mehdidc / feed_forward_vqgan_clip

Feed forward VQGAN-CLIP model, where the goal is to eliminate the need for optimizing the latent space of VQGAN for each input prompt
MIT License

Finetuning CLIP to improve domain-specific performance #13

Open · afiaka87 opened this issue 3 years ago

afiaka87 commented 3 years ago

It's quite easy to finetune one of the OpenAI CLIP checkpoints with this codebase:

https://github.com/Zasder3/train-CLIP-FT

Uses PyTorch Lightning; may be worth pursuing.
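
For concreteness, a minimal sketch of what such a fine-tuning loop could look like directly with OpenAI's `clip` package (not the linked train-CLIP-FT code). The `pairs` loader and the hyperparameters are placeholders you would swap for your domain-specific dataset:

```python
# Hedged sketch: fine-tune an OpenAI CLIP checkpoint with the original
# symmetric contrastive objective. Assumes `pip install git+https://github.com/openai/CLIP`.
import torch
import torch.nn.functional as F
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device, jit=False)
model.float()  # the CUDA checkpoint loads in fp16; train in fp32 for stability

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-6, weight_decay=0.2)

# `pairs` is a hypothetical DataLoader yielding (preprocessed image batch, list of captions)
for images, captions in pairs:
    images = images.to(device)
    texts = clip.tokenize(captions, truncate=True).to(device)

    # CLIP returns the temperature-scaled image-text similarity matrices
    logits_per_image, logits_per_text = model(images, texts)
    labels = torch.arange(len(images), device=device)

    # symmetric InfoNCE loss, as in the original CLIP training setup
    loss = (F.cross_entropy(logits_per_image, labels) +
            F.cross_entropy(logits_per_text, labels)) / 2

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```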

mehdidc commented 2 years ago

I would also be curious to see whether training/fine-tuning CLIP at a higher resolution (e.g. 512x512 instead of 224x224) would also lead to better image quality at higher output resolutions (>= 512).
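
For reference, feeding the ViT visual tower inputs larger than its native 224x224 requires resizing its learned positional embeddings first. The sketch below uses the standard bicubic-interpolation trick (as done in timm/open_clip), not anything from this repo; `resize_pos_embed` is a hypothetical helper name, and the preprocessing transform would also need its `Resize`/`CenterCrop` size updated to match:

```python
# Hedged sketch: interpolate CLIP ViT positional embeddings so the vision
# tower accepts 512x512 inputs before fine-tuning at that resolution.
import torch
import torch.nn.functional as F
import clip

model, _ = clip.load("ViT-B/32", device="cpu", jit=False)

def resize_pos_embed(visual, new_size=512, patch=32):
    old = visual.positional_embedding          # shape: (1 + old_grid**2, width)
    cls_tok, grid_tok = old[:1], old[1:]       # separate the class token embedding
    old_grid = int(grid_tok.shape[0] ** 0.5)
    new_grid = new_size // patch
    # reshape to a 2D grid, interpolate, and flatten back
    grid_tok = grid_tok.reshape(1, old_grid, old_grid, -1).permute(0, 3, 1, 2)
    grid_tok = F.interpolate(grid_tok, size=(new_grid, new_grid),
                             mode="bicubic", align_corners=False)
    grid_tok = grid_tok.permute(0, 2, 3, 1).reshape(new_grid * new_grid, -1)
    visual.positional_embedding = torch.nn.Parameter(
        torch.cat([cls_tok, grid_tok], dim=0))
    visual.input_resolution = new_size

resize_pos_embed(model.visual, new_size=512, patch=32)
# model.visual now accepts 512x512 inputs (a 16x16 patch grid instead of 7x7)
```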