miaoshouai / ComfyUI-Miaoshouai-Tagger

MIT License
269 stars 15 forks source link

Finetuning Microsoft Phi 3.5 Vision #12

Open alexisrolland opened 2 months ago

alexisrolland commented 2 months ago

I find Microsoft's Phi 3.5 vision instruct performs much better than Florence 2. Since it's an instruct model, it also has the benefit of taking text instruction as input to help describing the images with the desired syntax.

Since you already have a dataset, maybe it could be interesting to finetune this model too 😀

https://huggingface.co/microsoft/Phi-3.5-vision-instruct

Just sharing the idea! Thank you for sharing your work <3

miaoshouai commented 2 months ago

mark~ something down the road to check out