unslothai / unsloth

Finetune Llama 3.2, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory

https://unsloth.ai

Apache License 2.0

Does Unsloth support PEFT fine-tuning for vision foundation models like OWL-ViT for zero-shot object detection? #153

Open solomonmanuelraj opened 8 months ago

solomonmanuelraj commented 8 months ago

Hi team,

I would like to fine-tune vision foundation models such as OWL-ViT, which is mainly used for zero-shot object detection.

Does Unsloth support LoRA fine-tuning for this kind of vision foundation model (VFM)? Any reference would be a great help.

Thanks

danielhanchen commented 8 months ago

@solomonmanuelraj We currently only support language model architectures such as Llama (Yi, DeepSeek, Qwen, etc.) and Mistral, so ViT-based models are not supported for now. The modules could be brought over to vision transformers, but that will be a future project.

solomonmanuelraj commented 8 months ago

Thanks for your quick response.

danielhanchen commented 8 months ago

@solomonmanuelraj If you're not already on our Discord (https://discord.gg/u54VK8m8tk), there are some other people there who are also working on fine-tuning vision models :) Maybe you can all discuss vision models together :))