Open catalys1 opened 6 days ago
cc @molbap @qubvel
Hi @catalys1, thank you so much for your feature request! We agree that the ability to fine-tune OWL-ViT/OWLv2 would be a great addition. If you have the time and are interested in contributing, we would love to collaborate with you on this! Your help would be greatly appreciated 🤗
This PR might also be helpful
Feature request
Currently the OWL-ViT models support inference and CLIP-style contrastive pre-training, but don't provide a way to train (or fine-tune) the detection part of the model. According to the paper, detection training is similar to DETR.
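To make the DETR comparison concrete, here is a rough sketch of the bipartite matching step that DETR-style training uses to pair predictions with ground-truth boxes before computing the loss. Everything in it is an assumption for illustration: the function names, the dict layout, and the cost weights are hypothetical and not part of transformers, and the GIoU term that DETR also uses is left out. The brute-force search stands in for the Hungarian algorithm (for example scipy.optimize.linear_sum_assignment) and only works for a handful of boxes.

```python
from itertools import permutations


def l1_box_cost(pred_box, gt_box):
    # L1 distance between two (cx, cy, w, h) boxes in normalized coordinates.
    return sum(abs(p - g) for p, g in zip(pred_box, gt_box))


def match_cost(pred, gt, class_weight=1.0, box_weight=5.0):
    # Hypothetical cost: negative probability of the true class plus weighted
    # L1 box distance. The weights are illustrative, not taken from the issue.
    class_cost = -pred["probs"][gt["label"]]
    return class_weight * class_cost + box_weight * l1_box_cost(pred["box"], gt["box"])


def bipartite_match(preds, gts):
    # Brute-force search for the one-to-one assignment with the lowest total
    # cost. Assumes len(preds) >= len(gts). Returns (pred_idx, gt_idx) pairs.
    best_cost, best = float("inf"), []
    for perm in permutations(range(len(preds)), len(gts)):
        cost = sum(match_cost(preds[p], gts[g]) for g, p in enumerate(perm))
        if cost < best_cost:
            best_cost = cost
            best = [(p, g) for g, p in enumerate(perm)]
    return best


# Toy data: three predictions, two ground-truth objects.
preds = [
    {"probs": [0.1, 0.9], "box": (0.5, 0.5, 0.2, 0.2)},
    {"probs": [0.8, 0.2], "box": (0.1, 0.1, 0.1, 0.1)},
    {"probs": [0.5, 0.5], "box": (0.9, 0.9, 0.1, 0.1)},
]
gts = [
    {"label": 0, "box": (0.1, 0.1, 0.1, 0.1)},
    {"label": 1, "box": (0.5, 0.5, 0.2, 0.2)},
]

matches = bipartite_match(preds, gts)
print(matches)
```

In a real training loop, the matched pairs would receive classification and box regression losses, and unmatched predictions would be pushed toward a "no object" target. For OWL-ViT the class scores would come from similarity to the text queries rather than a fixed label set, so the "probs" field here is only a placeholder.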
Motivation
It would be really awesome to be able to train or fine-tune one of these already-existing open-vocabulary object detection models.
Your contribution
I may be able to help some with this, but I'm not sure at present.