Hi Erin,
To clarify, do you have image-caption pairs that you hope to fine-tune on?
Or do you want to fine-tune in a supervised setting (i.e., you have images and class labels for the images)?
If you have image-caption pairs, then the following is actually the right repository: https://github.com/mlfoundations/open_clip.
If it's the latter, then you can add a dataset in src/datasets -- see the existing datasets there as examples. Then, after modifying src/datasets/__init__.py to register it, you can call --train-dataset=<your dataset name>. A rough sketch is below.
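As a sketch of what such a dataset file might look like (the file name, class name, and constructor arguments here are illustrative; mirror an existing file in src/datasets for the exact interface this repo expects):

```python
# Hypothetical src/datasets/my_dataset.py -- a minimal sketch modeled on
# the existing dataset classes in src/datasets; the exact constructor
# signature may differ, so adapt from an existing file there.
import os

import torch
import torchvision.datasets as datasets


class MyDataset:
    def __init__(self, preprocess, location=os.path.expanduser('~/data'),
                 batch_size=128, num_workers=4):
        # Assumes an ImageFolder layout:
        #   location/my_dataset/{train,val}/<class_name>/*.jpg
        traindir = os.path.join(location, 'my_dataset', 'train')
        valdir = os.path.join(location, 'my_dataset', 'val')

        self.train_dataset = datasets.ImageFolder(traindir, transform=preprocess)
        self.train_loader = torch.utils.data.DataLoader(
            self.train_dataset, shuffle=True,
            batch_size=batch_size, num_workers=num_workers)

        self.test_dataset = datasets.ImageFolder(valdir, transform=preprocess)
        self.test_loader = torch.utils.data.DataLoader(
            self.test_dataset, batch_size=batch_size, num_workers=num_workers)

        # Class names are used to build the zero-shot classification head.
        self.classnames = self.train_dataset.classes
```

Then register it in src/datasets/__init__.py (e.g. `from .my_dataset import MyDataset`) and pass `--train-dataset=MyDataset`.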
Please let me know if this is helpful, and if not, whether there is any way we can help.
Thank you so much for the detailed response! It is actually the former: I have an image-caption dataset. But it seems like this section (open_clip fine-tuning) asks people to use WiSE-FT?
Hi @ErinZhang1998, that pointer in the open_clip repo is meant for fine-tuning on classification tasks, as Mitchell pointed out. I updated the readme of the open_clip repo so it's clearer!
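For reference, fine-tuning on image-caption pairs with open_clip looks roughly like the CSV example in its README (the flags below are copied from that example and may have changed, so double-check against the current repo; the CSV is assumed to have `filepath` and `title` columns):

```bash
python -m training.main \
    --train-data="/path/to/train_data.csv" \
    --csv-img-key filepath \
    --csv-caption-key title \
    --batch-size 128 \
    --lr 1e-5 \
    --epochs 30 \
    --workers 4 \
    --model RN50 \
    --pretrained openai
```

Starting from `--pretrained openai` rather than from scratch is what makes this a fine-tuning run.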
Hi @mitchellnw & @gabrielilharco, I have a total of 5k images, with 1k images in each of 5 classes. I want to fine-tune this model on my custom dataset. Can you please help here?
Also, as @mitchellnw pointed out, this note currently lives in the open_clip repo, and it still points to this repo:

> This repository is focused on training CLIP models. To fine-tune a trained zero-shot model on a downstream classification task such as ImageNet, please see [our other repository: WiSE-FT](https://github.com/mlfoundations/wise-ft)
Hi @adesgautam, if you have a classification dataset, this is the right place! @mitchellnw's pointers above are a great starting point, but we are happy to help in case you have any questions.
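If it helps, a classification fine-tuning run with this repo looks roughly like the example command in the README. This sketch reuses the illustrative `MyDataset` class from above, with hyperparameters you would want to tune for a 5-class, 5k-image dataset:

```bash
python src/wise_ft.py \
    --train-dataset=MyDataset \
    --model=ViT-B/32 \
    --epochs=10 \
    --lr=3e-5 \
    --batch-size=256 \
    --data-location=~/data \
    --template=openai_imagenet_template \
    --save=models/wiseft \
    --alpha 0 0.5 1.0
```

`--alpha` controls the interpolation between the zero-shot and fine-tuned weights, and you may want a `--template` that better matches your class names; please double-check the exact flags against the README.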
Hi,
I was wondering where to get started if I want to use this to fine-tune CLIP on my own dataset (a dataset of sketch-text pairs)?