What an awesome issue! It helps me a lot. Thanks:)
Thanks for this, very useful for many I'm sure. I'm going to move it to Discussions and pin it there for visibility and longevity.
Also, I will point out that while CSV datasets are convenient, you are likely to run into performance issues if you're scaling beyond a few processes (GPUs), especially if you're reading from a network drive or magnetic drive. It is possible to build webdataset-format tar datasets without using img2dataset.
Introduction
OpenClip is widely recognized in academic and industrial circles as an excellent open-source repository for training CLIP-series models. However, the documentation lacks a detailed explanation of how to fine-tune CLIP models for downstream tasks using a local dataset, so beginners may initially find themselves unsure of where to start. This issue, based on my practical experience, introduces some usage precautions for OpenClip. I hope it can help students who are new to CLIP-series models.
1. Clone the repository
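Clone the official repository from GitHub and enter it:

```bash
git clone https://github.com/mlfoundations/open_clip.git
cd open_clip
```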
2. Install environment
First, check your CUDA version before installing torch and the corresponding packages. If you install the dependencies directly with the official command, you are very likely to encounter a series of errors caused by mismatched torch and CUDA versions, so install your environment according to your actual situation.
2.1 Check your CUDA version in the shell
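For example, with NVIDIA's driver tool:

```bash
nvidia-smi
```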
and you will see the driver version and the highest CUDA version it supports (using my local device as an example).
Then visit the official PyTorch website (https://pytorch.org/get-started/locally/) to find the torch and other package versions that match your CUDA version. It is recommended to use pip for installation, for example:
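A sketch assuming CUDA 11.8; adjust the index URL to whatever the PyTorch site recommends for your CUDA version:

```bash
# CUDA 11.8 is only an example; pick the wheel index that matches your driver
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
```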
2.2 Check the installation
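A quick check in a Python shell:

```python
import torch

print(torch.__version__)          # the torch version you just installed
print(torch.cuda.is_available())  # True means torch can see your GPU
```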
If the output is "True", congratulations! You have installed the most important packages successfully! Then just install the remaining packages using:
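For example, from the root of the cloned repository (a sketch; the exact requirements files may differ depending on the version of the repo you checked out):

```bash
pip install -r requirements.txt            # core dependencies
pip install -r requirements-training.txt   # additional dependencies needed for training
```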
3. Prepare your local dataset
CLIP is trained with an image-text contrastive loss, so your local dataset must include both images and their corresponding textual descriptions. You then need to create an index file that links the images with their respective captions. The official tutorial uses img2dataset for data management because it involves downloading some public datasets; for local data, however, a CSV file as the index is the most convenient option. The following steps use a CSV file as an example.
The CSV file should contain at least two columns: the image path and its corresponding text description. In particular, make a note of the column headers (e.g. filepath,caption), which will be needed later.
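A minimal sketch of how such a file could be built; the paths and captions below are made up for illustration:

```python
import pandas as pd

# Hypothetical image paths and captions, for illustration only
records = [
    {"filepath": "/data/images/dog_001.jpg", "caption": "a brown dog running on the grass"},
    {"filepath": "/data/images/cat_042.jpg", "caption": "a cat sleeping on a windowsill"},
]

# Writes a comma-separated file with the headers `filepath` and `caption`
pd.DataFrame(records).to_csv("train_data.csv", index=False)
```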
4. Choose a suitable pre-trained model
OpenClip officially provides quite a lot of pre-trained CLIP-series models for download and use. You can use the following command to view the specific details of these models:
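For example, in a Python shell:

```python
import open_clip

# Each entry is a (model_name, pretrained_tag) pair
for model_name, pretrained_tag in open_clip.list_pretrained():
    print(model_name, pretrained_tag)
```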
4.1 Understand the models
The first column is the model name, which is also the value you pass as the model argument when creating the model and its tokenizer. The second column indicates either the provider of the pre-trained weights or the training dataset (and its scale) they were trained on.
4.2 Test your settings
Now test your project setup with the official demo; it will automatically download the required model (remember to replace "img_path" according to your actual situation).
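The demo below follows the usage example in the OpenClip README; "ViT-B-32" with the "laion2b_s34b_b79k" weights is just one common choice, and "img_path" is a placeholder you need to replace:

```python
import torch
from PIL import Image
import open_clip

# Any (model, pretrained) pair from open_clip.list_pretrained() will work here
model, _, preprocess = open_clip.create_model_and_transforms("ViT-B-32", pretrained="laion2b_s34b_b79k")
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model.eval()

image = preprocess(Image.open("img_path")).unsqueeze(0)   # replace "img_path" with a real image
text = tokenizer(["a diagram", "a dog", "a cat"])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print("Label probs:", text_probs)
```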
If for some reason your server cannot directly download the released models through the official OpenClip scripts, you can still download them to your local machine by any available means and then upload them to the server. The download links for the models released by OpenAI are as follows:
For more pre-trained resources, please refer to https://github.com/mlfoundations/open_clip/blob/main/src/open_clip/pretrained.py
After you have successfully placed the pre-trained models on your server, use the demo to test again (remember to replace "model_path", "model_name", and "img_path" according to your actual situation).
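A sketch of loading a locally stored checkpoint; model_name and model_path below are placeholders for the architecture and file you actually downloaded:

```python
import open_clip

model_name = "ViT-B-32"               # hypothetical: the architecture of the checkpoint you downloaded
model_path = "/path/to/ViT-B-32.pt"   # hypothetical: the local path you uploaded the weights to

# `pretrained` also accepts a local checkpoint path instead of a tag name
model, _, preprocess = open_clip.create_model_and_transforms(model_name, pretrained=model_path)
tokenizer = open_clip.get_tokenizer(model_name)
# ...then run the same demo as above with your own img_path
```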
If you get the expected output, congratulations, you have completed all the preparation work!
5. Train your model
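A single-GPU fine-tuning command could look like the following. It is adapted from the training example in the repository; all paths and hyperparameters are placeholders to adjust, and --csv-separator is set explicitly because the CSV built above uses commas:

```bash
cd open_clip/src   # the directory that contains the `training` package

python -m training.main \
    --train-data /path/to/train_data.csv \
    --csv-img-key filepath \
    --csv-caption-key caption \
    --csv-separator "," \
    --model ViT-B-32 \
    --pretrained laion2b_s34b_b79k \
    --batch-size 128 \
    --lr 1e-5 \
    --wd 0.1 \
    --epochs 30 \
    --warmup 1000 \
    --workers 8 \
    --report-to tensorboard
```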
If you want to train on multiple GPUs on the same server simultaneously, you can use the following command:
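For example, with torchrun (4 GPUs are assumed here; the remaining arguments are the same as in the single-GPU sketch above):

```bash
cd open_clip/src
export CUDA_VISIBLE_DEVICES=0,1,2,3   # the GPUs you want to use

torchrun --nproc_per_node 4 -m training.main \
    --train-data /path/to/train_data.csv \
    --csv-img-key filepath \
    --csv-caption-key caption \
    --csv-separator "," \
    --model ViT-B-32 \
    --pretrained laion2b_s34b_b79k \
    --batch-size 128 \
    --epochs 30 \
    --workers 8
```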
For a more detailed explanation of the arguments, please refer to https://github.com/mlfoundations/open_clip/blob/main/src/training/params.py
6. Some points to note: