Welcome to the MLX_CLIP repository! This repository contains an implementation of the CLIP (Contrastive Language-Image Pre-training) model using the MLX library. CLIP is a powerful model that learns to associate images with their corresponding textual descriptions, enabling various downstream tasks such as image retrieval and zero-shot classification.
To get started with MLX_CLIP, follow these steps:

1. Clone the repository:

   ```
   git clone https://github.com/harperreed/mlx_clip.git
   ```

2. Install the required dependencies:

   ```
   pip install -r requirements.txt
   ```

3. Load the pre-trained CLIP model:

   ```python
   from mlx_clip import mlx_clip

   model_dir = "path/to/pretrained/model"
   clip = mlx_clip(model_dir)
   ```

4. Use the CLIP model for generating image and text embeddings:

   ```python
   image_path = "path/to/image.jpg"
   image_embedding = clip.image_encoder(image_path)

   text = "A description of the image"
   text_embedding = clip.text_encoder(text)
   ```
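Once you have both embeddings, a common next step is to compare them. The snippet below is a minimal sketch (not part of the library itself) that scores how well the text matches the image using cosine similarity; it assumes the embeddings returned by the encoders can be converted to 1-D NumPy arrays.

```python
import numpy as np

from mlx_clip import mlx_clip

def cosine_similarity(a, b):
    # Assumption: `a` and `b` are array-like embeddings (e.g. MLX arrays)
    # that NumPy can convert and flatten to 1-D float vectors.
    a = np.asarray(a, dtype=np.float32).ravel()
    b = np.asarray(b, dtype=np.float32).ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

clip = mlx_clip("path/to/pretrained/model")
image_embedding = clip.image_encoder("path/to/image.jpg")
text_embedding = clip.text_encoder("A description of the image")

score = cosine_similarity(image_embedding, text_embedding)
print(f"image/text similarity: {score:.3f}")  # higher means a closer match
```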
Check out the `example.py` file for a simple example of how to use MLX_CLIP to generate image and text embeddings.
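Building on that, the sketch below shows one way to do the zero-shot classification mentioned above: encode a few candidate label prompts, score each against the image embedding, and keep the best match. The labels and paths are placeholders, and it again assumes the embeddings convert cleanly to NumPy arrays.

```python
import numpy as np

from mlx_clip import mlx_clip

clip = mlx_clip("path/to/pretrained/model")

# Placeholder labels; "a photo of a ..." phrasing is a common CLIP prompt convention.
labels = ["a photo of a dog", "a photo of a cat", "a photo of a car"]

image_embedding = np.asarray(clip.image_encoder("path/to/image.jpg"), dtype=np.float32).ravel()
image_embedding /= np.linalg.norm(image_embedding)

best_label, best_score = None, -1.0
for label in labels:
    text_embedding = np.asarray(clip.text_encoder(label), dtype=np.float32).ravel()
    text_embedding /= np.linalg.norm(text_embedding)
    score = float(image_embedding @ text_embedding)  # cosine similarity of unit vectors
    if score > best_score:
        best_label, best_score = label, score

print(f"predicted: {best_label} (similarity {best_score:.3f})")
```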
MLX_CLIP provides a convenient utility to convert pre-trained CLIP weights from the Hugging Face repository to the MLX format. To convert weights, use the `convert_weights` function from `mlx_clip.convert`:

```python
from mlx_clip.convert import convert_weights

hf_repo = "openai/clip-vit-base-patch32"
mlx_path = "path/to/save/converted/model"
convert_weights(hf_repo, mlx_path)
```
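A typical workflow, sketched below, is to run the conversion once and then point `mlx_clip` at the converted directory (the local path here is just a placeholder):

```python
from mlx_clip import mlx_clip
from mlx_clip.convert import convert_weights

# One-time conversion from the Hugging Face checkpoint to the MLX format.
convert_weights("openai/clip-vit-base-patch32", "mlx_models/clip-vit-base-patch32")

# Load the converted model and generate an embedding with it.
clip = mlx_clip("mlx_models/clip-vit-base-patch32")
embedding = clip.text_encoder("a photo of a dog")
```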
Contributions to MLX_CLIP are welcome! If you encounter any issues, have suggestions for improvements, or want to add new features, please open an issue or submit a pull request. Make sure to follow the existing code style and provide appropriate documentation for your changes.
MLX_CLIP is licensed under the MIT License.
MLX_CLIP is heavily based on the CLIP implementation in the mlx-examples repository. Special thanks to the MLX team for their incredible work!
For any questions or inquiries, feel free to reach out to the project maintainer:
Harper Reed
Happy coding with MLX_CLIP!