Chat with MLX is a high-performance macOS application that connects your local documents to a personalized large language model (LLM).
[MLC-28] server: added Bert MLX model with conversions for e5 models #15
Closed
stockeh closed 7 months ago
Updates

- `convert` to `utils.py`, with functionality for deleting old models after storing them locally
- `E5Embeddings` to use MLX primitives, and to convert the model if it is not already loaded

Preliminary Benchmark
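The delete-after-convert step above can be sketched roughly as follows. This is a hypothetical illustration, not the PR's actual code: the `convert` signature, the `weights.npz` layout, and the `org--model` cache naming are all assumptions, and the real implementation would write converted MLX weights (e.g. via `mx.savez`) where the placeholder write is.

```python
import shutil
from pathlib import Path

def convert(hf_repo: str, mlx_dir: Path, cache_dir: Path) -> Path:
    """Sketch: store an MLX copy of a downloaded checkpoint locally,
    then delete the original download to reclaim disk space.
    All names and paths here are illustrative assumptions."""
    mlx_dir.mkdir(parents=True, exist_ok=True)
    weights = mlx_dir / "weights.npz"
    # Real code would load the source weights and save MLX arrays here;
    # we write a placeholder file to stand in for the converted weights.
    weights.write_bytes(b"")
    # Remove the old downloaded model now that a local copy is stored.
    old = cache_dir / hf_repo.replace("/", "--")
    if old.exists():
        shutil.rmtree(old)
    return weights
```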
MLX (bs=1): Indexed 1553 documents in 9.67s
MLX (bs=8): Indexed 1553 documents in 3.75s
MLX (bs=32): Indexed 1553 documents in 4.47s
Torch (bs=1): Indexed 1553 documents in 32.15s
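The batch-size sweep above reflects a simple batched indexing loop: larger batches amortize per-call overhead, which is why bs=8 is roughly 2.6x faster than bs=1, with diminishing (here slightly negative) returns by bs=32. A minimal sketch of such a loop, where `embed` stands in for the model's batched forward pass (an assumption, not the PR's API):

```python
from typing import Callable, List

def index_documents(docs: List[str],
                    embed: Callable[[List[str]], List[List[float]]],
                    batch_size: int = 8) -> List[List[float]]:
    """Embed documents in fixed-size batches.
    `embed` is a placeholder for a batched model forward pass."""
    vectors: List[List[float]] = []
    for i in range(0, len(docs), batch_size):
        # Each call embeds up to `batch_size` documents at once,
        # amortizing per-call overhead across the batch.
        vectors.extend(embed(docs[i:i + batch_size]))
    return vectors
```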