michaelfeil / infinity

Infinity is a high-throughput, low-latency REST API for serving text embeddings, reranking models, and CLIP.
https://michaelfeil.github.io/infinity/
MIT License
1.27k stars, 89 forks

minicpm3-embedding and minicpm3-reranker #354

Open lyj157175 opened 1 week ago

lyj157175 commented 1 week ago

Model description

Can Infinity support MiniCPM3's embedding and reranker models?

Open source status

Provide useful links for the implementation

No response

michaelfeil commented 1 week ago

@lyj157175 Which models (Hugging Face links) are you referring to?

michaelfeil commented 1 week ago

I think https://huggingface.co/openbmb/MiniCPM-V-2 is a text-generation (chat) model. For multi-modal chat, I would recommend https://github.com/vllm-project/vllm. This repo is for multi-modal embeddings and reranking.

lyj157175 commented 1 week ago

I mean these two models, which are embedding and reranker models: https://huggingface.co/openbmb/MiniCPM-Embedding and https://huggingface.co/openbmb/MiniCPM-Reranker. Can vLLM only load chat models?

michaelfeil commented 1 week ago

Yes, via Infinity: pip install "infinity_emb[all]" flash-attn