Open · markkofler opened this issue 1 week ago
@markkofler vLLM doesn't have support for Qwen2 embedding models yet. I have a WIP PR for this here: https://github.com/vllm-project/vllm/pull/5611

From the logs I can see that the model is not running as an embedding model:

`WARNING 06-25 14:23:04 serving_embedding.py:141] embedding_mode is False. Embedding API will not work.`

Looking at the config file for the model, it seems that it is registered to run as `Qwen2ForCausalLM`, which is not valid for embedding models: https://huggingface.co/Alibaba-NLP/gte-Qwen2-7B-instruct/blob/a3b5d14542d49d8e202dcf1c382a692b1607cee5/config.json#L3

You can try changing this to `Qwen2Model`, which is what vLLM expects for embedding models, and running it with my PR (which you would have to build from source). A sketch of the config edit follows below.
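Not an official recipe, but a minimal sketch of that edit in Python, assuming you have the checkpoint downloaded locally (the path below is hypothetical):

```python
import json

# Hypothetical local path to the downloaded gte-Qwen2-7B-instruct checkpoint.
config_path = "/models/gte-Qwen2-7B-instruct/config.json"

with open(config_path) as f:
    config = json.load(f)

# Swap the causal-LM architecture for the plain model class,
# which is what vLLM expects for embedding models (see above).
config["architectures"] = ["Qwen2Model"]

with open(config_path, "w") as f:
    json.dump(config, f, indent=2)
```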
### Your current environment

Using the latest available Docker image: `vllm/vllm-openai:v0.5.0.post1`
### 🐛 Describe the bug
I am getting "Internal Server Error" as the response when calling the /v1/embeddings endpoint of the Kubernetes-deployed model. I am using the following JSON request as the body:
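The exact request body isn't reproduced here, but for illustration, a request of the shape the OpenAI-compatible /v1/embeddings endpoint expects would look like the sketch below (the host, port, and served model name are assumptions and may differ in your deployment):

```python
import requests

# Assumed address of the Kubernetes-deployed vLLM server; adjust to your service.
url = "http://localhost:8000/v1/embeddings"

payload = {
    # Assumed served model name; must match the name the server was started with.
    "model": "Alibaba-NLP/gte-Qwen2-7B-instruct",
    "input": "What is the capital of France?",
}

resp = requests.post(url, json=payload)
print(resp.status_code)  # 500 in the failing case described here
print(resp.text)         # "Internal Server Error"
```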
For reference, here is the log of the vLLM container:
It would be great if somebody could help me get the model running as an embedding model for our colleagues. Any idea what could be wrong?

Thanks in advance!