vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Feature]: Does vLLM only support the MistralModel architecture for embedding? #7915

Open hahmad2008 opened 3 weeks ago

hahmad2008 commented 3 weeks ago

🚀 The feature, motivation and pitch

Does vLLM only support the MistralModel architecture for embedding?

_EMBEDDING_MODELS = {
    "MistralModel": ("llama_embedding", "LlamaEmbeddingModel"),
}
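
For reference, the one registered architecture does work end to end through the offline API. A minimal sketch, assuming the LLM.encode API available in this version and the intfloat/e5-mistral-7b-instruct checkpoint (which reports the MistralModel architecture):

from vllm import LLM

# e5-mistral-7b-instruct reports the MistralModel architecture, so vLLM
# routes it through the embedding path automatically, with no manual
# embedding_mode override.
llm = LLM(model="intfloat/e5-mistral-7b-instruct", enforce_eager=True)

outputs = llm.encode(["Hello, world!"])
print(len(outputs[0].outputs.embedding))  # embedding dimension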

I tried to force embedding mode by setting model_config.embedding_mode = True, and this error was raised:

Activating the server engine with embedding enabled.
INFO 08-27 14:54:06 async_llm_engine.py:173] Added request embd-69a08211c22a4db9baa14c2da3db9dcd-0.
ERROR 08-27 14:54:06 async_llm_engine.py:56] Engine background task failed
ERROR 08-27 14:54:06 async_llm_engine.py:56] Traceback (most recent call last):
ERROR 08-27 14:54:06 async_llm_engine.py:56]   File "myenv/lib/python3.9/site-packages/vllm/engine/async_llm_engine.py", line 46, in _log_task_completion
ERROR 08-27 14:54:06 async_llm_engine.py:56]     return_value = task.result()
ERROR 08-27 14:54:06 async_llm_engine.py:56]   File "myenv/lib/python3.9/site-packages/vllm/engine/async_llm_engine.py", line 637, in run_engine_loop
ERROR 08-27 14:54:06 async_llm_engine.py:56]     result = task.result()
ERROR 08-27 14:54:06 async_llm_engine.py:56]   File "myenv/lib/python3.9/site-packages/vllm/engine/async_llm_engine.py", line 578, in engine_step
ERROR 08-27 14:54:06 async_llm_engine.py:56]     request_outputs = await self.engine.step.remote()  # type: ignore
ERROR 08-27 14:54:06 async_llm_engine.py:56] ray.exceptions.RayTaskError(AttributeError): ray::_AsyncLLMEngine.step() (pid=40485, ip=10.5.8.112, actor_id=da65e597172ea5f7dea0a8b601000000, repr=<vllm.engine.async_llm_engine._AsyncLLMEngine object at 0x7fd4c6e27250>)
ERROR 08-27 14:54:06 async_llm_engine.py:56]   File "myenv/lib/python3.9/concurrent/futures/_base.py", line 439, in result
ERROR 08-27 14:54:06 async_llm_engine.py:56]     return self.__get_result()
ERROR 08-27 14:54:06 async_llm_engine.py:56]   File "myenv/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
ERROR 08-27 14:54:06 async_llm_engine.py:56]     raise self._exception
ERROR 08-27 14:54:06 async_llm_engine.py:56]   File "myenv/lib/python3.9/site-packages/vllm/engine/llm_engine.py", line 911, in step
ERROR 08-27 14:54:06 async_llm_engine.py:56]     output = self.model_executor.execute_model(
ERROR 08-27 14:54:06 async_llm_engine.py:56]   File "myenv/lib/python3.9/site-packages/vllm/executor/ray_gpu_executor.py", line 273, in execute_model
ERROR 08-27 14:54:06 async_llm_engine.py:56]     return super().execute_model(execute_model_req)
ERROR 08-27 14:54:06 async_llm_engine.py:56]   File "myenv/lib/python3.9/site-packages/vllm/executor/distributed_gpu_executor.py", line 76, in execute_model
ERROR 08-27 14:54:06 async_llm_engine.py:56]     driver_outputs = self._driver_execute_model(execute_model_req)
ERROR 08-27 14:54:06 async_llm_engine.py:56]   File "myenv/lib/python3.9/site-packages/vllm/executor/ray_gpu_executor.py", line 266, in _driver_execute_model
ERROR 08-27 14:54:06 async_llm_engine.py:56]     return self.driver_worker.execute_method("execute_model",
ERROR 08-27 14:54:06 async_llm_engine.py:56]   File "myenv/lib/python3.9/site-packages/vllm/worker/worker_base.py", line 383, in execute_method
ERROR 08-27 14:54:06 async_llm_engine.py:56]     raise e
ERROR 08-27 14:54:06 async_llm_engine.py:56]   File "myenv/lib/python3.9/site-packages/vllm/worker/worker_base.py", line 374, in execute_method
ERROR 08-27 14:54:06 async_llm_engine.py:56]     return executor(*args, **kwargs)
ERROR 08-27 14:54:06 async_llm_engine.py:56]   File "myenv/lib/python3.9/site-packages/vllm/worker/worker_base.py", line 236, in execute_model
ERROR 08-27 14:54:06 async_llm_engine.py:56]     self.model_runner.prepare_model_input(
ERROR 08-27 14:54:06 async_llm_engine.py:56]   File "myenv/lib/python3.9/site-packages/vllm/worker/model_runner.py", line 1227, in prepare_model_input
ERROR 08-27 14:54:06 async_llm_engine.py:56]     sampling_metadata = SamplingMetadata.prepare(seq_group_metadata_list,
ERROR 08-27 14:54:06 async_llm_engine.py:56]   File "myenv/lib/python3.9/site-packages/vllm/model_executor/sampling_metadata.py", line 126, in prepare
ERROR 08-27 14:54:06 async_llm_engine.py:56]     ) = _prepare_seq_groups(seq_group_metadata_list, seq_lens, query_lens,
ERROR 08-27 14:54:06 async_llm_engine.py:56]   File "myenv/lib/python3.9/site-packages/vllm/model_executor/sampling_metadata.py", line 218, in _prepare_seq_groups
ERROR 08-27 14:54:06 async_llm_engine.py:56]     if sampling_params.seed is not None:
ERROR 08-27 14:54:06 async_llm_engine.py:56] AttributeError: 'NoneType' object has no attribute 'seed'
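
The traceback hints at why forcing embedding_mode fails: embedding requests carry pooling parameters rather than sampling parameters, so when the engine still routes the request through SamplingMetadata.prepare, sampling_params is None and the .seed access raises. The supported route is to serve a registered embedding model and call the /v1/embeddings endpoint. A minimal client-side sketch, assuming an OpenAI-compatible vLLM server is already running on localhost:8000 with intfloat/e5-mistral-7b-instruct:

from openai import OpenAI

# Assumes a server started with something like:
#   python -m vllm.entrypoints.openai.api_server \
#       --model intfloat/e5-mistral-7b-instruct
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.embeddings.create(
    model="intfloat/e5-mistral-7b-instruct",
    input=["Hello, world!"],
)
print(len(resp.data[0].embedding))  # embedding dimension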
mgoin commented 3 weeks ago

Yes, support for embedding with other model architectures still needs to be added.
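
Concretely, that means each new architecture needs an embedding wrapper that pools the hidden states, plus a registry entry. A rough sketch (the Qwen2 names below are hypothetical, mirroring the existing entry):

_EMBEDDING_MODELS = {
    "MistralModel": ("llama_embedding", "LlamaEmbeddingModel"),
    # Hypothetical new entry: a Qwen2EmbeddingModel wrapper (pooling the
    # final hidden states) would have to be implemented first, analogous
    # to LlamaEmbeddingModel:
    # "Qwen2Model": ("qwen2_embedding", "Qwen2EmbeddingModel"),
}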

hahmad2008 commented 3 weeks ago

@mgoin does it need a new implementation, or can it reuse the implementation in the current code? Also, can I use Meta-Llama-3-8B-Instruct for embedding, or only models that are designed to be embedding models?