eduardozamudio closed this issue 3 months ago.
@eduardozamudio can you please share the version of vLLM used and the error of why the model won't load? It is a MistralForCausalLM, so I would expect it to run as Mistral models do.
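(For anyone wanting to double-check: the architecture is declared in the repo's config.json. A quick way to verify it, sketched with huggingface_hub; note the repo is gated, so this assumes you have accepted the license and are logged in:

import json
from huggingface_hub import hf_hub_download

# Fetch only the config file and print the declared architecture
cfg_path = hf_hub_download("mistralai/Codestral-22B-v0.1", "config.json")
print(json.load(open(cfg_path))["architectures"])  # expect ['MistralForCausalLM']
)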
Hi @eduardozamudio (Hola Eduardo!)
mistralai/Codestral-22B-v0.1 worked for me using vLLM 0.4.3.
Regards, Matias
Does it work for "fill in the middle" (https://huggingface.co/mistralai/Codestral-22B-v0.1#fill-in-the-middle-fim)? I imagine some work would be required to support the prefix and suffix params in both the REST API and the core APIs...
I haven't dug into the https://github.com/mistralai/mistral-inference code yet, but I think it just uses special tokens to mark the prefix, suffix, and middle, so it could probably also be implemented outside of vLLM and passed in as normal input (see the sketch below)...
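To make that concrete, here is a minimal sketch of the idea: build the FIM token sequence with mistral-common (following the encode_fim example on the Codestral model card) and hand the raw token ids to vLLM's offline API so no extra templating is applied. The prompt_token_ids argument matches the 0.4.x API, but treat this as an untested sketch rather than a supported path:

from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.tokens.instruct.request import FIMRequest
from vllm import LLM, SamplingParams

# Let Mistral's own tokenizer place the FIM control tokens around prefix/suffix
tokenizer = MistralTokenizer.v3()
request = FIMRequest(prompt="def add(a, b):\n", suffix="\nprint(add(2, 3))\n")
tokens = tokenizer.encode_fim(request).tokens

# Pass the pre-built token ids so vLLM applies no further prompt formatting
# (tensor_parallel_size matches the command used elsewhere in this thread)
llm = LLM(model="mistralai/Codestral-22B-v0.1", tensor_parallel_size=4)
out = llm.generate(prompt_token_ids=[tokens],
                   sampling_params=SamplingParams(temperature=0.0, max_tokens=64))
print(out[0].outputs[0].text)  # the generated "middle"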
@eduardozamudio can you please share the version of vLLM used and the error of why the model won't load? It is a MistralForCausalLM, so I would expect it to run as Mistral models do.
I've updated to v0.4.3 and am still getting the error.
The model to consider.
Hi. Could you add support for mistralai/Codestral-22B-v0.1?
Thanks!
The closest model vllm already supports.
https://huggingface.co/meta-llama/CodeLlama-7b-hf https://huggingface.co/mistralai/Mistral-7B-v0.3
What's your difficulty of supporting the model you want?
Can't load Codestral-22B-v0.1 using the OpenAI-compatible API server.
ORG="mistralai" MODEL="Codestral-22B-v0.1" API_KEY=XXXXXXXXXXXXXXXXXXXXXX python -m vllm.entrypoints.openai.api_server \ --tokenizer $ORG/$MODEL \ --model $ORG/$MODEL \ --served-model-name $MODEL \ --tensor-parallel-size 4 \ --gpu-memory-utilization 0.9 \ --max-model-len 4096 \ --enforce-eager \ --api-key $API_KEY
Here is the output. Could it be a dependency problem?
[rank0]: Traceback (most recent call last):
[rank0]: File "<frozen runpy>", line 198, in _run_module_as_main
[rank0]: File "<frozen runpy>", line 88, in _run_code
[rank0]: File "/home/jovyan/ezamudio/vllm/vllm/entrypoints/openai/api_server.py", line 196, in <module>
[rank0]: engine = AsyncLLMEngine.from_engine_args(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/jovyan/ezamudio/vllm/vllm/engine/async_llm_engine.py", line 395, in from_engine_args
[rank0]: engine = cls(
[rank0]: ^^^^
[rank0]: File "/home/jovyan/ezamudio/vllm/vllm/engine/async_llm_engine.py", line 349, in __init__
[rank0]: self.engine = self._init_engine(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/jovyan/ezamudio/vllm/vllm/engine/async_llm_engine.py", line 470, in _init_engine
[rank0]: return engine_class(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/jovyan/ezamudio/vllm/vllm/engine/llm_engine.py", line 235, in __init__
[rank0]: self._initialize_kv_caches()
[rank0]: File "/home/jovyan/ezamudio/vllm/vllm/engine/llm_engine.py", line 312, in _initialize_kv_caches
[rank0]: self.model_executor.determine_num_available_blocks())
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/jovyan/ezamudio/vllm/vllm/executor/distributed_gpu_executor.py", line 38, in determine_num_available_blocks
[rank0]: num_blocks = self._run_workers("determine_num_available_blocks", )
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/jovyan/ezamudio/vllm/vllm/executor/ray_gpu_executor.py", line 246, in _run_workers
[rank0]: driver_worker_output = self.driver_worker.execute_method(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/jovyan/ezamudio/vllm/vllm/worker/worker_base.py", line 149, in execute_method
[rank0]: raise e
[rank0]: File "/home/jovyan/ezamudio/vllm/vllm/worker/worker_base.py", line 140, in execute_method
[rank0]: return executor(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/jovyan/ezamudio/vllm/vllm/worker/worker.py", line 154, in determine_num_available_blocks
[rank0]: self.model_runner.profile_run()
[rank0]: File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/jovyan/ezamudio/vllm/vllm/worker/model_runner.py", line 833, in profile_run
[rank0]: self.execute_model(seqs, kv_caches)
[rank0]: File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/jovyan/ezamudio/vllm/vllm/worker/model_runner.py", line 738, in execute_model
[rank0]: hidden_states = model_executable(
[rank0]: ^^^^^^^^^^^^^^^^^
[rank0]: File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/jovyan/ezamudio/vllm/vllm/model_executor/models/llama.py", line 371, in forward
[rank0]: hidden_states = self.model(input_ids, positions, kv_caches,
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/jovyan/ezamudio/vllm/vllm/model_executor/models/llama.py", line 288, in forward
[rank0]: hidden_states, residual = layer(
[rank0]: ^^^^^^
[rank0]: File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/jovyan/ezamudio/vllm/vllm/model_executor/models/llama.py", line 223, in forward
[rank0]: hidden_states = self.input_layernorm(hidden_states)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/jovyan/ezamudio/vllm/vllm/model_executor/custom_op.py", line 13, in forward
[rank0]: return self._forward_method(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/jovyan/ezamudio/vllm/vllm/model_executor/layers/layernorm.py", line 62, in forward_cuda
[rank0]: ops.rms_norm(
[rank0]: File "/home/jovyan/ezamudio/vllm/vllm/_custom_ops.py", line 132, in rms_norm
[rank0]: torch.ops._C.rms_norm(out, input, weight, epsilon)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/_ops.py", line 921, in __getattr__
[rank0]: raise AttributeError(
[rank0]: AttributeError: '_OpNamespace' '_C' object has no attribute 'rms_norm'
@eduardozamudio it seems like you have built vllm from source, so it is possible your environment has not built the library correctly. Could you try installing the pre-built package from PyPI to confirm you don't see this issue on the actual release?
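For context, torch.ops._C.rms_norm is one of the ops registered by vLLM's compiled C++/CUDA extension, so the AttributeError above typically means that extension was not built or is not being picked up. A quick way to check, and to swap in the release wheel (a sketch; the vllm._C module name is taken from the traceback, and the plain pip commands are assumed to match your environment):

# Check whether the compiled ops module imports and registers rms_norm
python -c "import vllm._C; import torch; print(hasattr(torch.ops._C, 'rms_norm'))"

# Replace the source build with the pre-built wheel from PyPI
pip uninstall -y vllm
pip install vllm==0.4.3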
Excellent!
I can confirm that the issue is no longer present when using the pre-built package.
Thanks @mgoin!