Open yk287 opened 5 months ago
Can you share some more of the error? It's not clear where the error came from in the snippet you provided.
I encountered the same problem. vLLM runs without problems with this command:
vllm serve neuralmagic/Mistral-Nemo-Instruct-2407-FP8 --api-key ubestream --max-model-len 1024 --gpu-memory-utilization 0.9 --max-num-seqs 64 --max-num-batched-tokens 8192 --block-size 16 --enable-prefix-caching --enforce-eager
However, changing --block-size to 8 triggers the same error:
vllm serve neuralmagic/Mistral-Nemo-Instruct-2407-FP8 --api-key ubestream --max-model-len 1024 --gpu-memory-utilization 0.9 --max-num-seqs 64 --max-num-batched-tokens 8192 --block-size 8 --enable-prefix-caching --enforce-eager
Error: subprocess.CalledProcessError: Command '['/usr/bin/gcc', '/tmp/tmp91qfilw7/main.c', '-O3', '-I/home/server9/venv_vllm/lib/python3.10/site-packages/triton/common/../third_party/cuda/include', '-I/usr/include/python3.10', '-I/tmp/tmp91qfilw7', '-shared', '-fPIC', '-lcuda', '-o', '/tmp/tmp91qfilw7/cuda_utils.cpython-310-x86_64-linux-gnu.so', '-L/lib/x86_64-linux-gnu', '-L/lib/x86_64-linux-gnu']' returned non-zero exit status 1.
And the error cause was:
/tmp/tmp91qfilw7/main.c:5:10: fatal error: Python.h: No such file or directory
    5 | #include <Python.h>
      |          ^~~~~~~~~~
compilation terminated.
Installing the python3-dev package solved the problem in my case.
FYI
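The failure above happens because Triton JIT-compiles a small C helper with gcc, and that compile needs the CPython development headers (Python.h). A quick way to check whether the headers are present before re-running vllm serve (a sketch; the apt/dnf package names assume Debian/Ubuntu and RHEL/Fedora respectively):

```shell
# Locate the include directory for the current Python interpreter.
inc=$(python3 -c "import sysconfig; print(sysconfig.get_paths()['include'])")
echo "Python include dir: $inc"

# If Python.h is missing here, the Triton gcc step will fail as shown above.
if [ -f "$inc/Python.h" ]; then
    echo "Python.h found - headers are installed"
else
    echo "Python.h missing - install the dev headers, e.g.:"
    echo "  Debian/Ubuntu: sudo apt-get install python3-dev"
    echo "  RHEL/Fedora:   sudo dnf install python3-devel"
fi
```

Note that inside a venv the headers still come from the system Python, so the package must be installed system-wide.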
Your current environment
🐛 Describe the bug
I'm trying to serve Mixtral-8x7B in my environment using the following code.
I get the following error message.
Can someone tell me how I might be able to resolve this issue?
Thanks!