pspdada closed this issue 4 days ago
LLaVA v1.5 is not maintained anymore. Please try the newer LLaVA v1.6 / LLaVA-NeXT / LLaVA-OneVision instead:

https://github.com/sgl-project/sglang/blob/main/examples/frontend_language/quick_start/local_example_llava_next.py
https://github.com/sgl-project/sglang/tree/main/examples/runtime/llava_onevision
https://github.com/sgl-project/sglang/blob/731146f6cbec40f502e16dc971a150ed46b207ad/test/srt/test_vision_openai_server.py#L31
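The linked LLaVA-NeXT quick start follows roughly this pattern (a minimal sketch, not the exact file contents; the model path and image file here are placeholder assumptions, check the linked example for the real ones):

```python
import sglang as sgl

# Define a simple vision-language program with SGLang's frontend language.
@sgl.function
def image_qa(s, image_path, question):
    s += sgl.user(sgl.image(image_path) + question)
    s += sgl.assistant(sgl.gen("answer"))

# Launch an in-process runtime for a LLaVA-NeXT checkpoint
# (placeholder model path; see the linked example for the exact one).
runtime = sgl.Runtime(model_path="lmms-lab/llama3-llava-next-8b")
sgl.set_default_backend(runtime)

state = image_qa.run(
    image_path="example.png",  # placeholder image
    question="What is shown in this image?",
    max_new_tokens=64,
)
print(state["answer"])

runtime.shutdown()
```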
Checklist
Describe the bug
I want to follow
https://github.com/sgl-project/sglang/blob/main/benchmark/llava_bench/README.md
and perform batch inference with LLaVA. First I launch a LLaVA v1.5 7B model from a local path using:

```
python3 -m sglang.launch_server --model-path /root/llm-project/utils/models/models-repo/llava-v1.5-7b --tokenizer-path llava-hf/llava-1.5-7b-hf --port 30000 --disable-cuda-graph
```
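As a quick sanity check, the launched server can be queried directly through its native /generate endpoint (a minimal sketch; the prompt and sampling parameters are illustrative):

```python
import requests

# The port matches the launch command above.
response = requests.post(
    "http://localhost:30000/generate",
    json={
        "text": "Describe the weather today.",
        "sampling_params": {"max_new_tokens": 32, "temperature": 0},
    },
)
print(response.json())
```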
Everything looks fine: [screenshot of server startup log]. Then I run the llava_bench with:

```
python3 bench_sglang.py --num-questions 60
```

and an error occurred. The server side: [screenshot]. The Python side (running bench_sglang.py): [screenshot].
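For reference, the batch-inference pattern this benchmark exercises looks roughly like the sketch below (assuming the server launched above on port 30000; the image files, question text, and token limit are placeholders, not the exact contents of bench_sglang.py):

```python
import sglang as sgl
from sglang import RuntimeEndpoint

# A vision-language program; each call asks one question about one image.
@sgl.function
def image_qa(s, image_file, question):
    s += sgl.user(sgl.image(image_file) + question)
    s += sgl.assistant(sgl.gen("answer", max_tokens=128))

# Point the frontend at the already-running server.
sgl.set_default_backend(RuntimeEndpoint("http://localhost:30000"))

# run_batch sends all requests and lets the server batch them.
states = image_qa.run_batch(
    [
        {"image_file": "images/q0.jpg", "question": "What is in this image?"},
        {"image_file": "images/q1.jpg", "question": "Describe the scene."},
    ],
    progress_bar=True,
)
for state in states:
    print(state["answer"])
```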
Reproduction
Follow the steps in
https://github.com/sgl-project/sglang/blob/main/benchmark/llava_bench/README.md
Environment