sgl-project / sglang

SGLang is a fast serving framework for large language models and vision language models.
https://sgl-project.github.io/

[Bug] Running the llava 1.5 backend gives an error #1985

Closed · pspdada closed this 14 hours ago

pspdada commented 4 days ago

Describe the bug

I'm using the latest version of sglang. I ran the example in ./benchmark/llava_bench/README.md, but it resulted in the following error:

[2024-11-10 22:07:08] server_args=ServerArgs(model_path='liuhaotian/llava-v1.6-vicuna-7b', tokenizer_path='llava-hf/llava-1.5-7b-hf', tokenizer_mode='auto', skip_tokenizer_init=False, load_format='auto', trust_remote_code=False, dtype='auto', kv_cache_dtype='auto', quantization=None, context_length=None, device='cuda', served_model_name='liuhaotian/llava-v1.6-vicuna-7b', chat_template=None, is_embedding=False, host='127.0.0.1', port=30000, mem_fraction_static=0.88, max_running_requests=None, max_total_tokens=None, chunked_prefill_size=8192, max_prefill_tokens=16384, schedule_policy='lpm', schedule_conservativeness=1.0, tp_size=1, stream_interval=1, random_seed=8309160, constrained_json_whitespace_pattern=None, decode_log_interval=40, log_level='info', log_level_http=None, log_requests=False, show_time_cost=False, api_key=None, file_storage_pth='SGLang_storage', enable_cache_report=False, watchdog_timeout=600, dp_size=1, load_balance_method='round_robin', dist_init_addr=None, nnodes=1, node_rank=0, json_model_override_args='{}', enable_double_sparsity=False, ds_channel_config_path=None, ds_heavy_channel_num=32, ds_heavy_token_num=256, ds_heavy_channel_type='qk', ds_sparse_decode_threshold=4096, lora_paths=None, max_loras_per_batch=8, attention_backend='flashinfer', sampling_backend='flashinfer', grammar_backend='outlines', disable_flashinfer=False, disable_flashinfer_sampling=False, disable_radix_cache=False, disable_regex_jump_forward=False, disable_cuda_graph=False, disable_cuda_graph_padding=False, disable_disk_cache=False, disable_custom_all_reduce=False, disable_mla=False, disable_penalizer=False, disable_nan_detection=False, enable_overlap_schedule=False, enable_mixed_chunk=False, enable_torch_compile=False, torch_compile_max_bs=32, cuda_graph_max_bs=160, torchao_config='', enable_p2p_check=False, triton_attention_reduce_in_fp32=False, num_continuous_decode_steps=1)
[2024-11-10 22:07:18 TP0] Automatically turn off --chunked-prefill-size and adjust --mem-fraction-static for multimodal models.
[2024-11-10 22:07:18 TP0] Init torch distributed begin.
[2024-11-10 22:07:18 TP0] Load weight begin. avail mem=38.97 GB
[2024-11-10 22:07:20 TP0] lm_eval is not installed, GPTQ may not be usable
[2024-11-10 22:07:20 TP0] Ignore import error when loading sglang.srt.models.llava. Failed to import transformers.models.clip.modeling_clip because of the following error (look up to see its traceback):
/root/anaconda3/envs/llava/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN2at4_ops5zeros4callEN3c108ArrayRefINS2_6SymIntEEENS2_8optionalINS2_10ScalarTypeEEENS6_INS2_6LayoutEEENS6_INS2_6DeviceEEENS6_IbEE
[2024-11-10 22:07:20 TP0] Ignore import error when loading sglang.srt.models.llavavid. Failed to import transformers.models.clip.modeling_clip because of the following error (look up to see its traceback):
/root/anaconda3/envs/llava/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN2at4_ops5zeros4callEN3c108ArrayRefINS2_6SymIntEEENS2_8optionalINS2_10ScalarTypeEEENS6_INS2_6LayoutEEENS6_INS2_6DeviceEEENS6_IbEE
[2024-11-10 22:07:20 TP0] Ignore import error when loading sglang.srt.models.yivl. Failed to import transformers.models.clip.modeling_clip because of the following error (look up to see its traceback):
/root/anaconda3/envs/llava/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN2at4_ops5zeros4callEN3c108ArrayRefINS2_6SymIntEEENS2_8optionalINS2_10ScalarTypeEEENS6_INS2_6LayoutEEENS6_INS2_6DeviceEEENS6_IbEE
[2024-11-10 22:07:20 TP0] Traceback (most recent call last):
  File "/root/anaconda3/envs/llava/lib/python3.10/site-packages/sglang/srt/managers/scheduler.py", line 1191, in run_scheduler_process
    scheduler = Scheduler(server_args, port_args, gpu_id, tp_rank, dp_rank)
  File "/root/anaconda3/envs/llava/lib/python3.10/site-packages/sglang/srt/managers/scheduler.py", line 163, in __init__
    self.tp_worker = TpWorkerClass(
  File "/root/anaconda3/envs/llava/lib/python3.10/site-packages/sglang/srt/managers/tp_worker.py", line 55, in __init__
    self.model_runner = ModelRunner(
  File "/root/anaconda3/envs/llava/lib/python3.10/site-packages/sglang/srt/model_executor/model_runner.py", line 149, in __init__
    self.load_model()
  File "/root/anaconda3/envs/llava/lib/python3.10/site-packages/sglang/srt/model_executor/model_runner.py", line 253, in load_model
    self.model = get_model(
  File "/root/anaconda3/envs/llava/lib/python3.10/site-packages/vllm/model_executor/model_loader/__init__.py", line 19, in get_model
    return loader.load_model(model_config=model_config,
  File "/root/anaconda3/envs/llava/lib/python3.10/site-packages/vllm/model_executor/model_loader/loader.py", line 398, in load_model
    model = _initialize_model(model_config, self.load_config,
  File "/root/anaconda3/envs/llava/lib/python3.10/site-packages/vllm/model_executor/model_loader/loader.py", line 173, in _initialize_model
    model_class, _ = get_model_architecture(model_config)
  File "/root/anaconda3/envs/llava/lib/python3.10/site-packages/vllm/model_executor/model_loader/utils.py", line 35, in get_model_architecture
    return ModelRegistry.resolve_model_cls(architectures)
  File "/root/anaconda3/envs/llava/lib/python3.10/site-packages/vllm/model_executor/models/registry.py", line 365, in resolve_model_cls
    model_cls = self._try_load_model_cls(arch)
  File "/root/anaconda3/envs/llava/lib/python3.10/site-packages/sglang/srt/model_executor/model_runner.py", line 664, in load_model_cls_srt
    raise ValueError(
ValueError: Unsupported architectures: LlavaLlamaForCausalLM. Supported list: ['BaichuanForCausalLM', 'ChatGLMModel', 'CohereForCausalLM', 'DbrxForCausalLM', 'DeepseekForCausalLM', 'DeepseekV2ForCausalLM', 'ExaoneForCausalLM', 'GemmaForCausalLM', 'Gemma2ForCausalLM', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'Grok1ForCausalLM', 'Grok1ModelForCausalLM', 'InternLM2ForCausalLM', 'LlamaForCausalLM', 'Phi3ForCausalLM', 'LlamaForClassification', 'LlamaEmbeddingModel', 'MistralModel', 'LlamaForSequenceClassification', 'LlamaForSequenceClassificationWithNormal_Weights', 'MiniCPMForCausalLM', 'MiniCPM3ForCausalLM', 'MistralForCausalLM', 'MixtralForCausalLM', 'QuantMixtralForCausalLM', 'MllamaForConditionalGeneration', 'OlmoForCausalLM', 'OlmoeForCausalLM', 'QWenLMHeadModel', 'Qwen2ForCausalLM', 'Qwen2MoeForCausalLM', 'Qwen2VLForConditionalGeneration', 'StableLmForCausalLM', 'TorchNativeLlamaForCausalLM', 'TorchNativePhi3ForCausalLM', 'XverseForCausalLM', 'XverseMoeForCausalLM']

Reproduction

pip3 install "sglang[all]"
pip3 install "torch>=2.1.2" "transformers>=4.36" pillow
python3 -m sglang.launch_server --model-path liuhaotian/llava-v1.6-vicuna-7b --tokenizer-path llava-hf/llava-1.5-7b-hf --port 30000

Environment

Python: 3.10.15 (main, Oct  3 2024, 07:27:34) [GCC 11.2.0]
CUDA available: True
GPU 0: NVIDIA A100-PCIE-40GB
GPU 0 Compute Capability: 8.0
CUDA_HOME: /usr/local/cuda-12.4
NVCC: Cuda compilation tools, release 12.4, V12.4.131
CUDA Driver Version: 550.107.02
PyTorch: 2.4.0+cu124
sglang: 0.3.5
flashinfer: 0.1.6+cu124torch2.4
triton: 3.0.0
transformers: 4.46.2
requests: 2.32.3
tqdm: 4.67.0
numpy: 1.26.4
aiohttp: 3.10.10
fastapi: 0.115.4
hf_transfer: 0.1.8
huggingface_hub: 0.26.2
interegular: 0.3.3
packaging: 24.2
PIL: 10.4.0
psutil: 6.1.0
pydantic: 2.9.2
uvicorn: 0.32.0
uvloop: 0.21.0
zmq: 26.2.0
vllm: 0.6.3.post1
multipart: 0.0.17
openai: 1.54.3
anthropic: 0.39.0
NVIDIA Topology: 
        GPU0    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      0-7     0               N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

ulimit soft: 1048576
merrymercy commented 14 hours ago

The real error is:

[2024-11-10 22:07:20 TP0] Ignore import error when loading sglang.srt.models.llava. Failed to import transformers.models.clip.modeling_clip because of the following error (look up to see its traceback):
/root/anaconda3/envs/llava/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN2at4_ops5zeros4callEN3c108ArrayRefINS2_6SymIntEEENS2_8optionalINS2_10ScalarTypeEEENS6_INS2_6LayoutEEENS6_INS2_6DeviceEEENS6_IbEE
[2024-11-10 22:07:20 TP0] Ignore import error when loading sglang.srt.models.llavavid. Failed to import transformers.models.clip.modeling_clip because of the following error (look up to see its traceback):
/root/anaconda3/envs/llava/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN2at4_ops5zeros4callEN3c108ArrayRefINS2_6SymIntEEENS2_8optionalINS2_10ScalarTypeEEENS6_INS2_6LayoutEEENS6_INS2_6DeviceEEENS6_IbEE
[2024-11-10 22:07:20 TP0] Ignore import error when loading sglang.srt.models.yivl. Failed to import transformers.models.clip.modeling_clip because of the following error (look up to see its traceback):
/root/anaconda3/envs/llava/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN2at4_ops5zeros4callEN3c108ArrayRefINS2_6SymIntEEENS2_8optionalINS2_10ScalarTypeEEENS6_INS2_6LayoutEEENS6_INS2_6DeviceEEENS6_IbEE

It is unrelated to sglang. The undefined symbol is a PyTorch operator (at::_ops::zeros::call), which indicates your flash-attention build is not ABI-compatible with the PyTorch version you have installed. You should be able to reproduce this with transformers only.