[Bug] Phi3-v model compile bug

🐛 Bug

To Reproduce

I ran into some bugs with the Phi-3-vision-128k-instruct model and need your help.

When converting the model weights for Phi3-v, I got the following error:

Error message

  File "/opt/conda/lib/python3.10/site-packages/mlc_llm/interface/convert_weight.py", line 181, in convert_weight
    _convert_args(args)
  File "/opt/conda/lib/python3.10/site-packages/mlc_llm/interface/convert_weight.py", line 68, in _convert_args
    model, quantize_map = args.model.quantize[args.quantization.kind](
  File "/opt/conda/lib/python3.10/site-packages/mlc_llm/model/phi3v/phi3v_quantization.py", line 19, in group_quant
    model: nn.Module = Phi3VForCausalLM(model_config)
  File "/opt/conda/lib/python3.10/site-packages/mlc_llm/model/phi3v/phi3v_model.py", line 129, in __init__
    self.model = Phi3Model(config)
  File "/opt/conda/lib/python3.10/site-packages/mlc_llm/model/phi3/phi3_model.py", line 206, in __init__
    self.h = nn.ModuleList([Phi3ParallelBlock(config) for _ in range(config.num_hidden_layers)])
  File "/opt/conda/lib/python3.10/site-packages/mlc_llm/model/phi3/phi3_model.py", line 206, in <listcomp>
    self.h = nn.ModuleList([Phi3ParallelBlock(config) for _ in range(config.num_hidden_layers)])
  File "/opt/conda/lib/python3.10/site-packages/mlc_llm/model/phi3/phi3_model.py", line 165, in __init__
    self.mixer = PhiMHA(config)
  File "/opt/conda/lib/python3.10/site-packages/mlc_llm/model/phi3/phi3_model.py", line 128, in __init__
    config.rope_scaling["long_factor"] if config.rope_scaling is not None else None
AttributeError: 'Phi3VConfig' object has no attribute 'rope_scaling'

The problem seems to come from this part of PhiMHA.__init__ in phi3_model.py:

self.rope_ext_factors = (
    config.rope_scaling["long_factor"] if config.rope_scaling is not None else None
)
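As a minimal sketch of one possible workaround (not a confirmed fix), the attribute could be read defensively so that a config class without a `rope_scaling` field falls back to `None`. The helper name `get_rope_ext_factors` and the `SimpleNamespace` stand-ins for the config objects are hypothetical and only serve the illustration; the real fix may instead be to add the field to `Phi3VConfig`.

```python
from types import SimpleNamespace


def get_rope_ext_factors(config):
    """Return the long-context RoPE factors, or None if the config has none.

    getattr with a default avoids the AttributeError raised when the
    config class does not define `rope_scaling` at all.
    """
    rope_scaling = getattr(config, "rope_scaling", None)
    if rope_scaling is None:
        return None
    return rope_scaling["long_factor"]


# Hypothetical stand-ins for the real config objects.
config_without_field = SimpleNamespace(num_hidden_layers=32)
config_with_field = SimpleNamespace(rope_scaling={"long_factor": [1.0, 1.2]})

print(get_rope_ext_factors(config_without_field))
print(get_rope_ext_factors(config_with_field))
```

With this pattern the first call returns `None` instead of raising, and the second returns the stored list.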

I also want to quantize to q4f32_1. q4f16_1 works for me, but q4f32_1 fails with the following error:

Error message

ValueError: Traceback (most recent call last):
  11: mlc::llm::serve::ThreadedEngineImpl::RunBackgroundLoop()
        at /workspace/mlc-llm/cpp/serve/threaded_engine.cc:182
  10: mlc::llm::serve::EngineImpl::Step()
        at /workspace/mlc-llm/cpp/serve/engine.cc:650
  9: mlc::llm::serve::NewRequestPrefillActionObj::Step(mlc::llm::serve::EngineState)
        at /workspace/mlc-llm/cpp/serve/engine_actions/new_request_prefill.cc:113
  8: mlc::llm::serve::TokenDataNode::GetEmbedding(mlc::llm::serve::Model, tvm::runtime::ObjectRef*, int) const
        at /workspace/mlc-llm/cpp/serve/data.cc:66
  7: mlc::llm::serve::ModelImpl::TokenEmbed(tvm::runtime::ShapeTuple, tvm::runtime::ObjectRef*, int)
        at /workspace/mlc-llm/cpp/serve/model.cc:102
  6: tvm::runtime::relax_vm::VirtualMachineImpl::InvokeClosurePacked(tvm::runtime::ObjectRef const&, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
  5: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::relax_vm::VirtualMachineImpl::GetClosureInternal(tvm::runtime::String const&, bool)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
  4: tvm::runtime::relax_vm::VirtualMachineImpl::InvokeBytecode(long, std::vector<tvm::runtime::TVMRetValue, std::allocator<tvm::runtime::TVMRetValue> > const&)
  3: tvm::runtime::relax_vm::VirtualMachineImpl::RunLoop()
  2: tvm::runtime::relax_vm::VirtualMachineImpl::RunInstrCall(tvm::runtime::relax_vm::VMFrame*, tvm::runtime::relax_vm::Instruction)
  1: tvm::runtime::relax_vm::CheckTensorInfo(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
  0: _ZN3tvm7runtime6deta
  File "/workspace/tvm/src/runtime/relax_vm/builtin.cc", line 247
ValueError: Check failed: (DataType(ptr->dl_tensor.dtype) == dtype) is false: ErrorContext(fn=embed, loc=param[1], param=packed_params, annotation=R.Tuple(R.Tensor((32064, 384), dtype="uint32"), R.Tensor((32064, 96), dtype="float16"), R.Tensor((3072,), dtype="float16"), R.Tensor((9216, 384), dtype="uint32"), R.Tensor((9216, 96), dtype="float16"), R.Tensor((3072, 384), dtype="uint32"), R.Tensor((3072, 96), dtype="float16"), R.Tensor((16384, 384), dtype="uint32"), R.Tensor((16384, 96), dtype="float16"), R.Tensor((3072, 1024), dtype="uint32"), R.Tensor((3072, 256), dtype="float16"), R.Tensor((3072,), dtype="float16"), R.Tensor((3072,), dtype="float16"), R.Tensor((9216, 384), dtype="uint32"), R.Tensor((9216, 96), dtype="float16"), R.Tensor((3072, 384), dtype="uint32"), R.Tensor((3072, 96), dtype="float16"), R.Tensor((16384, 384), dtype="uint32"), R.Tensor((16384, 96), dtype="float16"), R.Tensor((3072, 1024), dtype="uint32"), R.Tensor((3072, 256), dtype="float16"), R.Tensor((3072,), dtype="float16"), R.Tensor((3072,), dtype="float16"), R.Tensor((9216, 384), dtype="uint32"), R.Tensor((9216, 96), dtype="float16"), R.Tensor((3072, 384), dtype="uint32"), R.Tensor((3072, 96), dtype="float16"), R.Tensor((16384, 384), dtype="uint32"), R.Tensor((16384, 96), dtype="float16"), R.Tensor((3072, 1024), dtype="uint32"), R.Tensor((3072, 256), dtype="float16"), R.Tensor((3072,), dtype="float16"), R.Tensor((3072,), dtype="float16"), R.Tensor((9216, 384), dtype="uint32"), R.Tensor((9216, 96), dtype="float16"), R.Tensor((3072, 384), dtype="uint32"), R.Tensor((3072, 96), dtype="float16"), R.Tensor((16384, 384), dtype="uint32"), R.Tensor((16384, 96), dtype="float16"), R.Tensor((3072, 1024), dtype="uint32"), R.Tensor((3072, 256), dtype="float16"), R.Tensor((3072,), dtype="float16"), R.Tensor((3072,), dtype="float16"), R.Tensor((9216, 384), dtype="uint32"), R.Tensor((9216, 96), dtype="float16"), R.Tensor((3072, 384), dtype="uint32"), R.Tensor((3072, 96), dtype="float16"), R.Tensor((16384, 
384), dtype="uint32"), R.Tensor((16384, 96), dtype="float16"), R.Tensor((3072, 1024), dtype="uint32"), R.Tensor((3072, 256), dtype="float16"), R.Tensor((3072,), dtype="float16"), R.Tensor((3072,), dtype="float16"), R.Tensor((9216, 384), dtype="uint32"), R.Tensor((9216, 96), dtype="float16"), R.Tensor((3072, 384), dtype="uint32"), R.Tensor((3072, 96), dtype="float16"), R.Tensor((16384, 384), dtype="uint32"), R.Tensor((16384, 96), dtype="float16"), R.Tensor((3072, 1024), dtype="uint32"), R.Tensor((3072, 256), dtype="float16"), R.Tensor((3072,), dtype="float16"), R.Tensor((3072,), dtype="float16"), R.Tensor((9216, 384), dtype="uint32"), R.Tensor((9216, 96), dtype="float16"), R.Tensor((3072, 384), dtype="uint32"), R.Tensor((3072, 96), dtype="float16"), R.Tensor((16384, 384), dtype="uint32"), R.Tensor((16384, 96), dtype="float16"), R.Tensor((3072, 1024), dtype="uint32"), R.Tensor((3072, 256), dtype="float16"), R.Tensor((3072,), dtype="float16"), R.Tensor((3072,), dtype="float16"), R.Tensor((9216, 384), dtype="uint32"), R.Tensor((9216, 96), dtype="float16"), R.Tensor((3072, 384), dtype="uint32"), R.Tensor((3072, 96), dtype="float16"), R.Tensor((16384, 384), dtype="uint32"), R.Tensor((16384, 96), dtype="float16"), R.Tensor((3072, 1024), dtype="uint32"), R.Tensor((3072, 256), dtype="float16"), R.Tensor((3072,), dtype="float16"), R.Tensor((3072,), dtype="float16"), R.Tensor((9216, 384), dtype="uint32"), R.Tensor((9216, 96), dtype="float16"), R.Tensor((3072, 384), dtype="uint32"), R.Tensor((3072, 96), dtype="float16"), R.Tensor((16384, 384), dtype="uint32"), R.Tensor((16384, 96), dtype="float16"), R.Tensor((3072, 1024), dtype="uint32"), R.Tensor((3072, 256), dtype="float16"), R.Tensor((3072,), dtype="float16"), R.Tensor((3072,), dtype="float16"), R.Tensor((9216, 384), dtype="uint32"), R.Tensor((9216, 96), dtype="float16"), R.Tensor((3072, 384), dtype="uint32"), R.Tensor((3072, 96), dtype="float16"), R.Tensor((16384, 384), dtype="uint32"), R.Tensor((16384, 96), 
dtype="float16"), R.Tensor((3072, 1024), dtype="uint32"), R.Tensor((3072, 256), dtype="float16"), R.Tensor((3072,), dtype="float16"), R.Tensor((3072,), dtype="float16"), R.Tensor((9216, 384), dtype="uint32"), R.Tensor((9216, 96), dtype="float16"), R.Tensor((3072, 384), dtype="uint32"), R.Tensor((3072, 96), dtype="float16"), R.Tensor((16384, 384), dtype="uint32"), R.Tensor((16384, 96), dtype="float16"), R.Tensor((3072, 1024), dtype="uint32"), R.Tensor((3072, 256), dtype="float16"), R.Tensor((3072,), dtype="float16"), R.Tensor((3072,), dtype="float16"), R.Tensor((9216, 384), dtype="uint32"), R.Tensor((9216, 96), dtype="float16"), R.Tensor((3072, 384), dtype="uint32"), R.Tensor((3072, 96), dtype="float16"), R.Tensor((16384, 384), dtype="uint32"), R.Tensor((16384, 96), dtype="float16"), R.Tensor((3072, 1024), dtype="uint32"), R.Tensor((3072, 256), dtype="float16"), R.Tensor((3072,), dtype="float16"), R.Tensor((3072,), dtype="float16"), R.Tensor((9216, 384), dtype="uint32"), R.Tensor((9216, 96), dtype="float16"), R.Tensor((3072, 384), dtype="uint32"), R.Tensor((3072, 96), dtype="float16"), R.Tensor((16384, 384), dtype="uint32"), R.Tensor((16384, 96), dtype="float16"), R.Tensor((3072, 1024), dtype="uint32"), R.Tensor((3072, 256), dtype="float16"), R.Tensor((3072,), dtype="float16"), R.Tensor((3072,), dtype="float16"), R.Tensor((9216, 384), dtype="uint32"), R.Tensor((9216, 96), dtype="float16"), R.Tensor((3072, 384), dtype="uint32"), R.Tensor((3072, 96), dtype="float16"), R.Tensor((16384, 384), dtype="uint32"), R.Tensor((16384, 96), dtype="float16"), R.Tensor((3072, 1024), dtype="uint32"), R.Tensor((3072, 256), dtype="float16"), R.Tensor((3072,), dtype="float16"), R.Tensor((3072,), dtype="float16"), R.Tensor((9216, 384), dtype="uint32"), R.Tensor((9216, 96), dtype="float16"), R.Tensor((3072, 384), dtype="uint32"), R.Tensor((3072, 96), dtype="float16"), R.Tensor((16384, 384), dtype="uint32"), R.Tensor((16384, 96), dtype="float16"), R.Tensor((3072, 1024), dtype="uint32"), 
R.Tensor((3072, 256), dtype="float16"), R.Tensor((3072,), dtype="float16"), R.Tensor((3072,), dtype="float16"), R.Tensor((9216, 384), dtype="uint32"), R.Tensor((9216, 96), dtype="float16"), R.Tensor((3072, 384), dtype="uint32"), R.Tensor((3072, 96), dtype="float16"), R.Tensor((16384, 384), dtype="uint32"), R.Tensor((16384, 96), dtype="float16"), R.Tensor((3072, 1024), dtype="uint32"), R.Tensor((3072, 256), dtype="float16"), R.Tensor((3072,), dtype="float16"), R.Tensor((3072,), dtype="float16"), R.Tensor((9216, 384), dtype="uint32"), R.Tensor((9216, 96), dtype="float16"), R.Tensor((3072, 384), dtype="uint32"), R.Tensor((3072, 96), dtype="float16"), R.Tensor((16384, 384), dtype="uint32"), R.Tensor((16384, 96), dtype="float16"), R.Tensor((3072, 1024), dtype="uint32"), R.Tensor((3072, 256), dtype="float16"), R.Tensor((3072,), dtype="float16"), R.Tensor((3072,), dtype="float16"), R.Tensor((9216, 384), dtype="uint32"), R.Tensor((9216, 96), dtype="float16"), R.Tensor((3072, 384), dtype="uint32"), R.Tensor((3072, 96), dtype="float16"), R.Tensor((16384, 384), dtype="uint32"), R.Tensor((16384, 96), dtype="float16"), R.Tensor((3072, 1024), dtype="uint32"), R.Tensor((3072, 256), dtype="float16"), R.Tensor((3072,), dtype="float16"), R.Tensor((3072,), dtype="float16"), R.Tensor((9216, 384), dtype="uint32"), R.Tensor((9216, 96), dtype="float16"), R.Tensor((3072, 384), dtype="uint32"), R.Tensor((3072, 96), dtype="float16"), R.Tensor((16384, 384), dtype="uint32"), R.Tensor((16384, 96), dtype="float16"), R.Tensor((3072, 1024), dtype="uint32"), R.Tensor((3072, 256), dtype="float16"), R.Tensor((3072,), dtype="float16"), R.Tensor((3072,), dtype="float16"), R.Tensor((9216, 384), dtype="uint32"), R.Tensor((9216, 96), dtype="float16"), R.Tensor((3072, 384), dtype="uint32"), R.Tensor((3072, 96), dtype="float16"), R.Tensor((16384, 384), dtype="uint32"), R.Tensor((16384, 96), dtype="float16"), R.Tensor((3072, 1024), dtype="uint32"), R.Tensor((3072, 256), dtype="float16"), R.Tensor((3072,), 
dtype="float16"), R.Tensor((3072,), dtype="float16"), R.Tensor((9216, 384), dtype="uint32"), R.Tensor((9216, 96), dtype="float16"), R.Tensor((3072, 384), dtype="uint32"), R.Tensor((3072, 96), dtype="float16"), R.Tensor((16384, 384), dtype="uint32"), R.Tensor((16384, 96), dtype="float16"), R.Tensor((3072, 1024), dtype="uint32"), R.Tensor((3072, 256), dtype="float16"), R.Tensor((3072,), dtype="float16"), R.Tensor((3072,), dtype="float16"), R.Tensor((9216, 384), dtype="uint32"), R.Tensor((9216, 96), dtype="float16"), R.Tensor((3072, 384), dtype="uint32"), R.Tensor((3072, 96), dtype="float16"), R.Tensor((16384, 384), dtype="uint32"), R.Tensor((16384, 96), dtype="float16"), R.Tensor((3072, 1024), dtype="uint32"), R.Tensor((3072, 256), dtype="float16"), R.Tensor((3072,), dtype="float16"), R.Tensor((3072,), dtype="float16"), R.Tensor((9216, 384), dtype="uint32"), R.Tensor((9216, 96), dtype="float16"), R.Tensor((3072, 384), dtype="uint32"), R.Tensor((3072, 96), dtype="float16"), R.Tensor((16384, 384), dtype="uint32"), R.Tensor((16384, 96), dtype="float16"), R.Tensor((3072, 1024), dtype="uint32"), R.Tensor((3072, 256), dtype="float16"), R.Tensor((3072,), dtype="float16"), R.Tensor((3072,), dtype="float16"), R.Tensor((9216, 384), dtype="uint32"), R.Tensor((9216, 96), dtype="float16"), R.Tensor((3072, 384), dtype="uint32"), R.Tensor((3072, 96), dtype="float16"), R.Tensor((16384, 384), dtype="uint32"), R.Tensor((16384, 96), dtype="float16"), R.Tensor((3072, 1024), dtype="uint32"), R.Tensor((3072, 256), dtype="float16"), R.Tensor((3072,), dtype="float16"), R.Tensor((3072,), dtype="float16"), R.Tensor((9216, 384), dtype="uint32"), R.Tensor((9216, 96), dtype="float16"), R.Tensor((3072, 384), dtype="uint32"), R.Tensor((3072, 96), dtype="float16"), R.Tensor((16384, 384), dtype="uint32"), R.Tensor((16384, 96), dtype="float16"), R.Tensor((3072, 1024), dtype="uint32"), R.Tensor((3072, 256), dtype="float16"), R.Tensor((3072,), dtype="float16"), R.Tensor((3072,), dtype="float16"), 
R.Tensor((9216, 384), dtype="uint32"), R.Tensor((9216, 96), dtype="float16"), R.Tensor((3072, 384), dtype="uint32"), R.Tensor((3072, 96), dtype="float16"), R.Tensor((16384, 384), dtype="uint32"), R.Tensor((16384, 96), dtype="float16"), R.Tensor((3072, 1024), dtype="uint32"), R.Tensor((3072, 256), dtype="float16"), R.Tensor((3072,), dtype="float16"), R.Tensor((3072,), dtype="float16"), R.Tensor((9216, 384), dtype="uint32"), R.Tensor((9216, 96), dtype="float16"), R.Tensor((3072, 384), dtype="uint32"), R.Tensor((3072, 96), dtype="float16"), R.Tensor((16384, 384), dtype="uint32"), R.Tensor((16384, 96), dtype="float16"), R.Tensor((3072, 1024), dtype="uint32"), R.Tensor((3072, 256), dtype="float16"), R.Tensor((3072,), dtype="float16"), R.Tensor((3072,), dtype="float16"), R.Tensor((9216, 384), dtype="uint32"), R.Tensor((9216, 96), dtype="float16"), R.Tensor((3072, 384), dtype="uint32"), R.Tensor((3072, 96), dtype="float16"), R.Tensor((16384, 384), dtype="uint32"), R.Tensor((16384, 96), dtype="float16"), R.Tensor((3072, 1024), dtype="uint32"), R.Tensor((3072, 256), dtype="float16"), R.Tensor((3072,), dtype="float16"), R.Tensor((3072,), dtype="float16"), R.Tensor((9216, 384), dtype="uint32"), R.Tensor((9216, 96), dtype="float16"), R.Tensor((3072, 384), dtype="uint32"), R.Tensor((3072, 96), dtype="float16"), R.Tensor((16384, 384), dtype="uint32"), R.Tensor((16384, 96), dtype="float16"), R.Tensor((3072, 1024), dtype="uint32"), R.Tensor((3072, 256), dtype="float16"), R.Tensor((3072,), dtype="float16"), R.Tensor((3072,), dtype="float16"), R.Tensor((9216, 384), dtype="uint32"), R.Tensor((9216, 96), dtype="float16"), R.Tensor((3072, 384), dtype="uint32"), R.Tensor((3072, 96), dtype="float16"), R.Tensor((16384, 384), dtype="uint32"), R.Tensor((16384, 96), dtype="float16"), R.Tensor((3072, 1024), dtype="uint32"), R.Tensor((3072, 256), dtype="float16"), R.Tensor((3072,), dtype="float16"), R.Tensor((3072,), dtype="float16"), R.Tensor((9216, 384), dtype="uint32"), R.Tensor((9216, 96), 
dtype="float16"), R.Tensor((3072, 384), dtype="uint32"), R.Tensor((3072, 96), dtype="float16"), R.Tensor((16384, 384), dtype="uint32"), R.Tensor((16384, 96), dtype="float16"), R.Tensor((3072, 1024), dtype="uint32"), R.Tensor((3072, 256), dtype="float16"), R.Tensor((3072,), dtype="float16"), R.Tensor((3072,), dtype="float16"), R.Tensor((9216, 384), dtype="uint32"), R.Tensor((9216, 96), dtype="float16"), R.Tensor((3072, 384), dtype="uint32"), R.Tensor((3072, 96), dtype="float16"), R.Tensor((16384, 384), dtype="uint32"), R.Tensor((16384, 96), dtype="float16"), R.Tensor((3072, 1024), dtype="uint32"), R.Tensor((3072, 256), dtype="float16"), R.Tensor((3072,), dtype="float16"), R.Tensor((3072,), dtype="float16"), R.Tensor((vocab_size, 384), dtype="uint32"), R.Tensor((vocab_size, 96), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 3, 14, 14), dtype="float16"), R.Tensor((577, 128), dtype="uint32"), R.Tensor((577, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((4096, 128), dtype="uint32"), R.Tensor((4096, 32), dtype="float16"), R.Tensor((4096,), dtype="float16"), R.Tensor((1024, 512), dtype="uint32"), R.Tensor((1024, 128), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 
128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((4096, 128), dtype="uint32"), R.Tensor((4096, 32), dtype="float16"), R.Tensor((4096,), dtype="float16"), R.Tensor((1024, 512), dtype="uint32"), R.Tensor((1024, 128), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((4096, 128), dtype="uint32"), R.Tensor((4096, 32), dtype="float16"), R.Tensor((4096,), dtype="float16"), R.Tensor((1024, 512), dtype="uint32"), R.Tensor((1024, 128), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), 
R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((4096, 128), dtype="uint32"), R.Tensor((4096, 32), dtype="float16"), R.Tensor((4096,), dtype="float16"), R.Tensor((1024, 512), dtype="uint32"), R.Tensor((1024, 128), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((4096, 128), dtype="uint32"), R.Tensor((4096, 32), dtype="float16"), R.Tensor((4096,), dtype="float16"), R.Tensor((1024, 512), dtype="uint32"), R.Tensor((1024, 128), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((4096, 128), dtype="uint32"), R.Tensor((4096, 32), dtype="float16"), R.Tensor((4096,), dtype="float16"), R.Tensor((1024, 512), dtype="uint32"), R.Tensor((1024, 128), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), 
R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((4096, 128), dtype="uint32"), R.Tensor((4096, 32), dtype="float16"), R.Tensor((4096,), dtype="float16"), R.Tensor((1024, 512), dtype="uint32"), R.Tensor((1024, 128), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((4096, 128), dtype="uint32"), R.Tensor((4096, 32), dtype="float16"), R.Tensor((4096,), dtype="float16"), R.Tensor((1024, 512), dtype="uint32"), R.Tensor((1024, 128), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), 
R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((4096, 128), dtype="uint32"), R.Tensor((4096, 32), dtype="float16"), R.Tensor((4096,), dtype="float16"), R.Tensor((1024, 512), dtype="uint32"), R.Tensor((1024, 128), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((4096, 128), dtype="uint32"), R.Tensor((4096, 32), dtype="float16"), R.Tensor((4096,), dtype="float16"), R.Tensor((1024, 512), dtype="uint32"), R.Tensor((1024, 128), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((4096, 128), dtype="uint32"), R.Tensor((4096, 32), dtype="float16"), R.Tensor((4096,), dtype="float16"), 
R.Tensor((1024, 512), dtype="uint32"), R.Tensor((1024, 128), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((4096, 128), dtype="uint32"), R.Tensor((4096, 32), dtype="float16"), R.Tensor((4096,), dtype="float16"), R.Tensor((1024, 512), dtype="uint32"), R.Tensor((1024, 128), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((4096, 128), dtype="uint32"), R.Tensor((4096, 32), dtype="float16"), R.Tensor((4096,), dtype="float16"), R.Tensor((1024, 512), dtype="uint32"), R.Tensor((1024, 128), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), 
R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((4096, 128), dtype="uint32"), R.Tensor((4096, 32), dtype="float16"), R.Tensor((4096,), dtype="float16"), R.Tensor((1024, 512), dtype="uint32"), R.Tensor((1024, 128), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((4096, 128), dtype="uint32"), R.Tensor((4096, 32), dtype="float16"), R.Tensor((4096,), dtype="float16"), R.Tensor((1024, 512), dtype="uint32"), R.Tensor((1024, 128), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), 
R.Tensor((1024,), dtype="float16"), R.Tensor((4096, 128), dtype="uint32"), R.Tensor((4096, 32), dtype="float16"), R.Tensor((4096,), dtype="float16"), R.Tensor((1024, 512), dtype="uint32"), R.Tensor((1024, 128), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((4096, 128), dtype="uint32"), R.Tensor((4096, 32), dtype="float16"), R.Tensor((4096,), dtype="float16"), R.Tensor((1024, 512), dtype="uint32"), R.Tensor((1024, 128), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((4096, 128), dtype="uint32"), R.Tensor((4096, 32), dtype="float16"), R.Tensor((4096,), dtype="float16"), R.Tensor((1024, 512), dtype="uint32"), R.Tensor((1024, 128), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), 
R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((4096, 128), dtype="uint32"), R.Tensor((4096, 32), dtype="float16"), R.Tensor((4096,), dtype="float16"), R.Tensor((1024, 512), dtype="uint32"), R.Tensor((1024, 128), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((4096, 128), dtype="uint32"), R.Tensor((4096, 32), dtype="float16"), R.Tensor((4096,), dtype="float16"), R.Tensor((1024, 512), dtype="uint32"), R.Tensor((1024, 128), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), 
R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((4096, 128), dtype="uint32"), R.Tensor((4096, 32), dtype="float16"), R.Tensor((4096,), dtype="float16"), R.Tensor((1024, 512), dtype="uint32"), R.Tensor((1024, 128), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((4096, 128), dtype="uint32"), R.Tensor((4096, 32), dtype="float16"), R.Tensor((4096,), dtype="float16"), R.Tensor((1024, 512), dtype="uint32"), R.Tensor((1024, 128), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((4096, 128), dtype="uint32"), R.Tensor((4096, 32), dtype="float16"), R.Tensor((4096,), dtype="float16"), R.Tensor((1024, 512), dtype="uint32"), 
R.Tensor((1024, 128), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024, 128), dtype="uint32"), R.Tensor((1024, 32), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((4096, 128), dtype="uint32"), R.Tensor((4096, 32), dtype="float16"), R.Tensor((4096,), dtype="float16"), R.Tensor((1024, 512), dtype="uint32"), R.Tensor((1024, 128), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1024,), dtype="float16"), R.Tensor((1, 1, 4096), dtype="float16"), R.Tensor((1, 1, 1, 4096), dtype="float16"), R.Tensor((3072, 512), dtype="uint32"), R.Tensor((3072, 128), dtype="float16"), R.Tensor((3072,), dtype="float16"), R.Tensor((3072, 384), dtype="uint32"), R.Tensor((3072, 96), dtype="float16"), R.Tensor((3072,), dtype="float16")))  expect Tensor with dtype float16 but get float32

As a workaround I set self.rope_ext_factors = None, then ran the model locally and sent a message containing an image to the server. That also failed with an error.
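A less invasive guard than hard-coding None might look like the sketch below. It is not the actual mlc_llm fix. `get_rope_ext_factors` and the two `SimpleNamespace` configs are hypothetical stand-ins for the real classes. It only shows how `getattr` with a default avoids the AttributeError while still picking up `long_factor` when the config provides it.

```python
from types import SimpleNamespace


def get_rope_ext_factors(config):
    """Return the long-context RoPE factors, or None if the config has none."""
    # getattr with a default also covers configs that never define rope_scaling
    rope_scaling = getattr(config, "rope_scaling", None)
    if rope_scaling is None:
        return None
    return rope_scaling.get("long_factor")


# Hypothetical stand-ins for the real config objects
config_without = SimpleNamespace(num_hidden_layers=32)
config_with = SimpleNamespace(rope_scaling={"long_factor": [1.0, 1.05]})

print(get_rope_ext_factors(config_without))
print(get_rope_ext_factors(config_with))
```

If the checkpoint does define long-context factors, forcing the value to None would silently drop them, so a guard like this is probably safer than the workaround.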

input message

{
    "type": "text",
    "text": "Please descibe this image",
},
{
    "type": "image_url",
    "image_url": f"data:image/jpeg;base64,{image_base64}",
}
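For anyone trying to reproduce this, the two parts above can be built roughly as follows. This is a minimal sketch: `build_image_message` is a hypothetical helper, not part of mlc_llm, and the three bytes passed in are only a placeholder for real JPEG data.

```python
import base64


def build_image_message(image_bytes: bytes, prompt: str) -> list:
    """Build the text part and the base64 data-URL image part of a message."""
    image_base64 = base64.b64encode(image_bytes).decode("ascii")
    return [
        {"type": "text", "text": prompt},
        {
            "type": "image_url",
            "image_url": f"data:image/jpeg;base64,{image_base64}",
        },
    ]


# Placeholder bytes; in practice read them from a real JPEG file
parts = build_image_message(b"\xff\xd8\xff", "Please describe this image")
```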

Error message

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/opt/conda/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "tvm/_ffi/_cython/./packed_func.pxi", line 339, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 270, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./packed_func.pxi", line 259, in tvm._ffi._cy3.core.FuncCall3
  File "tvm/_ffi/_cython/./base.pxi", line 185, in tvm._ffi._cy3.core.CHECK_CALL
  File "/opt/conda/lib/python3.10/site-packages/tvm/_ffi/base.py", line 481, in raise_last_ffi_error
    raise py_err
  File "/workspace/mlc-llm/cpp/serve/threaded_engine.cc", line 182, in mlc::llm::serve::ThreadedEngineImpl::RunBackgroundLoop()
  File "/workspace/mlc-llm/cpp/serve/engine.cc", line 650, in mlc::llm::serve::EngineImpl::Step()
  File "/workspace/mlc-llm/cpp/serve/engine_actions/new_request_prefill.cc", line 119, in mlc::llm::serve::NewRequestPrefillActionObj::Step(mlc::llm::serve::EngineState)
  File "/workspace/mlc-llm/cpp/serve/data.cc", line 96, in mlc::llm::serve::ImageDataNode::GetEmbedding(mlc::llm::serve::Model, tvm::runtime::ObjectRef*, int) const
  File "/workspace/mlc-llm/cpp/serve/model.cc", line 117, in mlc::llm::serve::ModelImpl::ImageEmbed(tvm::runtime::NDArray const&, tvm::runtime::ObjectRef*, int)
ValueError: Traceback (most recent call last):
  11: mlc::llm::serve::ThreadedEngineImpl::RunBackgroundLoop()
        at /workspace/mlc-llm/cpp/serve/threaded_engine.cc:182
  10: mlc::llm::serve::EngineImpl::Step()
        at /workspace/mlc-llm/cpp/serve/engine.cc:650
  9: mlc::llm::serve::NewRequestPrefillActionObj::Step(mlc::llm::serve::EngineState)
        at /workspace/mlc-llm/cpp/serve/engine_actions/new_request_prefill.cc:119
  8: mlc::llm::serve::ImageDataNode::GetEmbedding(mlc::llm::serve::Model, tvm::runtime::ObjectRef*, int) const
        at /workspace/mlc-llm/cpp/serve/data.cc:96
  7: mlc::llm::serve::ModelImpl::ImageEmbed(tvm::runtime::NDArray const&, tvm::runtime::ObjectRef*, int)
        at /workspace/mlc-llm/cpp/serve/model.cc:117
  6: tvm::runtime::relax_vm::VirtualMachineImpl::InvokeClosurePacked(tvm::runtime::ObjectRef const&, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
  5: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::relax_vm::VirtualMachineImpl::GetClosureInternal(tvm::runtime::String const&, bool)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
  4: tvm::runtime::relax_vm::VirtualMachineImpl::InvokeBytecode(long, std::vector<tvm::runtime::TVMRetValue, std::allocator<tvm::runtime::TVMRetValue> > const&)
  3: tvm::runtime::relax_vm::VirtualMachineImpl::RunLoop()
  2: tvm::runtime::relax_vm::VirtualMachineImpl::RunInstrCall(tvm::runtime::relax_vm::VMFrame*, tvm::runtime::relax_vm::Instruction)
  1: tvm::runtime::relax_vm::CheckTensorInfo(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
  0: _ZN3tvm7runtime6deta
  File "/workspace/tvm/src/runtime/relax_vm/builtin.cc", line 241
ValueError: Check failed: (ptr->dl_tensor.ndim == ndim) is false: ErrorContext(fn=image_embed, loc=param[0], param=pixel_values, annotation=R.Tensor((1, 17, 3, 336, 336), dtype="float32"))  expect Tensor with ndim 5 but get 4

The problem seems to be in this part:

if config["model_config"]["model_type"] == "phi3_v":
    message_list.append(data.ImageData.phi3v_from_url(image_url, config))
else:
    message_list.append(data.ImageData.from_url(image_url, config))
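To narrow down the ndim error, it may help to compare the shape that image_embed expects with the shape the preprocessing produces. The log only says the tensor had ndim 4, so the actual shape `(17, 3, 336, 336)` below is an assumption, and `describe_ndim_mismatch` is a hypothetical helper rather than anything in TVM or mlc_llm.

```python
def describe_ndim_mismatch(expected_shape, actual_shape):
    """Explain how an actual tensor shape differs in rank from the expected one."""
    expected_ndim, actual_ndim = len(expected_shape), len(actual_shape)
    if expected_ndim == actual_ndim:
        return "ranks match"
    # Common case: trailing dims agree and only leading axes are missing
    missing = expected_ndim - actual_ndim
    if missing > 0 and tuple(expected_shape[missing:]) == tuple(actual_shape):
        return f"missing {missing} leading axis/axes: {tuple(expected_shape[:missing])}"
    return f"expected ndim {expected_ndim} but got {actual_ndim}"


# Expected shape is taken from the error annotation; actual shape is assumed
print(describe_ndim_mismatch((1, 17, 3, 336, 336), (17, 3, 336, 336)))
```

If the check reports a missing leading axis, the Phi-3-vision preprocessing path is probably returning the image crops without the batch dimension, or the generic from_url path is being used instead of phi3v_from_url.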

Expected behavior

Environment

Platform (e.g. WebGPU/Vulkan/IOS/Android/CUDA): CUDA
Operating system (e.g. Ubuntu/Windows/MacOS/...): Ubuntu 22.04
Device (e.g. iPhone 12 Pro, PC+RTX 3090, ...): AWS g4dn.xlarge
How you installed MLC-LLM (conda, source): python pre-built package
How you installed TVM-Unity (pip, source): pip
Python version (e.g. 3.10): 3.10
GPU driver version (if applicable):
CUDA/cuDNN version (if applicable):
TVM Unity Hash Tag (python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))", applicable if you compile models):
Any other relevant information:

mlc-ai / mlc-llm

[Bug] Phi3-v model compile bug #2858
