triton-inference-server / tensorrtllm_backend

The Triton TensorRT-LLM Backend
Apache License 2.0

unexpected error when creating modelInstanceState: [json.exception.out_of_range.403] key 'name' not found #467

Open · Godlovecui opened this issue 1 month ago

Godlovecui commented 1 month ago

System Info

8×RTX 4090 (24 GB each)
tensorrt_llm version: 0.11.0.dev2024051400

Who can help?

@T

Information

Tasks

Reproduction

export HF_LLAMA_MODEL=/network/model/Meta-Llama-3-8B
export ENGINE_PATH=/network/engine/engine_outputs_llama3_8B
python3 tools/fill_template.py -i llama_ifb/preprocessing/config.pbtxt tokenizer_dir:${HF_LLAMA_MODEL},triton_max_batch_size:64,preprocessing_instance_count:1
python3 tools/fill_template.py -i llama_ifb/postprocessing/config.pbtxt tokenizer_dir:${HF_LLAMA_MODEL},triton_max_batch_size:64,postprocessing_instance_count:1
python3 tools/fill_template.py -i llama_ifb/tensorrt_llm_bls/config.pbtxt triton_max_batch_size:64,decoupled_mode:False,bls_instance_count:1,accumulate_tokens:False
python3 tools/fill_template.py -i llama_ifb/ensemble/config.pbtxt triton_max_batch_size:64
python3 tools/fill_template.py -i llama_ifb/tensorrt_llm/config.pbtxt triton_backend:tensorrtllm,triton_max_batch_size:64,decoupled_mode:False,max_beam_width:1,engine_dir:${ENGINE_PATH},max_tokens_in_paged_kv_cache:2560,max_attention_window_size:2560,kv_cache_free_gpu_mem_fraction:0.5,exclude_input_in_output:True,enable_kv_cache_reuse:False,batching_strategy:inflight_fused_batching,max_queue_delay_microseconds:0
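For context (an assumption inferred from the command-line usage above, not the script's actual source), tools/fill_template.py appears to substitute ${key} placeholders in each config.pbtxt with the supplied key:value pairs. A minimal sketch of that substitution:

```python
# Minimal sketch of the ${key} substitution that fill_template.py appears
# to perform; this is a reconstruction, not the real tools/fill_template.py.
import sys

def fill_template(pbtxt_path: str, assignments: str) -> None:
    with open(pbtxt_path) as f:
        text = f.read()
    # "tokenizer_dir:/models/llama,triton_max_batch_size:64" -> one pair per comma
    for pair in assignments.split(","):
        key, _, value = pair.partition(":")  # value may itself contain ':'
        text = text.replace("${" + key + "}", value)
    with open(pbtxt_path, "w") as f:
        f.write(text)

if __name__ == "__main__":
    # usage: python3 sketch.py llama_ifb/ensemble/config.pbtxt triton_max_batch_size:64
    fill_template(sys.argv[1], sys.argv[2])
```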

pip install SentencePiece
python3 scripts/launch_triton_server.py --world_size 8 --model_repo=llama_ifb/
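An editorial aside, not from the thread: --world_size must equal the tensor × pipeline parallelism the engine was built with, and a mismatch can also make model loading fail. A hedged sketch for checking it, assuming a 0.11.x-era config.json layout (the key names vary across TensorRT-LLM versions):

```python
# Hedged check that the engine's parallelism matches --world_size.
# Key names below are an assumption: ~0.9+ engines keep the mapping under
# "pretrained_config" -> "mapping"; older builds use "builder_config".
import json

with open("/network/engine/engine_outputs_llama3_8B/config.json") as f:
    cfg = json.load(f)

mapping = cfg.get("pretrained_config", {}).get("mapping", {})
tp = mapping.get("tp_size", 1)
pp = mapping.get("pp_size", 1)
print(f"engine parallelism: tp={tp} pp={pp} -> world_size should be {tp * pp}")
```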

Expected behavior

The Triton server should start and run correctly.

Actual behavior

When I deploy Llama 3 (Meta-Llama-3-8B) on 8×RTX 4090 GPUs, it raises the error below:

[screenshot: unexpected error when creating modelInstanceState: [json.exception.out_of_range.403] key 'name' not found]

How can I fix it? Thanks!

additional notes

None.

Godlovecui commented 1 month ago

When I add a fake "name" key to config.json, it raises another error: [screenshot of the error]
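An editorial note, not from the thread: [json.exception.out_of_range.403] is nlohmann::json's out_of_range error, thrown when code looks up a key (here 'name') that the parsed JSON object lacks, presumably while the backend reads the engine's config.json. That usually points to an engine built with a TensorRT-LLM version whose config schema differs from what the installed backend expects, rather than a key to patch in by hand. A hedged diagnostic sketch:

```python
# Hypothetical diagnostic, not from the thread: dump the engine config's
# top-level keys to see which schema the engine was built with.
import json

ENGINE_PATH = "/network/engine/engine_outputs_llama3_8B"  # from the repro above

with open(f"{ENGINE_PATH}/config.json") as f:
    cfg = json.load(f)

print("top-level keys:", sorted(cfg))  # iterating a dict yields its keys
# If these don't match what the installed backend expects (the schema has
# changed across TensorRT-LLM releases), rebuild the engine with the same
# TensorRT-LLM version that the backend container ships.
```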

zheyang0825 commented 1 month ago

I'm running into the same problem.

blacker521 commented 1 month ago

I'm hitting the same problem as well.

byshiue commented 3 weeks ago

Do you encounter a similar issue with LLaMA2?