Closed rvssridatta closed 1 day ago
To address the issues you're encountering with importing Llama Index modules in the TensorRT LLM Docker image on an Advantech Jetson Orin NX - 16 GB variant device, follow these steps:
Ensure CUDA 12.2 or Higher is Installed: Verify that your device has Nvidia CUDA 12.2 or higher installed, as it is required to run TensorRT-LLM.
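A quick way to script the check above — this is only a sketch that confirms the usual CUDA tools are visible on PATH; it does not verify the version, so still run nvcc --version manually to confirm 12.2 or newer:

```python
import shutil

def cuda_tools_on_path():
    """Map each expected CUDA tool name to its resolved path (or None).

    Only checks visibility on PATH -- it does not verify the toolkit
    version, so run `nvcc --version` manually to confirm 12.2+.
    """
    return {tool: shutil.which(tool) for tool in ("nvcc", "nvidia-smi")}

print(cuda_tools_on_path())
```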
Install tensorrt_llm: Use the following pip command to install the tensorrt_llm package:
pip3 install tensorrt_llm -U --extra-index-url https://pypi.nvidia.com
Prepare Llama2 Model Files: Follow the instructions provided in the TensorRT-LLM GitHub repository to create the necessary model files. The following files should be generated:
Llama_float16_tp1_rank0.engine
config.json
model.cache
Organize Model Files: Create a directory named model and move the generated files into this directory:
mkdir model
mv Llama_float16_tp1_rank0.engine config.json model.cache model/
Install Llama Index Integration: Install the Llama Index integration for NVIDIA TensorRT:
pip install llama-index-llms-nvidia-tensorrt
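Since the original report is about import failures, a minimal import check for the integration package can save a step later — this sketch returns the underlying ImportError message instead of raising (module and class names are taken from the example below):

```python
def llama_index_tensorrt_importable():
    """Try importing the LlamaIndex TensorRT integration without raising.

    Returns (ok, detail): detail is the ImportError message on failure.
    """
    try:
        from llama_index.llms.nvidia_tensorrt import LocalTensorRTLLM  # noqa: F401
    except ImportError as exc:
        return False, str(exc)
    return True, "LocalTensorRTLLM importable"

ok, detail = llama_index_tensorrt_importable()
print(detail if ok else f"import failed: {detail}")
```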
Run the Example Code: Use the following Python code to initialize and run the LocalTensorRTLLM:
from llama_index.llms.nvidia_tensorrt import LocalTensorRTLLM

def completion_to_prompt(completion):
    return f"<s> [INST] {completion} [/INST] "

def messages_to_prompt(messages):
    content = ""
    for message in messages:
        content += str(message) + "\n"
    return f"<s> [INST] {content} [/INST] "

llm = LocalTensorRTLLM(
    model_path="./model",
    engine_name="llama_float16_tp1_rank0.engine",
    tokenizer_dir="meta-llama/Llama-2-13b-chat",
    completion_to_prompt=completion_to_prompt,
    messages_to_prompt=messages_to_prompt,
)
resp = llm.complete("Who is Paul Graham?")
print(str(resp))
RuntimeError: Bad params
If you encounter the RuntimeError: Bad params error during the installation of tensorrt_llm, follow these steps:
Ensure Correct Installation: Reinstall tensorrt_llm:
pip3 install tensorrt_llm -U --extra-index-url https://pypi.nvidia.com
Check CUDA Version: Ensure that you have Nvidia CUDA 12.2 or higher installed.
Verify Model Path: Ensure that the model_path provided exists and contains the necessary files (Llama_float16_tp1_rank0.engine, config.json, model.cache).
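The model-path check can be scripted rather than done by eye; a small sketch (file names from the list above, default path from the example):

```python
import os

REQUIRED_FILES = ("Llama_float16_tp1_rank0.engine", "config.json", "model.cache")

def missing_model_files(model_path="./model", required=REQUIRED_FILES):
    """Return the required files that are absent from model_path.

    An empty list means the directory exists and all files are present.
    """
    if not os.path.isdir(model_path):
        return list(required)
    return [name for name in required
            if not os.path.isfile(os.path.join(model_path, name))]

print(missing_model_files() or "all model files present")
```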
Check Configuration File: Ensure that the config.json file in the model directory is correctly formatted and contains all necessary configurations.
Validate Tensor Parallel and Pipeline Parallel Sizes: Ensure that the tensor_parallel and pipeline_parallel sizes in the config.json match the runtime sizes.
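A sketch for reading those sizes out of the engine config — note the key names (builder_config, tensor_parallel, pipeline_parallel) are assumptions about the config layout and may differ between TensorRT-LLM versions, so adjust after inspecting your own config.json:

```python
import json

def parallel_sizes(config_path="./model/config.json"):
    """Read (tensor_parallel, pipeline_parallel) from an engine config.

    Key names are assumptions -- some layouts nest the sizes under a
    "builder_config" section, others keep them at the top level.
    """
    with open(config_path) as f:
        cfg = json.load(f)
    section = cfg.get("builder_config", cfg)
    return section.get("tensor_parallel"), section.get("pipeline_parallel")
```

On a single Jetson GPU as described here, both sizes would need to be 1; if they do not match the world size you launch with, the engine must be rebuilt.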
Check GPU Availability: Ensure that a CUDA-enabled GPU is available and set the correct device.
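One way to check this from Python — the sketch below goes through PyTorch (an assumption; it is usually installed alongside tensorrt_llm) and degrades gracefully if it isn't available:

```python
def cuda_device_report():
    """Describe CUDA availability as seen by PyTorch, without raising."""
    try:
        import torch
    except ImportError:
        return "torch not installed; cannot check CUDA from Python"
    if not torch.cuda.is_available():
        return "no CUDA device visible to PyTorch"
    return f"CUDA device 0: {torch.cuda.get_device_name(0)}"

print(cuda_device_report())
```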
Verify Tokenizer Directory: Ensure that the tokenizer_dir provided is correct and contains the necessary tokenizer files.
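When tokenizer_dir points at a local directory (rather than a Hugging Face model id such as meta-llama/Llama-2-13b-chat, which is downloaded at runtime and will not exist as a local path), the usual tokenizer files can be checked for — the file names below follow the common Hugging Face Llama convention and are an assumption:

```python
import os

def missing_tokenizer_files(tokenizer_dir,
                            expected=("tokenizer_config.json", "tokenizer.model")):
    """Return the expected tokenizer files absent from a local directory.

    Expected names follow the usual Hugging Face Llama convention;
    adjust for your tokenizer. A HF model id is not a local directory.
    """
    if not os.path.isdir(tokenizer_dir):
        return list(expected)
    return [name for name in expected
            if not os.path.isfile(os.path.join(tokenizer_dir, name))]
```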
Check Sampling Configuration: Ensure that the sampling configuration parameters are correctly set.
To get more insight into the RuntimeError: Bad params error, enable debug_mode when creating the GenerationSession object:
decoder = tensorrt_llm.runtime.GenerationSession(
    self._model_config, engine_buffer, runtime_mapping, debug_mode=True
)
By following these steps, you should be able to deploy TensorRT LLM integrated with Llama Index RAG modules on your Advantech Jetson Orin NX - 16 GB variant device and troubleshoot any issues that arise [1][2].
Bug Description
Even though I am following the latest documentation from LlamaIndex, I am still not able to import LlamaIndex.
Issue 1: LlamaIndex documentation link: https://docs.llamaindex.ai/en/stable/examples/llm/nvidia_tensorrt/ Versions: Ubuntu 22.04, CUDA 12.2, architecture: Arm64, JetPack 6.0
Issue 2: NVIDIA Jetson containers link: https://github.com/dusty-nv/jetson-containers?tab=readme-ov-file Versions: Ubuntu 22.04, CUDA 12.2, architecture: Arm64, JetPack 6.0
Please provide a standard solution to deploy TensorRT-LLM integrated with some LlamaIndex RAG modules. Device used: Advantech Jetson Orin NX - 16 GB variant
Version
llama-index 0.10.50
Steps to Reproduce
Issue 1: 1) Followed the LlamaIndex documentation's step-by-step commands. 2) Got an error at the 3rd command from the documentation: "pip3 install tensorrt_llm==0.8.0 -U --extra-index-url https://pypi.nvidia.com"
Issue 2: 1) Followed the dusty-nv documentation's step-by-step commands. 2) Got the same error mentioned in the relevant logs: "$ jetson-containers run $(autotag tensorrt-llm)
Relevant Logs/Tracebacks