run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Bug]: Not able to import Llama index modules in tensor rt llms docker image. #14371

Closed: rvssridatta closed this issue 1 day ago

rvssridatta commented 3 months ago

Bug Description

Even though I am following the latest LlamaIndex documentation, I am still not able to import LlamaIndex.

Issue 1
LlamaIndex documentation link: https://docs.llamaindex.ai/en/stable/examples/llm/nvidia_tensorrt/
Versions: Ubuntu 22.04, CUDA 12.2, architecture: Arm64, JetPack 6.0

Issue 2
NVIDIA Jetson containers link: https://github.com/dusty-nv/jetson-containers?tab=readme-ov-file
Versions: Ubuntu 22.04, CUDA 12.2, architecture: Arm64, JetPack 6.0

Please provide a standard solution to deploy TensorRT-LLM integrated with LlamaIndex RAG modules.
Device used: Advantech Jetson Orin NX (16 GB variant)

(1 screenshot attached)

Version

llama-index 0.10.50

Steps to Reproduce

Issue 1:
1. Followed the LlamaIndex documentation, running the provided commands step by step.
2. Got an error at the third command from the documentation: pip3 install tensorrt_llm==0.8.0 -U --extra-index-url https://pypi.nvidia.com

Issue 2:
1. Followed the dusty-nv documentation, running the provided commands step by step.
2. Got the same error shown in the relevant logs below when running: $ jetson-containers run $(autotag tensorrt-llm)

Relevant Logs/Tracebacks

Error:
Looking in indexes: https://pypi.org/simple, https://pypi.nvidia.com
Collecting tensorrt_llm==0.8.0
  Downloading tensorrt-llm-0.8.0.tar.gz (6.9 kB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [6 lines of output]
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-install-r7zpl9ve/tensorrt-llm_382951b6d5f34b8798d95f1967eb0620/setup.py", line 90, in <module>
          raise RuntimeError("Bad params")
      RuntimeError: Bad params
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
dosubot[bot] commented 3 months ago

To address the issues you're encountering with importing LlamaIndex modules in the TensorRT-LLM Docker image on an Advantech Jetson Orin NX (16 GB variant), follow these steps:

  1. Ensure CUDA 12.2 or Higher is Installed: Verify that your device has Nvidia CUDA 12.2 or higher installed, as it is required to run TensorRT-LLM.

  2. Install tensorrt_llm: Use the following pip command to install the tensorrt_llm package:

    pip3 install tensorrt_llm -U --extra-index-url https://pypi.nvidia.com
  3. Prepare Llama2 Model Files: Follow the instructions provided in the TensorRT-LLM GitHub repository to create the necessary model files. The following files should be generated:

    • Llama_float16_tp1_rank0.engine
    • config.json
    • model.cache
  4. Organize Model Files: Create a directory named model and move the generated files into this directory:

    mkdir model
    mv Llama_float16_tp1_rank0.engine config.json model.cache model/
  5. Install Llama Index Integration: Install the Llama Index integration for NVIDIA TensorRT:

    pip install llama-index-llms-nvidia-tensorrt
  6. Run the Example Code: Use the following Python code to initialize and run the LocalTensorRTLLM:

    from llama_index.llms.nvidia_tensorrt import LocalTensorRTLLM
    
    def completion_to_prompt(completion):
       return f"<s> [INST] {completion} [/INST] "
    
    def messages_to_prompt(messages):
       content = ""
       for message in messages:
           content += str(message) + "\n"
       return f"<s> [INST] {content} [/INST] "
    
    llm = LocalTensorRTLLM(
       model_path="./model",
       engine_name="llama_float16_tp1_rank0.engine",
       tokenizer_dir="meta-llama/Llama-2-13b-chat",
       completion_to_prompt=completion_to_prompt,
       messages_to_prompt=messages_to_prompt,
    )
    
    resp = llm.complete("Who is Paul Graham?")
    print(str(resp))
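
Since the original question asks for TensorRT-LLM combined with LlamaIndex RAG modules, here is a minimal RAG sketch that reuses the llm object from the example above. It is only an illustration, not part of the official docs: the ./data directory, the llama-index-embeddings-huggingface package, and the BAAI/bge-small-en-v1.5 embedding model are assumptions you can swap for your own setup.

from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Route all LlamaIndex calls through the TensorRT-LLM model configured above
# and use a local HuggingFace embedding model (example choice) for retrieval.
Settings.llm = llm
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

# Index the documents in ./data (assumed to exist) and run a query.
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

query_engine = index.as_query_engine()
print(query_engine.query("What do these documents say about Paul Graham?"))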

Troubleshooting RuntimeError: Bad params

If you encounter the RuntimeError: Bad params error during the installation of tensorrt_llm, follow these steps:

  1. Ensure Correct Installation: Reinstall tensorrt_llm:

    pip3 install tensorrt_llm -U --extra-index-url https://pypi.nvidia.com
  2. Check CUDA Version: Ensure that you have Nvidia CUDA 12.2 or higher installed.

  3. Verify Model Path: Ensure that the model_path provided exists and contains the necessary files (Llama_float16_tp1_rank0.engine, config.json, model.cache); see the sanity-check sketch after this list.

  4. Check Configuration File: Ensure that the config.json file in the model directory is correctly formatted and contains all necessary configurations.

  5. Validate Tensor Parallel and Pipeline Parallel Sizes: Ensure that the tensor_parallel and pipeline_parallel sizes in the config.json match the runtime sizes.

  6. Check GPU Availability: Ensure that a CUDA-enabled GPU is available and set the correct device.

  7. Verify Tokenizer Directory: Ensure that the tokenizer_dir provided is correct and contains the necessary tokenizer files.

  8. Check Sampling Configuration: Ensure that the sampling configuration parameters are correctly set.
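
For items 3 through 6 above, a quick sanity-check script can save time. This is only a sketch: the file names follow the engine built in the earlier steps, the parallel-size key names inside config.json vary between TensorRT-LLM versions, and torch is assumed to be installed.

import json
import os

import torch

model_dir = "./model"

# Item 3: confirm the expected files exist in the model directory.
for name in ("llama_float16_tp1_rank0.engine", "config.json", "model.cache"):
    path = os.path.join(model_dir, name)
    print(f"{name}: {'found' if os.path.exists(path) else 'MISSING'}")

# Item 4: confirm config.json parses; json.load raises if it is malformed.
with open(os.path.join(model_dir, "config.json")) as f:
    config = json.load(f)
print("config.json top-level keys:", sorted(config))
# Look here for the tensor/pipeline parallel sizes (item 5); the exact key
# names depend on the TensorRT-LLM version that built the engine.

# Item 6: confirm a CUDA-enabled GPU is visible.
print("CUDA available:", torch.cuda.is_available())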

Enable Debug Mode

To get more insights into the RuntimeError: Bad params error, enable the debug_mode when creating the GenerationSession object:

# Passing debug_mode=True enables additional debug output from the runtime.
decoder = tensorrt_llm.runtime.GenerationSession(
    self._model_config, engine_buffer, runtime_mapping, debug_mode=True
)
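
Note that the GenerationSession call above uses self._model_config, which indicates it is made inside the integration's own code rather than in user code, so enabling debug_mode means editing or patching that call site. If you just want more runtime logging first, TensorRT-LLM ships a logger whose verbosity can be raised; treat the exact helper below as an assumption and check it against the tensorrt_llm version you have installed.

# Raise TensorRT-LLM's log level before constructing the LLM; the logger
# helper is assumed to be available in your tensorrt_llm build.
from tensorrt_llm.logger import logger

logger.set_level("verbose")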

By following these steps, you should be able to deploy TensorRT-LLM integrated with LlamaIndex RAG modules on your Advantech Jetson Orin NX (16 GB variant) and troubleshoot any issues that arise [1][2].

To continue talking to Dosu, mention @dosu.