Open torvalds-dev opened 1 year ago
torvalds.dev is analyzing the ticket
The issue is a feature request to integrate Nvidia Triton TensorRT LLMs (Large Language Models) into llama index. Currently, llama index supports several other LLM endpoints, and the user believes Nvidia's Triton LLM endpoints would be a valuable addition.
The reason for this request is that several implementations of this feature are already available on the internet. However, end users must manually assemble the solution and build it from source. The proposed feature would eliminate the need for these efforts.
The value of this feature is that it would provide new LLM endpoints, giving users more options and control over how they can use llama index. It could also potentially allow users to keep their data under their own control.
Based on the provided code, the following files seem to be relevant to the issue:
- `llama_index/llm_predictor/__init__.py`: initializes the LLM predictor and might need to be updated to include the new Nvidia Triton TensorRT LLM.
- `llama_index/llms/llama_cpp.py`: contains the implementation of the LlamaCPP class, a custom LLM, and might serve as a reference for implementing the Nvidia Triton TensorRT LLM.
- `llama_index/llms/__init__.py`: initializes the llms module and might need to be updated to export the new Nvidia Triton TensorRT LLM.
- `llama_index/langchain_helpers/agents/__init__.py`: initializes the agents module in the langchain_helpers package and might need to be updated to include the new Nvidia Triton TensorRT LLM.
The following actions should be taken: implement the new Nvidia Triton TensorRT LLM class (using `llama_index/llms/llama_cpp.py` as a reference), and update the `llama_index/llm_predictor/__init__.py` and `llama_index/llms/__init__.py` files to include it.
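The `__init__.py` updates would amount to exporting the new class alongside the existing ones. A hypothetical fragment (the module path `nvidia_triton` and class name `NvidiaTritonLLM` are assumptions for illustration):

```python
# Hypothetical addition to llama_index/llms/__init__.py
from llama_index.llms.nvidia_triton import NvidiaTritonLLM  # assumed module path

__all__ = [
    # ...existing exports...
    "NvidiaTritonLLM",
]
```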
Feature Description
I would like to add support for Nvidia Triton TensorRT LLMs in llama index. There is currently support for several other LLM endpoints, and Nvidia has several interesting offerings with their Triton LLM endpoints that I think others would find useful in llama_index.
Reason
There are several implementations of this floating around the internet already. However, end users must "hack" together the solution and build from source. This feature would spare users those efforts.
Value of Feature
New LLM endpoints that give users more options and control over how they use llama index, while potentially keeping their data under their own control.