[Feature Request]: Nvidia Triton TensorRT LLM Integrations

torvalds-dev commented 12 months ago

Feature Description

I would like to add support for Nvidia Triton TensorRT LLMs in llama_index. llama_index already supports several other LLM endpoints, and Nvidia has several interesting offerings with their Triton LLM endpoints that I think others would find useful.

Reason

There are several implementations of this floating around the internet already. However, end users must "hack" together the solution and build from source. Built-in support would spare users that effort.

Value of Feature

New LLM endpoints would give users more options and control over how they use llama_index, and could let them keep their data under their own control.

torvalds-dev commented 12 months ago

torvalds.dev is analyzing the ticket

torvalds-dev commented 12 months ago

The issue is a feature request for integrating Nvidia Triton TensorRT LLMs (large language models) into llama_index. llama_index currently supports several other LLM endpoints, and the user believes that Nvidia's Triton LLM endpoints would be a valuable addition.

The reason for this request is that there are already several implementations of this feature available on the internet. However, these require end users to manually piece together the solution and build it from the source. The proposed feature would eliminate this need, making it easier for users to utilize the llama index.

The value of this feature lies in the new LLM endpoints it would provide. These would give users more options and control over how they use the llama index, potentially allowing them to keep their data under their own control.

To address this issue, the following actions could be taken:

1. Research Nvidia Triton TensorRT LLMs and understand how they can be integrated into the llama index.
2. Identify the existing implementations of this feature on the internet and understand how they work.
3. Develop a plan for integrating Nvidia Triton TensorRT LLMs into the llama index, taking into account the need to make the process easy for end users.
4. Implement the plan, ensuring that the new feature works as expected and does not introduce any new issues.
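
As a rough illustration of what the integration step might involve, here is a minimal sketch of a client wrapper for a Triton-hosted LLM. Everything in it is an assumption rather than existing llama_index code: the class name `TritonLLMSketch`, the helper `build_generate_request`, the model name `ensemble`, the `/v2/models/<name>/generate` path, and the `text_input` / `max_tokens` / `text_output` field names all depend on how the Triton server and model are actually configured. The HTTP call is injected as a plain function so the sketch runs without a server.

```python
import json
from typing import Any, Callable, Dict

# All names below are hypothetical and for illustration only; they are not
# part of llama_index or of the Triton client libraries.


def build_generate_request(prompt: str, max_tokens: int = 64) -> Dict[str, Any]:
    """Build a JSON-serializable request body.

    The field names are an assumption about the deployed model's inputs.
    """
    return {"text_input": prompt, "max_tokens": max_tokens}


class TritonLLMSketch:
    """Thin wrapper that sends a prompt to a (hypothetical) Triton endpoint."""

    def __init__(
        self,
        post: Callable[[str, bytes], bytes],
        base_url: str = "http://localhost:8000",  # assumed server address
        model_name: str = "ensemble",             # assumed model name
    ) -> None:
        # `post` takes (url, body) and returns the raw response bytes, so any
        # HTTP library, or a fake for testing, can be plugged in.
        self._post = post
        self._url = f"{base_url}/v2/models/{model_name}/generate"

    def complete(self, prompt: str, max_tokens: int = 64) -> str:
        body = json.dumps(build_generate_request(prompt, max_tokens)).encode("utf-8")
        raw = self._post(self._url, body)
        # Assumes the server replies with a JSON object holding "text_output".
        return json.loads(raw)["text_output"]


def fake_post(url: str, body: bytes) -> bytes:
    """Stand-in transport: echoes the prompt back in upper case."""
    payload = json.loads(body)
    return json.dumps({"text_output": payload["text_input"].upper()}).encode("utf-8")


llm = TritonLLMSketch(post=fake_post)
print(llm.complete("hello"))
```

A real integration would replace `fake_post` with an actual HTTP or gRPC call and would likely subclass llama_index's own LLM base class so the wrapper plugs into the rest of the library; keeping the transport injectable simply makes the request-building logic easy to test in isolation.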

The relevant files for this issue are:

llama_index/llms/llama_cpp.py: This file contains the implementation of the LlamaCPP class, which is a custom LLM. It seems to be the main file where the integration of Nvidia Triton TensorRT LLMs would take place.
llama_index/langchain_helpers/agents/__init__.py: This file contains the initialization of the llama integration with Langchain agents. It might be relevant if the integration of Nvidia Triton TensorRT LLMs involves changes to the way llama interacts with Langchain agents.
