torvalds-dev / llama_index

LlamaIndex (formerly GPT Index) is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Feature Request]: Nvidia Triton Tensor RT LLM Integrations #22

Open torvalds-dev opened 1 year ago

torvalds-dev commented 1 year ago

Feature Description

I would like to add support for Nvidia Triton TensorRT LLMs in llama_index. There is currently support for several other LLM endpoints, and Nvidia's Triton LLM endpoints are an interesting offering that I think others would find useful in llama_index.

Reason

There are already several implementations of this floating around the internet. However, end users must "hack" together a solution and build it from source. This integration would spare users those efforts.

Value of Feature

New LLM endpoints give users more options and control over how they use llama_index, and potentially let them keep their data under their own control.

torvalds-dev commented 1 year ago

torvalds.dev is analyzing the ticket

torvalds-dev commented 1 year ago

The issue is a feature request to integrate Nvidia Triton TensorRT LLMs into llama_index. Currently, llama_index supports several other LLM endpoints, and the user believes that Nvidia's Triton LLM endpoints would be a valuable addition.

The reason for this request is that several implementations of this feature are already available on the internet, but end users must manually assemble a solution and build it from source. The proposed feature would eliminate the need for those efforts.

The value of this feature is that it would provide new LLM endpoints, giving users more options and control over how they use llama_index. It could also allow users to keep their data under their own control.

Based on the provided code, the following files seem to be relevant to the issue:

  1. llama_index/llm_predictor/__init__.py: This file initializes the LLM predictor. It might need to be updated to include the new Nvidia Triton TensorRT LLM.

  2. llama_index/llms/llama_cpp.py: This file contains the implementation of the LlamaCPP class, which is a custom LLM. It might serve as a reference for implementing the Nvidia Triton TensorRT LLM.

  3. llama_index/llms/__init__.py: This file initializes the llms module and might need to be updated to include the new Nvidia Triton TensorRT LLM.

  4. llama_index/langchain_helpers/agents/__init__.py: This file initializes the agents module in the langchain_helpers package. It might need to be updated to include the new Nvidia Triton TensorRT LLM.
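To make the shape of such an integration concrete, here is a minimal standalone sketch of what a Triton-backed LLM wrapper might look like. It does not import llama_index: `CompletionResponse` is a simplified stand-in for llama_index's response type, the class and parameter names (`NvidiaTritonLLM`, `server_url`, `model_name`) are assumptions, and the actual Triton inference call is replaced by an injected stub so the example runs without a live server.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CompletionResponse:
    # Simplified stand-in for llama_index's completion response type.
    text: str


@dataclass
class NvidiaTritonLLM:
    # Hypothetical parameter names, for illustration only.
    server_url: str
    model_name: str
    # Injected transport so the sketch runs without a live Triton server;
    # a real integration would call the Triton client library here instead.
    transport: Optional[Callable[[str, str, str], str]] = None

    def complete(self, prompt: str) -> CompletionResponse:
        # In a real integration this would issue an inference request to the
        # Triton endpoint; here we delegate to the injected stub.
        raw = self.transport(self.server_url, self.model_name, prompt)
        return CompletionResponse(text=raw)


# Usage with a fake transport standing in for a Triton endpoint:
fake = lambda url, model, prompt: f"[{model}] echo: {prompt}"
llm = NvidiaTritonLLM("localhost:8001", "ensemble", transport=fake)
print(llm.complete("hello").text)  # [ensemble] echo: hello
```

Injecting the transport also makes the eventual implementation easy to unit-test against a mock endpoint.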

The following actions should be taken:

  1. Understand the Nvidia Triton TensorRT LLM and how it works.
  2. Design an interface for the Nvidia Triton TensorRT LLM that is consistent with the existing LLMs in llama index.
  3. Implement the Nvidia Triton TensorRT LLM in a new Python class.
  4. Update the llama_index/llm_predictor/__init__.py and llama_index/llms/__init__.py files to include the new Nvidia Triton TensorRT LLM.
  5. Test the new implementation with various use cases to ensure it works as expected.
  6. Document the new feature and how to use it.
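Step 2 above (a consistent interface) can be sketched with a structural check. The `Protocol` below is an illustration of the idea, not llama_index's actual base class, and `NvidiaTritonStub` is a hypothetical placeholder for the eventual Triton-backed class.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class CompletionLLM(Protocol):
    """Illustrative shared interface that existing LLM wrappers expose."""

    def complete(self, prompt: str) -> str: ...


class NvidiaTritonStub:
    """Placeholder standing in for the eventual Triton-backed class."""

    def complete(self, prompt: str) -> str:
        return f"stub: {prompt}"


# Structural check: the new class conforms to the shared interface.
print(isinstance(NvidiaTritonStub(), CompletionLLM))  # True
```

A check like this could back the tests in step 5, ensuring the new class stays interchangeable with the LLMs llama_index already ships.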