vossr / Chat-With-RTX-python-api

Chat With RTX Python API

Python API for Chat With RTX

Usage

Start the server (Windows):

.\start_server.bat

Then send a prompt from Python:

import rtx_api_july_2024 as rtx_api

# Send a prompt to the running server and print the reply
response = rtx_api.send_message("write fire emoji")
print(response)
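To send several prompts in one session, a small loop around send_message is enough. This is a minimal sketch, not part of the repo: chat_loop is a hypothetical helper, and it assumes send_message(prompt) returns the reply as a string, as the example above suggests. Each prompt is sent on its own; whether the server keeps conversation history between calls is not stated here.

```python
from typing import Callable, Iterable, List


def chat_loop(send: Callable[[str], str], prompts: Iterable[str]) -> List[str]:
    """Hypothetical helper: send each prompt, print and collect the replies.

    Stops at the first blank line (or when `prompts` is exhausted).
    `send` is assumed to behave like rtx_api.send_message: str in, str out.
    """
    replies: List[str] = []
    for prompt in prompts:
        prompt = prompt.strip()  # drop the trailing newline from terminal input
        if not prompt:           # an empty line ends the session
            break
        reply = send(prompt)
        print(reply)
        replies.append(reply)
    return replies
```

For example, with the server running, chat_loop(rtx_api.send_message, sys.stdin) reads prompts typed in the terminal and stops at an empty line.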

Speed

Chat With RTX builds int4 (W4A16, AWQ) TensorRT engines for its LLMs: weights are quantized to 4 bits with AWQ, while activations stay in 16-bit precision.
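To make the W4A16 storage format concrete, here is a small NumPy illustration of int4 weights with FP16 scales multiplied by FP16 activations. It uses plain group-wise round-to-nearest quantization, which is not AWQ itself (AWQ additionally rescales salient weight channels based on activation statistics before rounding) and not how TensorRT implements its kernels. The group size of 128 and all function names are assumptions for illustration.

```python
import numpy as np

GROUP = 128  # assumed group size: one FP16 scale per 128 consecutive input weights


def quantize_int4(w: np.ndarray, group: int = GROUP):
    """Group-wise symmetric round-to-nearest quantization of an (out, in) weight matrix.

    Returns int4 codes in [-8, 7] (held in int8 here; real kernels pack two per byte)
    and one FP16 scale per group.
    """
    out_f, in_f = w.shape
    if in_f % group != 0:
        raise ValueError("input dimension must be a multiple of the group size")
    g = w.astype(np.float32).reshape(out_f, in_f // group, group)
    # Scale each group so its largest magnitude maps to 7.
    scale = np.abs(g).max(axis=-1, keepdims=True) / 7.0
    scale = np.where(scale == 0, 1.0, scale)  # all-zero group: avoid dividing by zero
    q = np.clip(np.round(g / scale), -8, 7).astype(np.int8)
    return q, scale.astype(np.float16)


def w4a16_matmul(x: np.ndarray, q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Dequantize int4 weights to FP16 and multiply by FP16 activations x of shape (batch, in)."""
    w = (q.astype(np.float16) * scale).reshape(q.shape[0], -1)
    return x.astype(np.float16) @ w.T


def weight_gb(n_params: float, bits: int) -> float:
    """Approximate weight storage in GB, ignoring scales and unquantized layers."""
    return n_params * bits / 8 / 1e9
```

As a rough size check, weight_gb(7e9, 4) gives 3.5 GB of weights for a 7B-parameter model, versus 14 GB for weight_gb(7e9, 16) in FP16, before counting scales and any layers kept in higher precision.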

Model       Speed on RTX 4090 (characters/sec)
Mistral     457
Llama2      315
ChatGLM3    385
Gemma       407
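The table does not say how these figures were measured. One rough way to get a characters-per-second number on your own GPU is to time a blocking call and divide the reply length by the elapsed time. This is a hedged sketch, not the method behind the table: chars_per_sec is a hypothetical helper, it assumes send_message blocks until the whole reply is returned, and its timing includes request overhead and prompt processing, so it will tend to read lower than pure generation speed.

```python
import time
from typing import Callable, Tuple


def chars_per_sec(send: Callable[[str], str], prompt: str, runs: int = 3) -> Tuple[str, float]:
    """Hypothetical helper: return the last reply and aggregate chars/sec over `runs` calls.

    `send` is assumed to block until the full reply is available (like
    rtx_api.send_message in the Usage example); the measured time therefore
    includes request overhead, not just token generation.
    """
    if runs < 1:
        raise ValueError("runs must be >= 1")
    total_chars = 0
    total_time = 0.0
    reply = ""
    for _ in range(runs):
        start = time.perf_counter()
        reply = send(prompt)
        total_time += time.perf_counter() - start
        total_chars += len(reply)
    # Total characters over total time, not a mean of per-run rates.
    rate = total_chars / total_time if total_time > 0 else float("inf")
    return reply, rate
```

With the server running: reply, rate = chars_per_sec(rtx_api.send_message, "write a short story"). Longer replies reduce the relative weight of the fixed per-request overhead.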

Update History of Chat With RTX
3.2024  Removed YouTube video transcript fetching
4.2024  Added Whisper speech-to-text model
7.2024  Electron app UI

LICENSE: CC0