premAI-io / benchmarks

🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models.
MIT License

Nvidia TensorRT LLM Mistral, memory support, qualitative comparison and improvements #178

Closed Anindyadeep closed 5 months ago

Anindyadeep commented 5 months ago

This PR introduces all the changes from PR https://github.com/premAI-io/benchmarks/pull/167 and integrates them into Nvidia TensorRT LLM. The Nvidia TensorRT LLM README now has quality-check tables for both Llama 2 Chat and Mistral Instruct.

Note: This PR does not support float32 precision. Check this issue: https://github.com/NVIDIA/TensorRT-LLM/issues/1485

If we are short on time, we can simply skip float32 for now, since float32 is less practical memory-wise in TensorRT LLM anyway.
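To give a rough sense of why float32 engines are heavy on memory, here is a back-of-the-envelope sketch of the weight footprint for a 7B-class model (the 7-billion parameter count and the helper function are illustrative assumptions, not taken from this repo or from TensorRT LLM):

```python
# Rough weight-memory estimate for a 7B-class model (e.g. Mistral 7B).
# PARAMS is an illustrative approximation, not an exact count.
PARAMS = 7_000_000_000

def weight_memory_gib(num_params: int, bytes_per_param: int) -> float:
    """Approximate memory needed for model weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30

fp32 = weight_memory_gib(PARAMS, 4)  # float32: 4 bytes per parameter
fp16 = weight_memory_gib(PARAMS, 2)  # float16/bfloat16: 2 bytes per parameter
print(f"fp32 ~ {fp32:.1f} GiB, fp16 ~ {fp16:.1f} GiB")  # fp32 ~ 26.1 GiB, fp16 ~ 13.0 GiB
```

Weights alone roughly double from ~13 GiB at float16 to ~26 GiB at float32 (before KV cache and activations), which is why dropping float32 from the benchmark is a reasonable trade-off.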