Closed Anindyadeep closed 5 months ago
This PR carries over all the changes from PR https://github.com/premAI-io/benchmarks/pull/167 and integrates them into Nvidia TensorRT-LLM. The Nvidia TensorRT-LLM README now has a quality-checks table for both Llama 2 Chat and Mistral Instruct.

Note: This PR does not support `float32` precision. See this issue: https://github.com/NVIDIA/TensorRT-LLM/issues/1485

If we are short on time, we can simply skip `float32` for now, since `float32` is less significant memory-wise in TRT-LLM.