Closed: rmccorm4 closed this PR 5 months ago.
This should be ~90% of the changes. Going to do some local testing and CI runs to see what falls out of it.
From what I've read of the ergonomics PR, it seems like this PR needs to be merged first. How did you want to do this? Did you want to back out the overlapping changes between the PRs, or merge them and deal with testing and merge conflicts in the other PR?
@fpetrini15 I ended up pulling that PR's changes into this one over the weekend, so I closed the other one and will just use this PR.
Pipelines looking good, 5/5 passes across all CLI jobs :+1:
Changelog
TRT-LLM 0.9.0 changes
Model Support:
General improvements/ergonomics:
- Run `convert_checkpoint.py` via `subprocess` to make sure weights loaded in GPU memory get cleaned up. This fixes OOM issues I was seeing locally when running the `trtllm-build` step.

Misc:
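The `subprocess` isolation described in the ergonomics note above can be sketched roughly as follows. This is a minimal sketch, not the CLI's actual code: the helper name `run_isolated` and the argument plumbing are my assumptions; the point is only that the conversion runs in a child process, so GPU memory it allocates is freed when the child exits.

```python
import subprocess
import sys


def run_isolated(script: str, args: list[str]) -> subprocess.CompletedProcess:
    """Run a checkpoint-conversion script in a child process so that any
    weights it loads into GPU memory are released when the child exits,
    instead of lingering into the `trtllm-build` step.

    (Hypothetical helper for illustration; `script` would be the path to
    `convert_checkpoint.py` and `args` its model-specific flags.)
    """
    cmd = [sys.executable, script, *args]
    # check=True raises CalledProcessError on a non-zero exit, so a failed
    # conversion surfaces immediately instead of reaching the build step.
    return subprocess.run(cmd, check=True)
```

Because the weights live only in the child's address space, no explicit `del`/cache-clearing is needed in the parent before the build step.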
Known Issues:
- `triton import -m {llama-3-8b,llama-3-8b-instruct} --backend tensorrtllm` seems to build engines fine, but there are issues with the corresponding 24.03 trtllm server image around loading the tokenizers. These issues are fixed with the upgrade to the 24.04 trtllm image and v0.9.0 (`pip install sentencepiece` and upgrading `transformers` per the recommendations).

Examples
vLLM example:
Output:
TRT-LLM example:
Output:
Tests
TRTLLM locally
vLLM locally