huggingface / optimum-neuron

Easy, fast and very cheap training and inference on AWS Trainium and Inferentia chips.
Apache License 2.0

Add support for TGI truncate parameter #647

Closed dacorvo closed 2 days ago

dacorvo commented 3 days ago

What does this PR do?

This adds support for the `truncate` parameter in TGI requests. When it is set, only the last `truncate` tokens of the input prompt are kept (the right-hand side), and any earlier tokens are dropped.
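
To illustrate the semantics, here is a minimal sketch of left-side truncation on a list of token ids. The function name `truncate_left` and its signature are hypothetical and are not taken from this PR or from the optimum-neuron code; the sketch only shows the "keep the right-hand tokens" behavior described above.

```python
from typing import List, Optional


def truncate_left(input_ids: List[int], truncate: Optional[int] = None) -> List[int]:
    """Keep only the last `truncate` token ids (hypothetical helper).

    Assumption: `truncate=None` means no truncation is requested.
    """
    if truncate is None:
        return list(input_ids)
    if truncate <= 0:
        raise ValueError("truncate must be a positive integer")
    if truncate >= len(input_ids):
        # Prompt already fits within the limit: nothing to drop.
        return list(input_ids)
    # Drop tokens from the left, keeping the right-hand `truncate` tokens.
    return list(input_ids[-truncate:])


prompt_ids = [101, 7592, 2088, 2003, 102]
kept = truncate_left(prompt_ids, truncate=2)
print(kept)
```

Dropping tokens from the left preserves the end of the prompt, which is usually the part closest to where generation continues.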