huggingface / optimum-neuron

Easy, fast and very cheap training and inference on AWS Trainium and Inferentia chips.

Apache License 2.0

195 stars, 59 forks

Fix llama test #600

Closed. dacorvo closed this 4 months ago

dacorvo commented 4 months ago

What does this PR do?

One of the generation tests contained an error that only surfaced when running against a model that defines multiple EOS token ids. This PR fixes the test and switches to a smaller model that is already cached, which makes CI faster.
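For context, here is a minimal sketch of the kind of check that behaves differently once a model has several EOS token ids. It is not the actual test change from this PR: the helper names `normalize_eos_token_ids` and `ends_with_eos` and the token values are hypothetical. The only assumption taken from `transformers` is that `eos_token_id` may be either a single int or a list of ints.

```python
from typing import List, Sequence, Union


def normalize_eos_token_ids(eos_token_id: Union[int, Sequence[int], None]) -> List[int]:
    """Return the EOS token ids as a list (hypothetical helper).

    Accepts None, a single int, or a sequence of ints.
    """
    if eos_token_id is None:
        return []
    if isinstance(eos_token_id, int):
        return [eos_token_id]
    return list(eos_token_id)


def ends_with_eos(token_ids: Sequence[int], eos_token_id: Union[int, Sequence[int], None]) -> bool:
    """Check whether a generated sequence stops on any of the EOS ids."""
    eos_ids = normalize_eos_token_ids(eos_token_id)
    return len(token_ids) > 0 and token_ids[-1] in eos_ids


# Illustrative values only, not taken from a real tokenizer.
generated = [1, 42, 7, 9]

# A naive comparison such as `generated[-1] == eos_token_id` is always False
# when eos_token_id is a list, which is how this class of bug can stay hidden
# until a model with multiple EOS ids is used.
single_eos_ok = ends_with_eos(generated, 9)
multi_eos_ok = ends_with_eos(generated, [5, 9])
naive_multi = generated[-1] == [5, 9]
```

Normalizing to a list up front keeps the assertion identical for models with one EOS id and models with several.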

Unrelated to the test fix, this pull request also reduces verbosity on TGI startup.