What does this PR do?
One of the generation tests contains an error that only surfaces when the test runs against a model that defines multiple EOS token ids. This PR fixes the test and switches it to a smaller model that is already cached, which makes CI faster.
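The exact test change is not reproduced here. As an illustration of the kind of pitfall involved, the sketch below shows one way a test helper could treat `eos_token_id` uniformly whether a model config provides a single int or a list of ints. The helper names `normalize_eos_ids` and `truncate_at_eos` are hypothetical and are not taken from this repository.

```python
from typing import List, Optional, Sequence, Union

# A model config may expose a single EOS id, several, or none.
EosIds = Optional[Union[int, Sequence[int]]]


def normalize_eos_ids(eos_token_id: EosIds) -> List[int]:
    """Return the EOS ids as a list (hypothetical helper, for illustration)."""
    if eos_token_id is None:
        return []
    if isinstance(eos_token_id, int):
        return [eos_token_id]
    return list(eos_token_id)


def truncate_at_eos(token_ids: Sequence[int], eos_token_id: EosIds) -> List[int]:
    """Cut a generated sequence right after the first EOS token, if any."""
    eos_ids = set(normalize_eos_ids(eos_token_id))
    for i, tok in enumerate(token_ids):
        if tok in eos_ids:
            # Keep the EOS token itself, drop everything after it.
            return list(token_ids[: i + 1])
    return list(token_ids)
```

A test that compares a generated token against a scalar `eos_token_id` passes for single-EOS models but breaks once the config holds a list, which is why normalizing first is the safer assumption.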
Unrelated to the test fix, this pull request also reduces the verbosity of TGI startup.