triton-inference-server / client

Triton Python, C++ and Java client libraries, and GRPC-generated client examples for Go, Java and Scala.
BSD 3-Clause "New" or "Revised" License

Fix input token count #599

Closed: tgerdesnv closed this pull request 5 months ago

tgerdesnv commented 5 months ago

- Fixed synthetic prompt generation so that the number of tokens is correct
- Fixed and tested synthetic prompt generation stddev and randomization
- Tweaks to speed up unit tests
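For context, this is roughly the kind of logic being fixed: sample a target token count from a normal distribution, then pad and trim until the tokenizer reports exactly that count. The function name, word corpus, and use of a Hugging Face GPT-2 tokenizer below are illustrative assumptions, not the actual genai-perf code in this PR.

```python
# Sketch only: the helper name, corpus, and tokenizer are assumptions for
# illustration; this is not the implementation in this PR.
import random

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
CORPUS = "time person year way day thing man world life hand".split()


def make_synthetic_prompt(mean_tokens: int, stddev_tokens: int) -> str:
    """Return a prompt whose tokenized length matches a target drawn from
    N(mean_tokens, stddev_tokens)."""
    target = max(1, int(random.gauss(mean_tokens, stddev_tokens)))
    words = [random.choice(CORPUS) for _ in range(target)]
    # One word is not always one token, so pad with a short filler word
    # until the encoded length reaches the target...
    while len(tokenizer.encode(" ".join(words))) < target:
        words.append("hi")
    # ...then cut to exactly `target` token ids and decode back to text
    # (assumes the decode/encode round trip is stable for these simple words).
    token_ids = tokenizer.encode(" ".join(words))[:target]
    return tokenizer.decode(token_ids)
```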

IzzyPutterman commented 5 months ago

Random thought: do we think we should add the "hi" padding to the start of the prompt, so that the latest info in the context is what we actually want the output to respond to? It probably shouldn't matter much, but it might be worth checking the quality of the outputs.
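As a rough sketch of that suggestion (the function name is hypothetical, not code from this PR), prepending the filler keeps the real prompt as the most recent text in the context:

```python
def pad_prompt(prompt: str, filler_count: int, prepend: bool = True) -> str:
    # Hypothetical helper: "hi" is the cheap one-word filler from the
    # discussion. With prepend=True the padding sits at the start of the
    # context, so the real prompt is the latest thing the model sees.
    filler = " ".join(["hi"] * filler_count)
    return f"{filler} {prompt}" if prepend else f"{prompt} {filler}"
```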

debermudez commented 5 months ago

> Random thought: do we think we should add the "hi" padding to the start of the prompt, so that the latest info in the context is what we actually want the output to respond to? It probably shouldn't matter much, but it might be worth checking the quality of the outputs.

Does position in the prompt affect output?

dyastremsky commented 5 months ago

> Random thought: do we think we should add the "hi" padding to the start of the prompt, so that the latest info in the context is what we actually want the output to respond to? It probably shouldn't matter much, but it might be worth checking the quality of the outputs.

I like this idea a lot.

> Does position in the prompt affect output?

You can imagine the difference between me asking you "Hi hi hi, how are you?" and "How are you, hi hi hi?" The former would have you responding pretty similarly to how you would respond to "how are you?" without any "hi hi hi." This probably becomes more extreme with longer prompts, as the "hi hi hi" would be much less relevant at the start versus very relevant if it were the last few words of a long prompt.

debermudez commented 5 months ago

> You can imagine the difference between me asking you "Hi hi hi, how are you?" and "How are you, hi hi hi?" The former would have you responding more to the "how are you?" than the "hi hi hi." This probably becomes more extreme with longer prompts, as the "hi hi hi" would be less relevant at the start versus very relevant if it were the last few words of a long prompt.

I vaguely recall something about this from when I learned about LSTM networks. This is a good recommendation and a great example to illustrate the point. Thanks!