IzzyPutterman closed this 3 months ago
CI ref: 15744750
Good question. It won't affect the statistics, because per-token counts are not part of the statistics: we keep the output token counts only for visualization purposes, to support our token position vs. ITL plot. With our updated ITL metric, there are no longer token-level inter-token latencies.
Got it, thanks for explaining! Would it be possible to add tests where the sum of the per-chunk token counts is different from the token count of the full text output? Or is that no longer necessary?
@dyastremsky Added a check that the sum equals the total token count. Individual token counts are not part of the statistics, but I think it never hurts to add more tests :) Thanks for the feedback 👍
Calculating total tokens from the sum of the chunks can result in numbers that are off (by ~10% in the case of llama3). This is because our old WAR of tokenizing "!" + text sometimes produces a single merged token instead of two. For example, "!." is just 1 token instead of 2.
Given that the ITL has been changed to the request level, we should change the token count to be more accurate with this flexibility.
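To make the discrepancy concrete, here is a minimal sketch with a toy greedy tokenizer (not the real llama3 tokenizer; the vocabulary and `tokenize` helper are made up for illustration). Because "!." exists as a single merged token, summing per-chunk counts overcounts relative to tokenizing the concatenated text:

```python
# Toy vocabulary where "!." is a single merged token, mimicking the
# BPE merge behavior described above. Purely illustrative.
VOCAB = {"!.", "!", ".", "Hi"}

def tokenize(text: str) -> list[str]:
    """Greedy longest-match tokenizer over the toy vocabulary."""
    tokens = []
    i = 0
    while i < len(text):
        for size in range(len(text) - i, 0, -1):
            piece = text[i:i + size]
            if piece in VOCAB:
                tokens.append(piece)
                i += size
                break
        else:
            raise ValueError(f"untokenizable text at position {i}")
    return tokens

chunks = ["!", "."]
per_chunk_sum = sum(len(tokenize(c)) for c in chunks)  # tokenized separately
full_text_count = len(tokenize("".join(chunks)))       # tokenized as one string
print(per_chunk_sum, full_text_count)  # 2 1 — the merged "!." token is why
```

This is exactly the mismatch the new test guards against: the per-chunk sum (2) disagrees with the full-text count (1) whenever a merge spans a chunk boundary.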