Add benchmarking of token generation speed. To do so, we generate a fixed number of tokens (using min_new_tokens=max_new_tokens=20), then compute number of generated tokens per sec for the given batch size with dummy batch, repeat 100 times and average.
Add benchmarking of token generation speed. To do so, we generate a fixed number of tokens (using
min_new_tokens=max_new_tokens=20
), then compute number of generated tokens per sec for the given batch size with dummy batch, repeat 100 times and average.