Closed arnavgarg1 closed 9 months ago
Implements support for Prompt Lookup Decoding by exposing a new generation config parameter called `prompt_lookup_num_tokens`. Compatible with `transformers` version >= 4.37.0.

In scenarios where the prompt is long and the generated output is likely to reuse many n-grams from it, this can speed up token generation by roughly 2x to 2.4x. In scenarios where that is not the case, such as open-ended questions, it leads to a decrease of about 10% in tokens per second.
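To illustrate why n-gram reuse matters here, the following is a minimal sketch of the candidate lookup step behind prompt lookup decoding, written in plain Python. It is not the code in this PR or in `transformers`; the function name `find_candidate_tokens` and the parameters `max_ngram_size` and `num_pred_tokens` are hypothetical names chosen for illustration. The idea is to match the last few generated tokens against earlier text and propose the tokens that followed that match as draft candidates, which the model can then verify in a single forward pass.

```python
from typing import List


def find_candidate_tokens(
    input_ids: List[int],
    max_ngram_size: int = 3,   # hypothetical name: longest suffix to match
    num_pred_tokens: int = 10,  # hypothetical name: how many draft tokens to propose
) -> List[int]:
    """Propose draft tokens by matching the current suffix against earlier tokens."""
    n = len(input_ids)
    # Try the longest n-gram first; longer matches tend to give better drafts.
    for ngram_size in range(min(max_ngram_size, n - 1), 0, -1):
        suffix = input_ids[n - ngram_size:]
        # Scan earlier windows only, so the suffix never matches itself.
        for start in range(n - ngram_size):
            if input_ids[start:start + ngram_size] == suffix:
                follow = start + ngram_size
                candidates = input_ids[follow:follow + num_pred_tokens]
                if candidates:
                    return candidates
    # No earlier occurrence: nothing to propose, fall back to normal decoding.
    return []


if __name__ == "__main__":
    tokens = [1, 2, 3, 4, 5, 6, 1, 2, 3]
    print(find_candidate_tokens(tokens, max_ngram_size=3, num_pred_tokens=3))
```

When the suffix never appears earlier in the sequence, as is typical for open-ended questions, the lookup returns nothing useful and only adds overhead, which is consistent with the slowdown described above.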
Demo
https://drive.google.com/file/d/1E8qq8HnJBhL7GOuFDuMdih_GY1aAVwEC/view?usp=sharing
Script to Reproduce Demo