mikowals closed this 12 months ago
Looks cool. I'll check this out a bit later.
Did you notice any tok/s performance improvement?
The tok/s improvement was negligible for the size of prompts I was testing with. For longer prompts it is probably noticeable, but it would show up as the prompt being read a couple of seconds faster, with that gain then averaged over however many tokens the model goes on to produce. So I focused on directly comparing the code being replaced.
merged! @mikowals thanks for the clean implementation!
Locally I confirmed this outputs the same prompt_tokens as str_lookup.
Benchmarking in a GitHub Codespace with 8 cores and 32 GB of RAM, sorting took 6ms and getting the prompt tokens for a few-sentence prompt took 1ms. With the old str_lookup, getting the prompt tokens for the same prompt took 230ms.
The code can be cleaned up a lot once it becomes easier to have a pointer hold a locally defined struct containing both the string and its index.
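For reference, a minimal sketch of the idea, assuming the vocab is a plain array of C strings; the `TokenIndex` struct and the `build_sorted_vocab` / `sorted_lookup` names here are illustrative, not the exact code in this PR:

```c
#include <stdlib.h>
#include <string.h>

// Illustrative struct pairing each vocab string with its original index,
// so a sorted copy can be binary-searched without losing the token id.
typedef struct {
    const char *str;
    int id;
} TokenIndex;

static int compare_tokens(const void *a, const void *b) {
    return strcmp(((const TokenIndex *)a)->str, ((const TokenIndex *)b)->str);
}

// Build the sorted lookup table once (the one-time sort cost measured above).
TokenIndex *build_sorted_vocab(char **vocab, int vocab_size) {
    TokenIndex *sorted = malloc(vocab_size * sizeof(TokenIndex));
    for (int i = 0; i < vocab_size; i++) {
        sorted[i].str = vocab[i];
        sorted[i].id = i;
    }
    qsort(sorted, vocab_size, sizeof(TokenIndex), compare_tokens);
    return sorted;
}

// Binary-search replacement for a linear str_lookup; returns -1 if not found.
int sorted_lookup(const char *str, TokenIndex *sorted, int vocab_size) {
    TokenIndex key = { .str = str, .id = -1 };
    TokenIndex *found = bsearch(&key, sorted, vocab_size, sizeof(TokenIndex), compare_tokens);
    return found ? found->id : -1;
}
```

The point of the struct is just to keep the original token id attached to each string through the sort, so each lookup is O(log n) over the sorted copy instead of a linear scan of the vocab.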