Closed KerfuffleV2 closed 7 months ago
This shouldn't be merged until I release 0.0.7 (likely in the next couple of days), but I think it's ready for review in case any changes are needed. After the release, I'll update Cargo.toml
to use that instead of pointing at the repo.
I added a new way to build the logits that prunes them (like Top-K), and they start out sorted, which can be a big performance win. For example, logits::try_from_iter_top_k(blah, 1000)
only takes the top 1,000 logits; the remainder are unlikely to ever be selected by sampling. Let me know if you want me to add a command-line option or something to enable that. There's some discussion here: https://github.com/KerfuffleV2/llm-samplers/pull/9#issuecomment-1795639520
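To illustrate the idea (this is just a minimal sketch of the pruning concept, not the crate's actual implementation, and `prune_top_k` is a hypothetical helper name): keeping only the k largest logits, already sorted descending, means later samplers operate on a small pre-sorted slice instead of the full vocabulary.

```rust
// Sketch of the Top-K pruning idea: keep only the k largest logits,
// returned sorted descending along with their token ids.
// NOTE: hypothetical helper for illustration, not the crate's API.
fn prune_top_k(logits: &[f32], k: usize) -> Vec<(usize, f32)> {
    let mut indexed: Vec<(usize, f32)> = logits.iter().copied().enumerate().collect();
    // Sort descending by logit value; NaN logits are assumed absent.
    indexed.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    indexed.truncate(k);
    indexed
}

fn main() {
    let logits = vec![0.1, 2.5, -1.0, 3.2, 0.7];
    let top = prune_top_k(&logits, 3);
    // Prints the three highest logits with their token ids.
    println!("{:?}", top);
}
```

Since the surviving entries are already sorted, samplers that would otherwise sort the whole vocabulary can skip that work entirely.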
@philpax Could you please take a look at this one? (I don't seem to have access to request a review.)
Thanks for checking. It should be all set to merge as far as I know, as long as you're satisfied with the changes. Note that it passes the various tests, but actual usage wasn't extensively tested, so I'd recommend running it on a model and making sure you get reasonable results. I don't have a lot of old GGML-format models lying around.
Tested and it seems to work. Thanks for your work!
Not a problem. If you ever find that sampling performance has a measurable impact, you can try the Logits::try_from_iter_top_k method. If you set k to 2000 or so, it's extremely unlikely to affect results and can increase performance a lot, especially for models with a very large vocabulary (there are a few with 250K+ tokens).
See https://github.com/KerfuffleV2/llm-samplers/pull/9 for more information about the changes.
Notable additions are the Top-A and Min-P samplers.
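For context, Min-P keeps only tokens whose probability is at least some fraction of the top token's probability. The sketch below shows that filtering rule in isolation (a hypothetical `min_p_filter` helper, not the crate's code):

```rust
// Hedged sketch of the Min-P rule: keep token indices whose probability
// is at least `min_p` times the most probable token's probability.
// NOTE: illustrative only, not the llm-samplers implementation.
fn min_p_filter(probs: &[f32], min_p: f32) -> Vec<usize> {
    let max = probs.iter().copied().fold(f32::MIN, f32::max);
    let threshold = min_p * max;
    probs
        .iter()
        .enumerate()
        .filter(|&(_, &p)| p >= threshold)
        .map(|(i, _)| i)
        .collect()
}

fn main() {
    let probs = vec![0.5, 0.3, 0.15, 0.05];
    // With min_p = 0.2 the threshold is 0.2 * 0.5 = 0.1,
    // so the 0.05 token is dropped.
    println!("{:?}", min_p_filter(&probs, 0.2));
}
```

The appeal of this rule is that the cutoff scales with the model's confidence: a peaked distribution prunes aggressively, a flat one keeps more candidates.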