rustformers / llm

[Unmaintained, see README] An ecosystem of Rust libraries for working with large language models
https://docs.rs/llm/latest/llm/
Apache License 2.0

Update to llm-samplers v0.0.7 #440

Closed · KerfuffleV2 closed this 7 months ago

KerfuffleV2 commented 8 months ago

See https://github.com/KerfuffleV2/llm-samplers/pull/9 for more information about the changes.

The notable new features are the Top-A and Min-P samplers, sketched below.
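
As a rough sketch of where those samplers could sit in a chain (the `Default`-based construction and the parameter choices below are assumptions made for illustration, not the crate's confirmed API):

```rust
use llm_samplers::prelude::*;

// Sketch only: compose the new samplers into a SamplerChain. Using
// Default for the Top-A and Min-P parameters is an assumption made
// purely for illustration.
fn build_chain() -> SamplerChain {
    SamplerChain::new()
        + SampleTopA::default()       // Top-A: prune tokens far below the strongest candidate
        + SampleMinP::default()       // Min-P: prune tokens below a fraction of the top probability
        + SampleTemperature::new(0.8) // rescale what's left
        + SampleRandDistrib::new()    // draw a token from the final distribution
}
```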

KerfuffleV2 commented 8 months ago

This shouldn't be merged until I release 0.0.7 (likely in the next couple of days), but I think it's ready for review in case any changes are needed. After the release, I'll update Cargo.toml to use the published version instead of pointing at the repo.

I added a new way to build the logits that prunes them (like Top-K) and leaves them pre-sorted, which can be a big performance win. For example, logits::try_from_iter_top_k(blah, 1000) only takes the top 1,000; the remainder are unlikely to ever be selected by sampling. Let me know if you want me to add a command-line option or something to enable that. There's some discussion here: https://github.com/KerfuffleV2/llm-samplers/pull/9#issuecomment-1795639520
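
As a rough sketch of how that constructor could be called when preparing logits for sampling (the `raw_logits` vector and the error case named in the `expect` message are stand-ins for illustration):

```rust
use llm_samplers::prelude::*;

// Sketch only: `raw_logits` stands in for a model's output over the full
// vocabulary. Instead of Logits::try_from_iter(raw_logits.into_iter()),
// which keeps every entry, the pruning constructor keeps only the top k.
fn build_pruned_logits(raw_logits: Vec<f32>) -> Logits {
    // Keep just the 1,000 largest logits, already sorted in descending
    // order, so later samplers work on a much smaller candidate set.
    Logits::try_from_iter_top_k(raw_logits.into_iter(), 1000)
        .expect("building logits failed (assumed error case: NaN values)")
}
```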

KerfuffleV2 commented 7 months ago

@philpax Could you please take a look at this one? (I don't seem to have access to request a review.)

KerfuffleV2 commented 7 months ago

Thanks for checking. It should be all set to merge as far as I know, as long as you're satisfied with the changes. Note that it passes the various tests, but actual usage wasn't extensively tested, so I'd recommend running it on a model and making sure you get reasonable results. I don't have a lot of old GGML-format models lying around.

philpax commented 7 months ago

Tested and it seems to work. Thanks for your work!

KerfuffleV2 commented 7 months ago

Not a problem. If you ever find sampling performance to have a measurable impact, you can try the Logits::try_from_iter_top_k method. If you set k to 2000 or so, it's extremely unlikely to affect the results, and it can improve performance a lot, especially for models with a very large vocab size (there are a few with 250K+).
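
A minimal sketch of that swap, with the vocabulary size and variable names assumed for illustration:

```rust
use llm_samplers::prelude::*;

// Illustrative only: `raw` stands in for logits over a very large
// vocabulary (some models have 250K+ entries).
fn prune_for_sampling(raw: Vec<f32>) -> Logits {
    // The plain constructor would keep the whole vocabulary:
    //   Logits::try_from_iter(raw.into_iter())
    // Pruning to the top 2,000 pre-sorted entries is extremely unlikely
    // to change which token gets sampled, but avoids handling the rest.
    Logits::try_from_iter_top_k(raw.into_iter(), 2000)
        .expect("building logits failed")
}
```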