Closed 81549361 closed 2 months ago
Yes, the corresponding kernel has already been implemented in FlashInfer. I think it shouldn't be too difficult to integrate into SGLang. Are you interested in submitting a PR? We highly welcome contributions!
I'd love to help, but I'm a newbie. I only know how to add min_p sampling on its own; I don't know how to use it together with top_k and top_p. https://github.com/81549361/sglang/commit/79e8e8d7ee003a5e8bf8089c1414a9d36aa176d9
closed by #1167
Motivation
The min_p sampling parameter is becoming quite popular. It is conceptually simple and "makes sense", and (at least anecdotally, according to many model fine-tuners and users in the LocalLlama community) it tends to perform better than the usual top_p + top_k approach. The READMEs of many new fine-tune/merge repositories on HF recommend using min_p instead of top_p and top_k.
Some of the code has been implemented in flashinfer. https://github.com/flashinfer-ai/flashinfer/pull/422
Related resources
vLLM: https://github.com/vllm-project/vllm/blob/8ea5e44a435e8731fd6f5ba4c329dd112752532a/vllm/sampling_params.py#L64C9-L66C57
min_p: Float that represents the minimum probability for a token to be considered, relative to the probability of the most likely token. Must be in [0, 1]. Set to 0 to disable this.
So, for example, a min_p of 0.07 means that a token whose probability is less than 7% of the highest-probability token's probability is disqualified. A min_p of 0.5 means that a token must have at least half the probability of the highest-probability token, or it is disqualified. Said another way, min_p sets a minimum fraction of the most likely token's probability; tokens below that fraction cannot be sampled.
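To make the rule above concrete, here is a minimal pure-Python sketch of min_p filtering over a probability list. It is only an illustration of the definition quoted from vLLM's docstring, not the FlashInfer kernel or the SGLang implementation; the function name `min_p_filter` and the example distribution are made up for this example.

```python
import random


def min_p_filter(probs, min_p):
    """Drop tokens below min_p * max(probs), then renormalize the rest.

    Hypothetical helper for illustration only; not an SGLang or FlashInfer API.
    """
    if not 0.0 <= min_p <= 1.0:
        raise ValueError("min_p must be in [0, 1]")
    # The cutoff scales with the most likely token's probability.
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    # The top token always survives, so the total is never zero.
    total = sum(kept)
    return [p / total for p in kept]


# Made-up next-token distribution over five tokens.
probs = [0.50, 0.30, 0.15, 0.04, 0.01]

# min_p = 0.07: cutoff is 0.07 * 0.50 = 0.035, so only the last token is dropped.
filtered = min_p_filter(probs, 0.07)

# min_p = 0.5: cutoff is 0.25, so only the first two tokens survive.
strict = min_p_filter(probs, 0.5)

# Sample a token index from the filtered distribution.
rng = random.Random(0)
token_id = rng.choices(range(len(filtered)), weights=filtered, k=1)[0]
```

Setting min_p to 0 leaves the distribution unchanged, which matches the "set to 0 to disable" note above. How this filter should be ordered relative to top_k and top_p is a design choice this sketch does not address.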
https://github.com/vllm-project/vllm/pull/1642
https://github.com/oobabooga/text-generation-webui/pull/4449
https://github.com/ggerganov/llama.cpp/pull/3841
Please see the above links for more info.