[Tracking] Sampler optimization

octoml / mlc-llm

Enable everyone to develop, optimize and deploy AI models natively on everyone's devices.

https://mlc.ai/mlc-llm

Apache License 2.0

[Tracking] Sampler optimization #199

Open masahi opened 9 months ago

masahi commented 9 months ago

Let's collect the remaining issues we are aware of related to sampler performance:

[x] Small regression (1 req / sec drop from benchmark_throughput.py) after https://github.com/octoml/mlc-llm/pull/192 when only greedy sampling is used.
[ ] Logprobs and JSON are extremely slow.

masahi commented 9 months ago

The first issue seems to have been fixed by @vvchernov in https://github.com/octoml/mlc-llm/pull/215.

vvchernov commented 9 months ago

Hello @masahi! No, my fix in #215 resolved a very strong performance reduction (more than an order of magnitude) that appeared after #214.

About task 1:
1. We observed a reduction of ~25-30% after #192.
2. It has not been resolved; I'm investigating the issue.

About task 2: I haven't forgotten about logprobs, but it looks like resolving task 1 requires a sampler refactor, and I want to do that first (or somebody else will do it).