Open masahi opened 9 months ago
The first issue seems to have been fixed by @vvchernov https://github.com/octoml/mlc-llm/pull/215
Hello @masahi! No, my fix in #215 resolved a very strong performance reduction (more than one order of magnitude) that appeared after #214. About task 1: 1. We observed a reduction of ~25-30% after #192. 2. It has not been resolved; I'm investigating the issue. About task 2: I remember about logprobs, but it looks like resolving task 1 requires a sampler refactor, and I want to do that first (or somebody else will do it).
Let's collect the remaining issues we are aware of related to sampler performance:
Performance reduction (measured with benchmark_throughput.py) after https://github.com/octoml/mlc-llm/pull/192 when only greedy sampling is used.
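For context on why the greedy-only case is worth tracking separately: when every request in a batch is greedy, a sampler can in principle skip softmax and random sampling altogether and just take an argmax. The snippet below is only a minimal pure-Python sketch of that idea, not the mlc-llm sampler; `sample_batch`, the `temperatures` argument, and the convention that a temperature of 0 means greedy are all assumptions made for illustration.

```python
import math
import random


def argmax(row):
    """Index of the largest logit in one row."""
    return max(range(len(row)), key=row.__getitem__)


def sample_batch(logits_batch, temperatures, rng=None):
    """Pick one token id per request (hypothetical helper).

    Assumption: temperature == 0.0 marks a greedy request.
    """
    if rng is None:
        rng = random.Random(0)

    # Fast path: an all-greedy batch needs no softmax and no random draws.
    if all(t == 0.0 for t in temperatures):
        return [argmax(row) for row in logits_batch]

    # Mixed batch: handle each request according to its own temperature.
    tokens = []
    for row, t in zip(logits_batch, temperatures):
        if t == 0.0:
            tokens.append(argmax(row))
            continue
        scaled = [x / t for x in row]
        m = max(scaled)  # subtract the max for numerical stability
        weights = [math.exp(x - m) for x in scaled]
        tokens.append(rng.choices(range(len(row)), weights=weights, k=1)[0])
    return tokens
```

Timing a helper like this on all-greedy versus mixed batches is one rough way to check whether the greedy path pays for work it does not need.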