neuralmagic / nm-vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://nm-vllm.readthedocs.io

Benchmarking separation #362

Closed dbarbuzzi closed 3 months ago

dbarbuzzi commented 4 months ago

This PR lays the groundwork for separating the "serving" and "throughput" benchmarks. Each benchmark's data will persist in its own subfolder of the existing dev/bench folder on the nm-gh-pages branch, and each will have its own UI page. We can easily add a simple index.html page in dev/bench that links to these separate pages.
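As a rough illustration of the index.html idea, here is a minimal Python sketch that renders a landing page linking to the two subfolders. The `serving` and `throughput` folder names come from this PR; the function names, page titles, and HTML layout are assumptions made up for the example, not part of the actual change.

```python
from html import escape
from pathlib import Path

# Folder names are taken from the PR description; the link titles are illustrative.
BENCHMARK_PAGES = {
    "serving": "Serving benchmarks",
    "throughput": "Throughput benchmarks",
}


def render_index(pages: dict[str, str]) -> str:
    """Build a small HTML page with one relative link per benchmark subfolder."""
    links = "\n".join(
        f'    <li><a href="{escape(folder, quote=True)}/">{escape(title)}</a></li>'
        for folder, title in pages.items()
    )
    return (
        "<!DOCTYPE html>\n"
        '<html lang="en">\n'
        '<head><meta charset="utf-8"><title>Benchmarks</title></head>\n'
        "<body>\n"
        "  <h1>Benchmarks</h1>\n"
        "  <ul>\n"
        f"{links}\n"
        "  </ul>\n"
        "</body>\n"
        "</html>\n"
    )


def write_index(bench_dir: Path, pages: dict[str, str]) -> Path:
    """Write index.html into bench_dir (e.g. a checkout's dev/bench folder)."""
    bench_dir.mkdir(parents=True, exist_ok=True)
    index_path = bench_dir / "index.html"
    index_path.write_text(render_index(pages), encoding="utf-8")
    return index_path
```

Calling `write_index(Path("dev/bench"), BENCHMARK_PAGES)` inside a checkout of the nm-gh-pages branch would produce the page; relative links keep it independent of the site's base URL.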

With these changes, the currently executed benchmark_serving results will be in the serving subfolder, and the upcoming benchmark_throughput results will be in a throughput subfolder.

One thing I’d like to see improved is how the separate files are handled in the BENCHMARK-RESULT job in .github/workflows/nm-benchmark.yml. Since a matrix strategy cannot be used within a step, I opted in the short term to duplicate the steps so that, as in the existing process, each potential results file has its own step guarded by an if condition. I could likely make the entire job use a matrix strategy; however, I’d be concerned about merge conflicts arising if multiple jobs try to push to the nm-gh-pages branch at nearly the same time.
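One possible middle ground between duplicated steps and a job-level matrix is to decide in a single pass which results files exist and where each should be published, then handle them one after another within the same job so that pushes never overlap. The sketch below only shows that planning step. The results file names and the `plan_publish` helper are hypothetical, and it assumes the publishing could be driven from a script, which this PR does not confirm.

```python
from pathlib import Path

# Hypothetical results file names; the real workflow may use different ones.
CANDIDATE_RESULTS = {
    "serving": "benchmark_serving.json",
    "throughput": "benchmark_throughput.json",
}


def plan_publish(results_dir: Path, candidates: dict[str, str]) -> list[tuple[Path, str]]:
    """Return (results file, destination folder) pairs for the files that exist."""
    plan = []
    for kind, filename in candidates.items():
        path = results_dir / filename
        if path.is_file():  # plays the role of the per-step `if` guard
            plan.append((path, f"dev/bench/{kind}"))
    return plan
```

Because the returned list is processed sequentially, only one push to nm-gh-pages would be in flight at a time, which sidesteps the conflict concern that a parallel matrix raises.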

Additionally: