Benchmarking separation

This PR sets the groundwork for separating the "serving" and "throughput" benchmarks into separate UI pages/etc. Their data will persist in subfolders of the existing dev/bench folder of the nm-gh-pages branch, and they will have their own separate UI pages. We can easily put a simple index.html page in dev/bench which has links to these separate pages.

With these changes, the currently executed benchmark_serving results will be present at the serving subfolder and the upcoming benchmark_thoughput results will be in a throughput subfolder:

benchmark_serving: https://neuralmagic.github.io/nm-vllm/dev/bench/serving/
benchmark_throughput: https://neuralmagic.github.io/nm-vllm/dev/bench/throughput/

One thing I’d like improved is how the separate files are handled in the BENCHMARK-RESULT job in .github/workflows/nm-benchmark.yml. Since you cannot use a matrix strategy within a step, I opted in the short term for duplicating the steps so that, similar to the existing process, each potential results file will have its own step guarded by the if prop. I could likely make the entire job use a matrix strategy, however, I’d be concerned about the potential of merge conflicts/etc. arising if multiple jobs are trying to push to the nm-gh-pages branch too close to each other.

Additionally:

I inadvertently included a change where I started to use GitHub Action’s output grouping, which I think is generally an improvement, but that can be easily removed. In this isolated usage, it groups all the pip install output during the run benchmarks action into a collapsed-by-default group (screenshot below), which can be clicked to expand and will auto-expand if you use the GitHub UI “Search logs” textbox:
I cleaned up misc “input”-related things in .github/actions/nm-github-action-benchmark/action.yml – the type prop is not valid inside an action (it’s mostly to guide the UI and typically only value for workflow_dispatch input definitions).

neuralmagic / nm-vllm

Benchmarking separation #362