expand benchmark to H100s

neuralmagic / nm-vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

https://nm-vllm.readthedocs.io

Closed andy-neuma closed 4 months ago

andy-neuma commented 4 months ago

SUMMARY:

updated "build test" to accept an array of benchmarking labels
updated "remote push" and "nightly" workflows to include benchmarking on H100s
adjusted the docker job to use the same criteria as the upload job. did this because the upload could fail for auth reasons, and that shouldn't stop us from pushing the docker image.
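
For readers unfamiliar with the pattern, the first item could look roughly like the sketch below. It is not taken from this PR: the file name, the input name `benchmark_labels`, the job names, and the runner labels are all hypothetical. It relies only on two documented GitHub Actions behaviors: `workflow_call` inputs cannot be typed as arrays, so the list is passed as a JSON string, and `fromJSON` turns that string into a matrix.

```yaml
# --- hypothetical .github/workflows/build-test.yml (reusable workflow) ---
on:
  workflow_call:
    inputs:
      benchmark_labels:          # assumed input name, not from the PR
        description: "JSON array of runner labels to benchmark on"
        type: string             # arrays are not a valid input type
        required: true

jobs:
  BENCHMARK:
    strategy:
      matrix:
        # Expand the JSON string into one job per runner label.
        label: ${{ fromJSON(inputs.benchmark_labels) }}
    runs-on: ${{ matrix.label }}
    steps:
      - run: echo "benchmarking on ${{ matrix.label }}"

# --- hypothetical caller, e.g. the "remote push" or "nightly" workflow ---
# jobs:
#   BUILD-TEST:
#     uses: ./.github/workflows/build-test.yml
#     with:
#       benchmark_labels: '["a100-runner", "h100-runner"]'   # placeholder labels
```

With this shape, adding another GPU type only requires extending the JSON array in the calling workflow.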

TEST PLAN: runs on remote push