mlcommons / inference

Reference implementations of MLPerf™ inference benchmarks
https://mlcommons.org/en/groups/inference
Apache License 2.0

[4.1 postmortem] Unit test module for inference repo #1865

Open nvzhihanj opened 1 month ago

nvzhihanj commented 1 month ago

We propose adding a basic unit test framework (likely pytest), along with tests, to the inference repo. Ideally, it should test:

To get this started, we can aim at some simple tests:

@pgmpablo157321 to help implement; @nv-alicheng for more suggestions.

arjunsuresh commented 1 month ago

We also need a deadline for these to be done before every submission round. Maybe the same as the code freeze date?

Currently we do have GitHub tests for the reference benchmark runs, which include downloading the models and datasets and even running the submission checker. This is complete for the small models, and we also have GPTJ and SDXL tests running on self-hosted GitHub runners. We hope to cover LLAMA2, DLRMv2 and Mixtral this month.

The main concern is with upcoming benchmarks. If a benchmark needs multiple GPUs and terabytes of memory, like the GNN, we do not have the infrastructure to do the runs. In those cases, unit tests for the conf files and log parsing may be the only option.
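As a rough illustration of what a conf-file unit test could look like without any GPU or dataset, here is a minimal sketch. It assumes conf lines of the form `model.scenario.key = value` with `#` comments; `parse_conf` is a hypothetical helper written for this example, not loadgen's actual parser, and the sample keys are only illustrative.

```python
def parse_conf(text):
    """Parse 'model.scenario.key = value' lines into a dict.

    Hypothetical helper for illustration only; not the loadgen parser.
    """
    settings = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if "=" not in line:
            raise ValueError(f"line {lineno}: missing '='")
        lhs, value = (part.strip() for part in line.split("=", 1))
        parts = lhs.split(".")
        if len(parts) != 3 or not all(parts):
            raise ValueError(f"line {lineno}: expected model.scenario.key")
        settings[tuple(parts)] = value
    return settings


def test_parses_valid_lines():
    # Sample content is an assumption, not copied from a real conf file.
    conf = (
        "# comment line\n"
        "*.Server.target_latency = 10\n"
        "bert.Offline.min_query_count = 1\n"
    )
    assert parse_conf(conf) == {
        ("*", "Server", "target_latency"): "10",
        ("bert", "Offline", "min_query_count"): "1",
    }


def test_rejects_malformed_key():
    # A key missing the model field should be reported with its line number.
    try:
        parse_conf("Server.target_latency = 10\n")
    except ValueError as err:
        assert "line 1" in str(err)
    else:
        raise AssertionError("expected ValueError")
```

Tests of this kind run in milliseconds under pytest, so they could cover a new benchmark's config entries even when the benchmark itself cannot be run in CI.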

nvzhihanj commented 1 month ago

@arjunsuresh Thank you for the suggestions. I think we can split this into steps and cover the most important parts first (submission checker, config files, loadgen). For the benchmarks, we can implement tests one by one according to urgency. Regarding the deadline, I agree it should align with the code freeze date.

arjunsuresh commented 1 month ago

@nvzhihanj

Yes, we do need unit tests for the submission checker. Currently we check it by running it on the previous submission repository, which catches most of the issues but not all. It is currently failing because the "images" folder was removed before publication for stable diffusion, but the submission checker was not updated to match.
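A unit test for this kind of directory-layout rule might look like the sketch below. `find_missing_dirs` is a hypothetical stand-in for the checker's logic, not a function from the actual submission checker, and the "accuracy" folder name is an assumption used only to build a fixture.

```python
import tempfile
from pathlib import Path


def find_missing_dirs(result_dir, required_dirs):
    """Return the required sub-directories absent from result_dir, sorted.

    Hypothetical stand-in for a submission-checker rule.
    """
    root = Path(result_dir)
    return sorted(name for name in required_dirs if not (root / name).is_dir())


def test_reports_missing_images_dir():
    # Fixture: a result folder that has "accuracy" but no "images".
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "accuracy").mkdir()
        assert find_missing_dirs(tmp, ["accuracy", "images"]) == ["images"]


def test_passes_when_images_not_required():
    # Once "images" is dropped from the rule, the same folder should pass.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "accuracy").mkdir()
        assert find_missing_dirs(tmp, ["accuracy"]) == []
```

Because the fixture is built in a temporary directory, a test like this does not depend on the previous submission repository and can be updated in the same change as the rule itself.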

The situation is similar for loadgen: we currently have tests for it, and we run them on the previous benchmarks. But when a new benchmark arrives and new configs are added, those are not tested.

mrmhodak commented 1 month ago

@pgmpablo157321 to work on this