mlcommons / cm4mlops

A collection of portable, reusable and cross-platform automation recipes (CM scripts) to make it easier to build and benchmark AI systems across diverse models, data sets, software and hardware
http://docs.mlcommons.org/cm4mlops/
Apache License 2.0

Error on generate submission tree for SCC24 #474

Closed by yangkai2002 3 weeks ago

yangkai2002 commented 3 weeks ago

Hi, we hit this error when running commands on the mlperf-inference branch.

First I ran the benchmark with the scc24-base dataset (no error occurred during this run):

cm run script --tags=run-mlperf,inference,_r4.1-dev,_short,_scc24-base \
   --model=sdxl \
   --implementation=nvidia \
   --framework=tensorrt \
   --category=datacenter \
   --scenario=Offline \
   --execution_mode=test \
   --device=cuda \
   --quiet

Then I tried to generate a submission tree with:

cm run script --tags=generate,inference,submission \
   --clean \
   --preprocess_submission=yes \
   --run-checker \
   --tar=yes \
   --env.CM_TAR_OUTFILE=submission.tar.gz \
   --division=open \
   --category=datacenter \
   --env.CM_DETERMINE_MEMORY_CONFIGURATION=yes \
   --run_style=test \
   --adr.submission-checker.tags=_short-run \
   --quiet --rerun \
   --submitter=XXX

Then I got this error and the following warning:

[2024-11-03 18:47:55,206 submission_checker.py:1434 ERROR] open/thu/results/9f7dcf9a6c28-nvidia_original-gpu-tensorrt-vdefault-scc24-base/stable-diffusion-xl/offline/performance/run_1/mlperf_log_detail.txt Test duration less than 600s in user config. expected=600000, found=0
[2024-11-03 18:47:55,206 preprocess_submission.py:237 WARNING] offline scenario result is invalid for 9f7dcf9a6c28-nvidia_original-gpu-tensorrt-vdefault-scc24-base: stable-diffusion-xl in open division. Accuracy: False, Performance: False. Removing...

After looking into mlperf_log_detail.txt and the scripts, I found that effective_min_duration_ms is always 0, so it cannot meet the hard-coded expected value of 600000.
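To make the mismatch concrete, here is a minimal sketch of the comparison, assuming the detail log uses loadgen's `:::MLLOG {json}` line format. The helper name `read_mllog_value` and the sample line are hypothetical, and this is not the submission checker's actual code:

```python
import json

MLLOG_PREFIX = ":::MLLOG "
# Value reported as "expected" in the checker error above.
EXPECTED_MIN_DURATION_MS = 600000


def read_mllog_value(lines, key):
    """Return the last logged value for `key`, or None if it never appears."""
    value = None
    for line in lines:
        line = line.strip()
        if not line.startswith(MLLOG_PREFIX):
            continue
        try:
            record = json.loads(line[len(MLLOG_PREFIX):])
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON records
        if record.get("key") == key:
            value = record.get("value")
    return value


# Hypothetical sample line standing in for mlperf_log_detail.txt.
sample = [
    ':::MLLOG {"key": "effective_min_duration_ms", "value": 0, "time_ms": 0.0}',
]
found = read_mllog_value(sample, "effective_min_duration_ms")
ok = found is not None and found >= EXPECTED_MIN_DURATION_MS
print(f"expected={EXPECTED_MIN_DURATION_MS}, found={found}, ok={ok}")
```

To try it on a real run, pass `open("mlperf_log_detail.txt")` instead of `sample`.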

But in file mlperf_log_summary.txt I found this:

================================================
MLPerf Results Summary
================================================
SUT name : PySUT
Scenario : Offline
Mode     : PerformanceOnly
Samples per second: 1.39895
Result is : VALID
  Min duration satisfied : Yes
  Min queries satisfied : Yes
  Early stopping satisfied: Yes
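For comparison, a small sketch that reads the summary fields into a dictionary, assuming each field is a single `name : value` line as in the output above. The function name `parse_summary` is hypothetical. It only shows what loadgen itself reported, which is separate from the duration rule the submission checker applies:

```python
def parse_summary(text):
    """Collect 'name : value' lines from a loadgen summary into a dict."""
    fields = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip separators and the title line
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


summary_text = """\
================================================
MLPerf Results Summary
================================================
SUT name : PySUT
Scenario : Offline
Mode     : PerformanceOnly
Samples per second: 1.39895
Result is : VALID
  Min duration satisfied : Yes
  Min queries satisfied : Yes
  Early stopping satisfied: Yes
"""

fields = parse_summary(summary_text)
print(fields["Result is"], fields["Min duration satisfied"])
```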

Do you know how I can solve this problem?

arjunsuresh commented 3 weeks ago

Hi @yangkai2002, that's an expected error, as for SCC24 we are forcefully shortening the runtime. You are able to make a submission, right?

yangkai2002 commented 3 weeks ago

> Hi @yangkai2002, that's an expected error, as for SCC24 we are forcefully shortening the runtime. You are able to make a submission, right?

Yes, thanks.