mlcommons / cm4mlops

A collection of portable, reusable and cross-platform automation recipes (CM scripts) with a human-friendly interface and minimal dependencies to make it easier to build, run, benchmark and optimize AI, ML and other applications and systems across diverse and continuously changing models, data sets, software and hardware (cloud/edge)
http://docs.mlcommons.org/cm4mlops/
Apache License 2.0

Error running ResNet50 on gateoverflow/nvidia container #60

Closed: WarrenSchultz closed this issue 1 month ago

WarrenSchultz commented 1 month ago

Something broke within the last week for the following configuration and steps:

cm pull repo gateoverflow@cm4mlops

cm docker script --tags=build,nvidia,inference,server --docker_cache=no --docker_cm_repo=gateoverflow@cm4mlops

and then running the following from within the container:

cm run script --tags=run-mlperf,inference,_performance-only,_full  \
   --division=open \
   --category=edge \
   --device=cuda \
   --model=resnet50 \
   --precision=float32 \
   --implementation=nvidia \
   --backend=tensorrt \
   --scenario=Offline \
   --execution_mode=valid \
   --power=no \
   --adr.python.version_min=3.8 \
   --clean \
   --compliance=no \
   --quiet \
   --time \
   --offline_target_qps=13540

The container builds and runs successfully, but running the inference script returns the following error:

Config file missing for given hw_name: 'a9b4844e65df', implementation: 'nvidia_original', device: 'gpu,  backend: 'tensorrt', copying from default
Using MLCommons Inference source from '/home/cmuser/CM/repos/local/cache/e63b613a18404cc2/inference'
Traceback (most recent call last):
  File "/home/cmuser/.local/bin/cm", line 8, in <module>
    sys.exit(run())
  File "/home/cmuser/.local/lib/python3.8/site-packages/cmind/cli.py", line 37, in run
    r = cm.access(argv, out='con')
  File "/home/cmuser/.local/lib/python3.8/site-packages/cmind/core.py", line 602, in access
    r = action_addr(i)
  File "/home/cmuser/CM/repos/gateoverflow@cm4mlops/automation/script/module.py", line 211, in run
    r = self._run(i)
  File "/home/cmuser/CM/repos/gateoverflow@cm4mlops/automation/script/module.py", line 1486, in _run
    r = customize_code.preprocess(ii)
  File "/home/cmuser/CM/repos/gateoverflow@cm4mlops/script/run-mlperf-inference-app/customize.py", line 219, in preprocess
    r = cm.access(ii)
  File "/home/cmuser/.local/lib/python3.8/site-packages/cmind/core.py", line 758, in access
    return cm.access(i)
  File "/home/cmuser/.local/lib/python3.8/site-packages/cmind/core.py", line 602, in access
    r = action_addr(i)
  File "/home/cmuser/CM/repos/gateoverflow@cm4mlops/automation/script/module.py", line 211, in run
    r = self._run(i)
  File "/home/cmuser/CM/repos/gateoverflow@cm4mlops/automation/script/module.py", line 1549, in _run
    r = self._call_run_deps(prehook_deps, self.local_env_keys, local_env_keys_from_meta,  env, state, const, const_state, add_deps_recursive,
  File "/home/cmuser/CM/repos/gateoverflow@cm4mlops/automation/script/module.py", line 2868, in _call_run_deps
    r = script._run_deps(deps, local_env_keys, env, state, const, const_state, add_deps_recursive, recursion_spaces,
  File "/home/cmuser/CM/repos/gateoverflow@cm4mlops/automation/script/module.py", line 3039, in _run_deps
    r = self.cmind.access(ii)
  File "/home/cmuser/.local/lib/python3.8/site-packages/cmind/core.py", line 602, in access
    r = action_addr(i)
  File "/home/cmuser/CM/repos/gateoverflow@cm4mlops/automation/script/module.py", line 211, in run
    r = self._run(i)
  File "/home/cmuser/CM/repos/gateoverflow@cm4mlops/automation/script/module.py", line 1376, in _run
    r = self._call_run_deps(deps, self.local_env_keys, local_env_keys_from_meta, env, state, const, const_state, add_deps_recursive,
  File "/home/cmuser/CM/repos/gateoverflow@cm4mlops/automation/script/module.py", line 2868, in _call_run_deps
    r = script._run_deps(deps, local_env_keys, env, state, const, const_state, add_deps_recursive, recursion_spaces,
  File "/home/cmuser/CM/repos/gateoverflow@cm4mlops/automation/script/module.py", line 3039, in _run_deps
    r = self.cmind.access(ii)
  File "/home/cmuser/.local/lib/python3.8/site-packages/cmind/core.py", line 602, in access
    r = action_addr(i)
  File "/home/cmuser/CM/repos/gateoverflow@cm4mlops/automation/script/module.py", line 211, in run
    r = self._run(i)
  File "/home/cmuser/CM/repos/gateoverflow@cm4mlops/automation/script/module.py", line 1486, in _run
    r = customize_code.preprocess(ii)
  File "/home/cmuser/CM/repos/gateoverflow@cm4mlops/script/generate-mlperf-inference-user-conf/customize.py", line 183, in preprocess
    required_min_queries_offline = get_required_min_queries_offline(env['CM_MODEL'], version)
  File "/home/cmuser/CM/repos/gateoverflow@cm4mlops/script/generate-mlperf-inference-user-conf/customize.py", line 440, in get_required_min_queries_offline
    if int(version[0]) < 4:
ValueError: invalid literal for int() with base 10: 'v'
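
The traceback points at get_required_min_queries_offline in script/generate-mlperf-inference-user-conf/customize.py, where int(version[0]) is applied to the first character of the version string. The string it receives presumably starts with a literal 'v' (for example something like 'v4.1'; the exact value here is an assumption), so the call reduces to int('v') and raises. A minimal sketch of the failure mode and a more tolerant parse, not the actual cm4mlops code:

    # Minimal sketch, assuming the version is a string such as "v4.1"
    # (hypothetical value). int(version[0]) only looks at the first
    # character, so a 'v' prefix raises ValueError.
    def get_major_version(version: str) -> int:
        # Strip a leading 'v'/'V' before splitting, so "v4.1" -> 4
        return int(version.lstrip("vV").split(".")[0])

    print(get_major_version("v4.1"))  # 4
    print(get_major_version("3.1"))   # 3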
arjunsuresh commented 1 month ago

Thanks for reporting this. Inside the container, can you please do cm pull repo and retry the command?
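
For reference, the suggested recovery, run inside the already-built container, looks like this (repository name taken from the pull command at the top of the report):

cm pull repo gateoverflow@cm4mlops

followed by re-running the same cm run script --tags=run-mlperf,inference,... command that previously failed.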

WarrenSchultz commented 1 month ago

That fixed it, thanks!