mlcommons / cm4mlops

A collection of portable, reusable and cross-platform automation recipes (CM scripts) with a human-friendly interface and minimal dependencies to make it easier to build, run, benchmark and optimize AI, ML and other applications and systems across diverse and continuously changing models, data sets, software and hardware (cloud/edge)
http://docs.mlcommons.org/cm4mlops/
Apache License 2.0

Error running ResNet50 on gateoverflow/nvidia container #60

Closed: WarrenSchultz closed this issue 1 month ago

WarrenSchultz commented 1 month ago

Something broke within the last week for the following configuration and steps:

cm pull repo gateoverflow@cm4mlops

cm docker script --tags=build,nvidia,inference,server --docker_cache=no --docker_cm_repo=gateoverflow@cm4mlops

and then running the following from within the container:

cm run script --tags=run-mlperf,inference,_performance-only,_full  \
   --division=open \
   --category=edge \
   --device=cuda \
   --model=resnet50 \
   --precision=float32 \
   --implementation=nvidia \
   --backend=tensorrt \
   --scenario=Offline \
   --execution_mode=valid \
   --power=no \
   --adr.python.version_min=3.8 \
   --clean \
   --compliance=no \
   --quiet \
   --time \
   --offline_target_qps=13540

The container builds and runs successfully, but running the inference script returns the following error:

Config file missing for given hw_name: 'a9b4844e65df', implementation: 'nvidia_original', device: 'gpu,  backend: 'tensorrt', copying from default
Using MLCommons Inference source from '/home/cmuser/CM/repos/local/cache/e63b613a18404cc2/inference'
Traceback (most recent call last):
  File "/home/cmuser/.local/bin/cm", line 8, in <module>
    sys.exit(run())
  File "/home/cmuser/.local/lib/python3.8/site-packages/cmind/cli.py", line 37, in run
    r = cm.access(argv, out='con')
  File "/home/cmuser/.local/lib/python3.8/site-packages/cmind/core.py", line 602, in access
    r = action_addr(i)
  File "/home/cmuser/CM/repos/gateoverflow@cm4mlops/automation/script/module.py", line 211, in run
    r = self._run(i)
  File "/home/cmuser/CM/repos/gateoverflow@cm4mlops/automation/script/module.py", line 1486, in _run
    r = customize_code.preprocess(ii)
  File "/home/cmuser/CM/repos/gateoverflow@cm4mlops/script/run-mlperf-inference-app/customize.py", line 219, in preprocess
    r = cm.access(ii)
  File "/home/cmuser/.local/lib/python3.8/site-packages/cmind/core.py", line 758, in access
    return cm.access(i)
  File "/home/cmuser/.local/lib/python3.8/site-packages/cmind/core.py", line 602, in access
    r = action_addr(i)
  File "/home/cmuser/CM/repos/gateoverflow@cm4mlops/automation/script/module.py", line 211, in run
    r = self._run(i)
  File "/home/cmuser/CM/repos/gateoverflow@cm4mlops/automation/script/module.py", line 1549, in _run
    r = self._call_run_deps(prehook_deps, self.local_env_keys, local_env_keys_from_meta,  env, state, const, const_state, add_deps_recursive,
  File "/home/cmuser/CM/repos/gateoverflow@cm4mlops/automation/script/module.py", line 2868, in _call_run_deps
    r = script._run_deps(deps, local_env_keys, env, state, const, const_state, add_deps_recursive, recursion_spaces,
  File "/home/cmuser/CM/repos/gateoverflow@cm4mlops/automation/script/module.py", line 3039, in _run_deps
    r = self.cmind.access(ii)
  File "/home/cmuser/.local/lib/python3.8/site-packages/cmind/core.py", line 602, in access
    r = action_addr(i)
  File "/home/cmuser/CM/repos/gateoverflow@cm4mlops/automation/script/module.py", line 211, in run
    r = self._run(i)
  File "/home/cmuser/CM/repos/gateoverflow@cm4mlops/automation/script/module.py", line 1376, in _run
    r = self._call_run_deps(deps, self.local_env_keys, local_env_keys_from_meta, env, state, const, const_state, add_deps_recursive,
  File "/home/cmuser/CM/repos/gateoverflow@cm4mlops/automation/script/module.py", line 2868, in _call_run_deps
    r = script._run_deps(deps, local_env_keys, env, state, const, const_state, add_deps_recursive, recursion_spaces,
  File "/home/cmuser/CM/repos/gateoverflow@cm4mlops/automation/script/module.py", line 3039, in _run_deps
    r = self.cmind.access(ii)
  File "/home/cmuser/.local/lib/python3.8/site-packages/cmind/core.py", line 602, in access
    r = action_addr(i)
  File "/home/cmuser/CM/repos/gateoverflow@cm4mlops/automation/script/module.py", line 211, in run
    r = self._run(i)
  File "/home/cmuser/CM/repos/gateoverflow@cm4mlops/automation/script/module.py", line 1486, in _run
    r = customize_code.preprocess(ii)
  File "/home/cmuser/CM/repos/gateoverflow@cm4mlops/script/generate-mlperf-inference-user-conf/customize.py", line 183, in preprocess
    required_min_queries_offline = get_required_min_queries_offline(env['CM_MODEL'], version)
  File "/home/cmuser/CM/repos/gateoverflow@cm4mlops/script/generate-mlperf-inference-user-conf/customize.py", line 440, in get_required_min_queries_offline
    if int(version[0]) < 4:
ValueError: invalid literal for int() with base 10: 'v'
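
The traceback points at get_required_min_queries_offline in script/generate-mlperf-inference-user-conf/customize.py, where int(version[0]) is applied to the first character of the version string. The string it receives presumably starts with a literal 'v' (for example something like 'v4.1'; the exact value here is an assumption), so the call reduces to int('v') and raises. A minimal sketch of the failure mode and a more tolerant parse, not the actual cm4mlops code:

    # Minimal sketch, assuming the version is a string such as "v4.1"
    # (hypothetical value). int(version[0]) only looks at the first
    # character, so a 'v' prefix raises ValueError.
    def get_major_version(version: str) -> int:
        # Strip a leading 'v'/'V' before splitting, so "v4.1" -> 4
        return int(version.lstrip("vV").split(".")[0])

    print(get_major_version("v4.1"))  # 4
    print(get_major_version("3.1"))   # 3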
arjunsuresh commented 1 month ago

Thanks for reporting this. Inside the container, can you please do cm pull repo and retry the command?
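
For reference, the suggested recovery, run inside the already-built container, looks like this (repository name taken from the pull command at the top of the report):

cm pull repo gateoverflow@cm4mlops

followed by re-running the same cm run script --tags=run-mlperf,inference,... command that previously failed.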

WarrenSchultz commented 1 month ago

That fixed it, thanks!