mlcommons / inference_results_v3.0

This repository contains the results and code for the MLPerf™ Inference v3.0 benchmark.
https://mlcommons.org/en/inference-datacenter-30/
Apache License 2.0

Running the benchmark on Xavier AGX #16

Open JoachimMoe opened 12 months ago

JoachimMoe commented 12 months ago

I am currently trying to execute the different benchmarks on the NVIDIA Jetson Xavier AGX.

I have successfully built the container with make prebuild and make build. The container launched and everything is fine. I then run python3 -m scripts.custom_systems.add_custom_system to add the system, giving "Xavier" as the custom name. This fails with an error saying that only Orin is currently supported. The error can be bypassed by editing Makefile.docker and adding the following lines at the top:

IS_SOC=1
SOC_SM=87

According to the documentation, running make build also builds the harnesses, so I run the following command to execute a benchmark:

make run_harness RUN_ARGS="--benchmark=resnet50, scenarios=Offline"

This results in the following error:

Detected system ID: KnownSystem.Xavier
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/work/code/main.py", line 233, in <module>
    main(main_args, DETECTED_SYSTEM)
  File "/work/code/main.py", line 104, in main
    load_config_fn(benchmarks, scenarios)
  File "/work/code/main.py", line 54, in populate_config_registry
    ConfigRegistry.load_configs(benchmark, scenario)
  File "/work/configs/configuration.py", line 123, in load_configs
    importlib.import_module(f"{base_module}.custom")
  File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 848, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/work/configs/bert/Offline/custom.py", line 8, in <module>
    class ORIN(OfflineGPUBaseConfig):
  File "/work/configs/configuration.py", line 207, in _do_register
    raise KeyError("Config for {} is already registered.".format("/".join(map(str, keyspace))))
KeyError: 'Config for Benchmark.BERT/Scenario.Offline/KnownSystem.Orin/WorkloadSetting(HarnessType.Custom, AccuracyTarget.k_99, PowerSetting.MaxP) is already registered.'
make: *** [Makefile:45: run_harness] Error 1

This is somewhat odd: I am specifically trying to run the ResNet50 benchmark, yet the error seems to stem from BERT. Is there any way, even a hacky one, to actually execute the ResNet50 benchmark on the Xavier?
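
For anyone reading the traceback: the `_do_register` frame is raised while `class ORIN` in /work/configs/bert/Offline/custom.py is being defined, which suggests that configs are registered at import time and that a second registration under the same key is rejected. Below is a minimal sketch of that pattern. It assumes a decorator-based registry; the `ConfigRegistry.register` signature, the string keys, and the class bodies are hypothetical stand-ins, not the actual code in /work/configs/configuration.py.

```python
# Hypothetical sketch of an import-time config registry. Names and
# signatures are illustrative, not the real configuration.py API.
class ConfigRegistry:
    _registered = {}

    @classmethod
    def register(cls, benchmark, scenario, system):
        keyspace = (benchmark, scenario, system)

        def _do_register(config_cls):
            # Registering the same key twice is treated as an error.
            if keyspace in cls._registered:
                raise KeyError("Config for {} is already registered.".format(
                    "/".join(map(str, keyspace))))
            cls._registered[keyspace] = config_cls
            return config_cls

        return _do_register


# First registration succeeds as soon as the class statement runs.
@ConfigRegistry.register("BERT", "Offline", "Orin")
class ORIN:
    pass


def register_duplicate():
    # A second class under the same key fails while it is being defined,
    # before any benchmark code is executed.
    @ConfigRegistry.register("BERT", "Offline", "Orin")
    class ORIN_AGAIN:
        pass

    return ORIN_AGAIN


try:
    register_duplicate()
except KeyError as err:
    duplicate_error = err.args[0]
    print(duplicate_error)
```

If the real registry behaves like this, one thing worth checking is whether a custom.py file ends up defining a config for a system that is already registered elsewhere. The sketch does not explain why the BERT module is imported during a ResNet50 run.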

JoachimMoe commented 12 months ago

I got it running on the Xavier. If anyone else is attempting the same and, like me, is not hopeful of getting an answer, please feel free to reach out!