mlcommons / inference

Reference implementations of MLPerf™ inference benchmarks
https://mlcommons.org/en/groups/inference
Apache License 2.0

[v4.1 inference] Detected system did not match any known systems. #1937

Open loganwuw opened 6 hours ago

loganwuw commented 6 hours ago

Hi, I'm running into an issue when I try to run the 3d-unet benchmark. I ran: make run RUN_ARGS="--benchmarks=3d-unet --scenarios=offline,server"

I got the following error. The output also shows the system I'm working on:

Detected system did not match any known systems. Exiting.
SystemConfiguration(host_cpu_conf=CPUConfiguration(layout={CPU(name='AMD EPYC 9554 64-Core Processor', architecture=<CPUArchitecture.x86_64: AliasedName(name='x86_64', aliases=(), patterns=())>, core_count=56, threads_per_core=2): 2}), host_mem_conf=MemoryConfiguration(host_memory_capacity=Memory(quantity=1.5849542239999999, byte_suffix=<ByteSuffix.TB: (1000, 4)>, _num_bytes=1584954224000), comparison_tolerance=0.05), accelerator_conf=AcceleratorConfiguration(layout=defaultdict(<class 'int'>, {GPU(name='NVIDIA H100 PCIe', accelerator_type=<AcceleratorType.Discrete: AliasedName(name='Discrete', aliases=(), patterns=())>, vram=Memory(quantity=79.6474609375, byte_suffix=<ByteSuffix.GiB: (1024, 3)>, _num_bytes=85520809984), max_power_limit=350.0, pci_id='0x233110DE', compute_sm=90): 8})), numa_conf=NUMAConfiguration(numa_nodes={}, num_numa_nodes=16), system_id=None)

Driver Version: 550.90.07 CUDA Version: 12.4

I didn't manually add any system configuration under /work/code/common/system, since I didn't see any v4.1 inference submitter on H100 PCIe 80GB customize it.

Any suggestions to help me get past this error? Thanks a lot!

arjunsuresh commented 6 hours ago

Hi @loganwuw, the Nvidia implementation detects the system configuration, including the GPUs and CPUs. If the system is not a known one, separate scripts need to be called to initialize it. If you want to benchmark 3d-unet, you can use the CM wrapper documented at the link below, where those manual steps are automated. We also run these benchmarks nightly and store the results.

https://docs.mlcommons.org/inference/benchmarks/medical_imaging/3d-unet/#__tabbed_1_2
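For anyone wondering why the run exits, here is a minimal Python sketch of the kind of matching described above: a detected system is compared against a list of known systems and the run stops when nothing matches. This is not Nvidia's actual code. The class names, the `match_system` helper, the matching rules, and the registry entry are all hypothetical. Only the detected values (GPU name, GPU count, host memory in bytes, and the 0.05 tolerance) are taken from the log in this issue.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class DetectedSystem:
    """Hypothetical, simplified view of what the detector reports."""
    gpu_name: str
    gpu_count: int
    host_mem_bytes: int


@dataclass(frozen=True)
class KnownSystem:
    """Hypothetical entry in a registry of known systems."""
    system_id: str
    gpu_name: str
    gpu_count: int
    host_mem_bytes: int


def within_tolerance(actual: int, expected: int, tolerance: float = 0.05) -> bool:
    """True if `actual` is within `tolerance` (as a fraction) of `expected`."""
    return abs(actual - expected) <= tolerance * expected


def match_system(
    detected: DetectedSystem,
    known_systems: Iterable[KnownSystem],
    tolerance: float = 0.05,
) -> Optional[str]:
    """Return the ID of the first known system that matches, else None.

    Assumed rule: GPU name and count must match exactly, and host memory
    must agree within the tolerance. The real rules may differ.
    """
    for known in known_systems:
        if (
            known.gpu_name == detected.gpu_name
            and known.gpu_count == detected.gpu_count
            and within_tolerance(detected.host_mem_bytes, known.host_mem_bytes, tolerance)
        ):
            return known.system_id
    return None


# Values copied from the SystemConfiguration dump in this issue.
detected = DetectedSystem(
    gpu_name="NVIDIA H100 PCIe",
    gpu_count=8,
    host_mem_bytes=1584954224000,
)

# Hypothetical registry with a single, non-matching entry (different GPU name).
known_systems = [
    KnownSystem(
        system_id="EXAMPLE_H100_SXM_80GBx8",
        gpu_name="NVIDIA H100 80GB HBM3",
        gpu_count=8,
        host_mem_bytes=1600000000000,
    ),
]

system_id = match_system(detected, known_systems)
if system_id is None:
    print("Detected system did not match any known systems.")
```

Under these assumptions, the fix amounts to getting an entry for the 8x H100 PCIe machine into the registry, which is what the separate initialization scripts (or the CM automation) take care of.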