mlcommons / inference_results_v2.1

This repository contains the results and code for the MLPerf™ Inference v2.1 benchmark.
https://mlcommons.org/en/inference-datacenter-21/
Apache License 2.0

Orin Retinanet result is far from the official one #7

Open stillbanbo opened 1 year ago

stillbanbo commented 1 year ago

Hi there, I have run retinanet on a Jetson AGX Orin and found the latency is much higher than the official one. My 90th percentile latency is 118112979 ns, roughly 6x the official result of 19378310 ns (the harness reports my result as 0.16 of the reference). Please have a look and correct me if I am doing something wrong.

Here is my stdout result:

================================================
MLPerf Results Summary
================================================
SUT name : LWIS_Server
Scenario : SingleStream
Mode     : PerformanceOnly
90th percentile latency (ns) : 118112979
Result is : VALID
  Min duration satisfied : Yes
  Min queries satisfied : Yes
  Early stopping satisfied: Yes
Early Stopping Result:
 * Processed at least 64 queries (517).
 * Would discard 34 highest latency queries.
 * Early stopping 90th percentile estimate: 118339606
 * Not enough queries processed for 99th percentile
 early stopping estimate (would need to process at
 least 662 total queries).

================================================
Additional Stats
================================================
QPS w/ loadgen overhead         : 8.59
QPS w/o loadgen overhead        : 8.59

Min latency (ns)                : 114701216
Max latency (ns)                : 121048894
Mean latency (ns)               : 116430600
50.00 percentile latency (ns)   : 116026547
90.00 percentile latency (ns)   : 118112979
95.00 percentile latency (ns)   : 118500152
97.00 percentile latency (ns)   : 118850301
99.00 percentile latency (ns)   : 120218417
99.90 percentile latency (ns)   : 121048894

================================================
Test Parameters Used
================================================
samples_per_query : 1
target_qps : 52.6316
target_latency (ns): 0
max_async_queries : 1
min_duration (ms): 60000
max_duration (ms): 0
min_query_count : 1
max_query_count : 0
qsl_rng_seed : 14284205019438841327
sample_index_rng_seed : 4163916728725999944
schedule_rng_seed : 299063814864929621
accuracy_log_rng_seed : 0
accuracy_log_probability : 0
accuracy_log_sampling_target : 0
print_timestamps : 0
performance_issue_unique : 0
performance_issue_same : 0
performance_issue_same_index : 0
performance_sample_count : 64

No warnings encountered during test.

No errors encountered during test.
Finished running actual test.
Device Device:0 processed:
  517 batches of size 1
  Memcpy Calls: 0
  PerSampleCudaMemcpy Calls: 0
  BatchedCudaMemcpy Calls: 0
&&&& PASSED Default_Harness # ./build/bin/harness_default
[2022-12-06 13:44:24,190 run_harness.py:205 INFO] Result: result_90.00_percentile_latency_ns: 118112979, Result is VALID

======================= Extra Perf Stats: =======================

Orin_retinanet_SingleStream-lwis_k_99_MaxP:
    result_90.00_percentile_latency_ns: 118112979.00 is 0.16 of the current results 19378310.00.

======================= Perf harness results: =======================

Orin_TRT-lwis_k_99_MaxP-SingleStream:
    retinanet: result_90.00_percentile_latency_ns: 118112979, Result is VALID

======================= Accuracy results: =======================

Orin_TRT-lwis_k_99_MaxP-SingleStream:
    retinanet: No accuracy results in PerformanceOnly mode.
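
For reference, the 0.16 figure under "Extra Perf Stats" appears to be the official 90th percentile latency divided by my measured one (for SingleStream, lower latency is better), i.e. roughly a 6x slowdown:

awk 'BEGIN { printf "%.2f\n", 19378310 / 118112979 }'   # ratio reported by the harness, prints 0.16
awk 'BEGIN { printf "%.1fx\n", 118112979 / 19378310 }'  # slowdown relative to the official result, prints 6.1x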

There is no direct way to download the dataset from inference_results, so I obtained it this way:

git clone -b r2.1 https://github.com/mlcommons/inference.git
cd inference/vision/classification_and_detection
# pip install fiftyone -i https://pypi.tuna.tsinghua.edu.cn/simple  # 15~30 minutes
# ./openimages_mlperf -d <DOWNLOAD_PATH>  # download the openimages mlperf validation set
./openimages_calibration_mlperf -d <DOWNLOAD_PATH>  # download the openimages mlperf validation set
arjunsuresh commented 1 year ago

Are you running the quantized model?

# ./openimages_mlperf -d <DOWNLOAD_PATH>  #download the openimages mlperf validation set

This should be the command to download the validation dataset. The next line is for the calibration set, though that should not affect the performance numbers.
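
In other words, assuming the scripts are run from inference/vision/classification_and_detection as in your snippet, the full sequence would look like this (the calibration set is only needed if you want to regenerate calibration data):

cd inference/vision/classification_and_detection
./openimages_mlperf -d <DOWNLOAD_PATH>              # openimages mlperf validation set (used for performance/accuracy runs)
./openimages_calibration_mlperf -d <DOWNLOAD_PATH>  # openimages mlperf calibration set (optional for performance runs)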

stillbanbo commented 1 year ago

Are you running the quantized model?

# ./openimages_mlperf -d <DOWNLOAD_PATH>  #download the openimages mlperf validation set

This should be the command to download the validation dataset. The next line is for the calibration set, though that should not affect the performance numbers.

Thanks for replying. Following the steps in https://github.com/mlcommons/inference_results_v2.1/issues/6#issuecomment-1333001434, SingleStream now runs successfully, but Offline fails with the output below:

mt@mt-mt:~/mlperf/inference_results_v2.1/closed/NVIDIA$ make run_harness RUN_ARGS="--benchmarks=retinanet --scenarios=offline --fast --test_mode=PerformanceOnly"
[2022-12-08 09:48:59,554 main_v2.py:221 INFO] Detected system ID: KnownSystem.Orin
[2022-12-08 09:48:59,711 generate_conf_files.py:103 INFO] Generated measurements/ entries for Orin_TRT/retinanet/Offline
[2022-12-08 09:48:59,711 __init__.py:44 INFO] Running command: ./build/bin/harness_default --plugins="build/plugins/retinanetConcatPlugin/libretinanetconcatplugin.so" --logfile_outdir="/home/mt/mlperf/inference_results_v2.1/closed/NVIDIA/build/logs/2022.12.08-09.48.57/Orin_TRT/retinanet/Offline" --logfile_prefix="mlperf_log_" --performance_sample_count=64 --test_mode="PerformanceOnly" --gpu_copy_streams=4 --gpu_inference_streams=1 --gpu_batch_size=4 --map_path="data_maps/open-images-v6-mlperf/val_map.txt" --tensor_path="build/preprocessed_data/open-images-v6-mlperf/validation/Retinanet/int8_linear" --use_graphs=false --gpu_engines="./build/engines/Orin/retinanet/Offline/retinanet-Offline-gpu-b4-int8.lwis_k_99_MaxP.plan" --mlperf_conf_path="measurements/Orin_TRT/retinanet/Offline/mlperf.conf" --user_conf_path="measurements/Orin_TRT/retinanet/Offline/user.conf" --max_dlas=0 --scenario Offline --model retinanet --response_postprocess openimageeffnms
[2022-12-08 09:48:59,711 __init__.py:51 INFO] Overriding Environment
benchmark : Benchmark.Retinanet
buffer_manager_thread_count : 0
data_dir : /media/mt/mt_application_n/mlperf_inference_data/data
fast : True
gpu_batch_size : 4
gpu_copy_streams : 4
gpu_inference_streams : 1
input_dtype : int8
input_format : linear
log_dir : /home/mt/mlperf/inference_results_v2.1/closed/NVIDIA/build/logs/2022.12.08-09.48.57
map_path : data_maps/open-images-v6-mlperf/val_map.txt
offline_expected_qps : 65
precision : int8
preprocessed_data_dir : /media/mt/mt_application_n/mlperf_inference_data/preprocessed_data
scenario : Scenario.Offline
system : SystemConfiguration(host_cpu_conf=CPUConfiguration(layout={CPU(name='ARMv8 Processor rev 1 (v8l)', architecture=<CPUArchitecture.aarch64: AliasedName(name='aarch64', aliases=(), patterns=())>, core_count=4, threads_per_core=1): 3}), host_mem_conf=MemoryConfiguration(host_memory_capacity=Memory(quantity=31.940928, byte_suffix=<ByteSuffix.GB: (1000, 3)>, _num_bytes=31940928000), comparison_tolerance=0.05), accelerator_conf=AcceleratorConfiguration(layout=defaultdict(<class 'int'>, {GPU(name='Jetson AGX Orin', accelerator_type=<AcceleratorType.Integrated: AliasedName(name='Integrated', aliases=(), patterns=())>, vram=None, max_power_limit=None, pci_id=None, compute_sm=87): 1})), numa_conf=None, system_id='Orin')
tensor_path : build/preprocessed_data/open-images-v6-mlperf/validation/Retinanet/int8_linear
test_mode : PerformanceOnly
use_graphs : False
system_id : Orin
config_name : Orin_retinanet_Offline
workload_setting : WorkloadSetting(HarnessType.LWIS, AccuracyTarget.k_99, PowerSetting.MaxP)
optimization_level : plugin-enabled
use_cpu : False
use_inferentia : False
config_ver : lwis_k_99_MaxP
accuracy_level : 99%
inference_server : lwis
soc_gpu_freq : None
soc_dla_freq : None
soc_cpu_freq : None
soc_emc_freq : None
orin_num_cores : None
&&&& RUNNING Default_Harness # ./build/bin/harness_default
[I] mlperf.conf path: measurements/Orin_TRT/retinanet/Offline/mlperf.conf
[I] user.conf path: measurements/Orin_TRT/retinanet/Offline/user.conf
Creating QSL.
Finished Creating QSL.
Setting up SUT.
[I] [TRT] Loaded engine size: 73 MiB
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +534, GPU +540, now: CPU 901, GPU 14456 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +86, GPU +86, now: CPU 987, GPU 14542 (MiB)
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +2, GPU +69, now: CPU 2, GPU 69 (MiB)
[I] Device:0: ./build/engines/Orin/retinanet/Offline/retinanet-Offline-gpu-b4-int8.lwis_k_99_MaxP.plan has been successfully loaded.
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 915, GPU 14473 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 915, GPU 14473 (MiB)
[E] [TRT] 2: [slot.h::decode::224] Error Code 2: Internal Error (Assertion slots failed. encoded reference to slot found, but slots missing)
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/mt/mlperf/inference_results_v2.1/closed/NVIDIA/code/main_v2.py", line 223, in <module>
    main(main_args, DETECTED_SYSTEM)
  File "/home/mt/mlperf/inference_results_v2.1/closed/NVIDIA/code/main_v2.py", line 147, in main
    dispatch_action(main_args, config_dict, workload_setting)
  File "/home/mt/mlperf/inference_results_v2.1/closed/NVIDIA/code/main_v2.py", line 194, in dispatch_action
    handler.run()
  File "/home/mt/mlperf/inference_results_v2.1/closed/NVIDIA/code/actionhandler/base.py", line 79, in run
    self.handle_failure()
  File "/home/mt/mlperf/inference_results_v2.1/closed/NVIDIA/code/actionhandler/run_harness.py", line 221, in handle_failure
    raise RuntimeError("Run harness failed!")
RuntimeError: Run harness failed!
Traceback (most recent call last):
  File "/home/mt/mlperf/inference_results_v2.1/closed/NVIDIA/code/actionhandler/run_harness.py", line 204, in handle
    result = self.harness.run_harness(flag_dict=self.harness_flag_dict, skip_generate_measurements=True)
  File "/home/mt/mlperf/inference_results_v2.1/closed/NVIDIA/code/common/harness.py", line 278, in run_harness
    output = run_command(cmd, get_output=True, custom_env=self.env_vars)
  File "/home/mt/mlperf/inference_results_v2.1/closed/NVIDIA/code/common/__init__.py", line 65, in run_command
    raise subprocess.CalledProcessError(ret, cmd)
subprocess.CalledProcessError: Command './build/bin/harness_default --plugins="build/plugins/retinanetConcatPlugin/libretinanetconcatplugin.so" --logfile_outdir="/home/mt/mlperf/inference_results_v2.1/closed/NVIDIA/build/logs/2022.12.08-09.48.57/Orin_TRT/retinanet/Offline" --logfile_prefix="mlperf_log_" --performance_sample_count=64 --test_mode="PerformanceOnly" --gpu_copy_streams=4 --gpu_inference_streams=1 --gpu_batch_size=4 --map_path="data_maps/open-images-v6-mlperf/val_map.txt" --tensor_path="build/preprocessed_data/open-images-v6-mlperf/validation/Retinanet/int8_linear" --use_graphs=false --gpu_engines="./build/engines/Orin/retinanet/Offline/retinanet-Offline-gpu-b4-int8.lwis_k_99_MaxP.plan" --mlperf_conf_path="measurements/Orin_TRT/retinanet/Offline/mlperf.conf" --user_conf_path="measurements/Orin_TRT/retinanet/Offline/user.conf" --max_dlas=0 --scenario Offline --model retinanet --response_postprocess openimageeffnms' died with <Signals.SIGSEGV: 11>.
make: *** [Makefile:707: run_harness] Error 1

nvyihengz commented 1 year ago

Hi, there are a few things that affect performance which we need to check:

It would be really helpful if you could provide more details on how you set up your Jetson Orin. Our Jetson README, https://github.com/mlcommons/inference_results_v2.1/blob/master/closed/NVIDIA/README_Jetson.md, also provides related info.
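
As a first check (a sketch of the clock setup described in the Jetson README, assuming MAXN is power mode 0 on your AGX Orin), please make sure the board is in the maximum performance power mode with clocks locked before running the harness; running in a lower power mode can account for a large part of the latency gap:

sudo nvpmodel -m 0   # switch to the MAXN power mode (assumed to be mode 0 on AGX Orin)
nvpmodel -q          # confirm the active power mode
sudo jetson_clocks   # lock CPU/GPU/EMC clocks at their maximum for the current mode

For the Offline segfault, one thing that may be worth trying is regenerating the Offline engine so the .plan file matches your current plugin and TensorRT build, e.g. make generate_engines RUN_ARGS="--benchmarks=retinanet --scenarios=offline", before re-running the harness.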