mlcommons / inference_results_v3.0

This repository contains the results and code for the MLPerf™ Inference v3.0 benchmark.
https://mlcommons.org/en/inference-datacenter-30/
Apache License 2.0

Unable to reproduce NVIDIA Orin MaxQ power figures #10

Open psyhtest opened 1 year ago

psyhtest commented 1 year ago

We have very diligently followed NVIDIA's instructions for benchmarking Orin AGX, including flashing our unit with exactly the same images for the MaxP and MaxQ modes and getting exactly the same power supply model.

We have come close to reproducing performance figures for both the MaxP and MaxQ modes. Unfortunately, our power measurements show 6-7 Watt higher power consumption than reported in NVIDIA's submission.

psyhtest commented 1 year ago

MaxQ

| Workload | Results | Offline Perf. (QPS) | Offline Power (W) | SingleStream Perf. (ms) | SingleStream Energy (J/stream) | MultiStream Perf. (ms) | MultiStream Energy (J/stream) |
|---|---|---|---|---|---|---|---|
| ResNet50 | Submitted | 3463.11 | 22.66 | 1.64 | 22.19 | 4.76 | 80.76 |
| | Reproduced | 3478.21 | 29.93 | 1.645364 | 27.61 | 4.868431 | 95.90 |
| RetinaNet | Submitted | 50.09 | 20.69 | 26.56 | 521.34 | 149.98 | 4025.73 |
| | Reproduced | 50.1062 | 26.34 | 26.884007 | 668.51 | 160.659905 | 4986.34 |
| BERT-99 | Submitted | 268.52 | 22.49 | 7.89 | 160.19 | X | X |
| | Reproduced | 265.98 | 29.54 | 7.813791 | 216.08 | X | X |
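The Offline power gaps can be read straight off the table; a quick sketch of the arithmetic (values copied from the table above):

```python
# Offline power (W) from the table: (submitted, reproduced)
offline_power = {
    "ResNet50": (22.66, 29.93),
    "RetinaNet": (20.69, 26.34),
    "BERT-99": (22.49, 29.54),
}
for name, (sub, rep) in offline_power.items():
    delta = rep - sub
    print(f"{name}: +{delta:.2f} W ({100 * delta / sub:.0f}% higher)")
```

This prints gaps of roughly 5.7 to 7.3 W, i.e. 27-32% above the submitted figures.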
arjunsuresh commented 1 year ago

We also tried reproducing NVIDIA's Orin numbers, and we could match the performance numbers quite closely, which is great considering we are running on a different Orin unit. Our experiment setup and results are summarized here:

https://github.com/mlcommons/ck/blob/master/cm-mlops/challenge/optimize-mlperf-inference-v3.0-2023/docs/setup-nvidia-jetson-orin.md

Unfortunately, we are also way off in reproducing the power numbers. We see very similar values to what @psyhtest has reported.

arjunsuresh commented 1 year ago

@psyhtest Can you please confirm whether the Orin power runs were taken by connecting from a host machine? Also, we are not sure whether the disk below is present in NVIDIA's Orin submission system:

```
/dev/mmcblk1p1  469G  327G  118G  74% /sd
```
psyhtest commented 1 year ago

> Can you please confirm if the Orin power runs are taken by connecting from a host machine?

Just about the only thing we didn't do according to the provided instructions was to connect via USB serial. Instead, we connected via Ethernet, as we did for all other platforms we benchmarked in this round. In our experience, connecting via Ethernet does not contribute significantly to power consumption.

> Also not sure if the below disk is present in Nvidia Orin submission system.

This is a 500G microSD card. Again, we don't believe it contributes significantly to power consumption. Most of our systems have such cards installed, because the internal flash memory is rarely sufficient to hold code, datasets, models, Docker images, etc. If NVIDIA did manage to fit everything into 64G, I'd be interested to know how.

arjunsuresh commented 1 year ago

Thank you @psyhtest for sharing this. These tips are useful to anyone doing power measurements :)

28W is what we got from our setup for R50 offline on Nvidia Orin with 3475.48 QPS.

  1. We are using the USB-C cable from the host as done in the Nvidia documentation - possibly saving 1-2W compared to Ethernet
  2. No additional micro SD card being used - possibly saving 1W?
  3. We are using the default Orin power cable - possibly adding 1-2W?

I believe 64G is enough for any single benchmark. For the R50 power measurement we did not use an extra SD card, and if we put enough effort into cleaning up data after each benchmark, I believe this ~1W saving can be obtained for all the benchmarks. NVIDIA's Orin system description also mentions only 64G of storage.
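Tallying the estimates above as a back-of-the-envelope sketch (the 1-2W cable penalty is the guess from item 3, not a measurement):

```python
submitted_w = 22.66   # NVIDIA's R50 Offline MaxQ submission
reproduced_w = 28.0   # our setup: USB-C host connection, no extra SD card
gap_w = reproduced_w - submitted_w            # ~5.34 W
# Item 3 guesses the default power cable adds 1-2 W over NVIDIA's:
unexplained = [round(gap_w - penalty, 2) for penalty in (1.0, 2.0)]
print(unexplained)    # roughly 3-4 W left unaccounted for
```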

The remaining ~3W difference I attribute to the crest factor issue: the ranging-mode run does not measure the peak current draw as expected, so the testing-mode run happens at a lower current range than it should, and the current PTDaemon does not capture this error.
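To make the crest-factor point concrete, here is a toy sketch (invented waveforms, not measured data). Crest factor is the ratio of peak to RMS; a bursty inference load can have a much higher crest factor than a sine wave, so a current range chosen from RMS-like readings in the ranging run may clip the peaks during the testing run:

```python
import math

def rms(samples):
    """Root-mean-square of a list of current samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def crest_factor(samples):
    """Peak divided by RMS."""
    return max(abs(s) for s in samples) / rms(samples)

n = 10_000
# Pure sine current: crest factor is sqrt(2) ~ 1.414
sine = [math.sin(2 * math.pi * k / n) for k in range(n)]
# Hypothetical bursty load: 10 A for 10% of the time, 1 A otherwise
bursty = [10.0 if k < n // 10 else 1.0 for k in range(n)]

print(crest_factor(sine))    # ~1.414
print(crest_factor(bursty))  # ~3.03 -- peaks far above the RMS level
```

A range sized for the bursty waveform's RMS current would be exceeded by its peaks, which is the kind of under-ranging described above.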

@psyhtest Since you are doing a lot of low-power submissions, it may be worth trying to lower the power factor by artificially modifying the electrical circuit. If time permits, this is something I'll try :)