mlcommons / inference_results_v0.5

This repository contains the results and code for the MLPerf™ Inference v0.5 benchmark.
https://mlcommons.org/en/inference-datacenter-05/
Apache License 2.0

NVIDIA: Can I use the Xavier MLPerf Inference code base for MLPerf Inference benchmarking on Jetson TX2 and Nano? #15

Open Mamtesh11 opened 4 years ago

Mamtesh11 commented 4 years ago

I tried the MLPerf Inference code base published by NVIDIA for Xavier (Closed Division) and it gives the same results as published. I want to use this code on TX2 and Nano, but running it with the --gpu_only flag didn't work out. So, can a submitter from NVIDIA help with MLPerf benchmarking on TX2 and Nano using the published code base?

psyhtest commented 4 years ago

@Mamtesh11 That's an excellent question; I was wondering the same. The answer from NVIDIA was "maybe": some modifications and experimentation are needed. We (dividiti) are looking into this right now.

/cc @nvpohanh

Mamtesh11 commented 4 years ago

@psyhtest Yes, I saw them and tried to reproduce the same; the run_harness process got killed in the MobileNet MultiStream scenario.

psyhtest commented 4 years ago

We generated TensorRT plans for the Xavier configuration on a machine with a GTX 1080 (compute capability 6.1). Unfortunately, we then failed to deploy them on both TX1 (compute capability 5.3) and TX2 (compute capability 6.2), e.g.:

[TensorRT] ERROR: INVALID_CONFIG: The engine plan file is generated on an incompatible device, expecting compute 6.2 got compute 6.1, please rebuild. 
[TensorRT] ERROR: engine.cpp (1324) - Serialization Error in deserialize: 0 (Core engine deserialization failure)
[TensorRT] ERROR: INVALID_STATE: std::exception 
[TensorRT] ERROR: INVALID_CONFIG: Deserialize the cuda engine failed.
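
For anyone reproducing this, a quick way to confirm what compute capability a given device actually reports is to query it directly. A minimal sketch, assuming pycuda is installed:

# Print the compute capability of device 0.
import pycuda.driver as cuda

cuda.init()
major, minor = cuda.Device(0).compute_capability()
print("Compute capability: %d.%d" % (major, minor))  # e.g. 5.3 on TX1, 6.2 on TX2, 7.2 on Xavier AGX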

@nvpohanh Is there a way to specify a different compute capability for the target?

nvpohanh commented 4 years ago

@psyhtest You have to generate the plans on the same GPU which you plan to run these plans on. Have you tried generating the plans on TX1 and/or TX2?
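
For reference, a minimal on-device build sketch (assuming the TensorRT 6-era Python API and an ONNX copy of the model; the file paths are placeholders, and this is not the harness's own build code):

# Build and serialize a TensorRT plan on the target device itself.
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, TRT_LOGGER)

with open("resnet50.onnx", "rb") as f:      # placeholder model path
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

builder.max_workspace_size = 1 << 30        # 1 GiB
builder.fp16_mode = True                    # FP16 is supported on TX1/TX2, unlike INT8

engine = builder.build_cuda_engine(network)
with open("resnet50.plan", "wb") as f:      # the plan is only valid on this compute capability
    f.write(engine.serialize())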

psyhtest commented 4 years ago

@nvpohanh Yes, but we have maxed out our 128 GB SD card on TX1, which had at least 70 GB of free space when we started :). I've ordered a 400 GB SD card now. And we don't even have external storage on our TX2 module. Is there a way to build only specific things like ResNet?

psyhtest commented 4 years ago

Further to the previous question, is it still necessary to download COCO and the object detection models if all I want is to generate TensorRT plans for ResNet?

nvpohanh commented 4 years ago

@psyhtest To build engines for a specific benchmark-scenario combination, run make generate_engines RUN_ARGS="--benchmarks=<BENCHMARK> --scenarios=<SCENARIOS>". Also, I don't think generating engines requires a "real" dataset: TRT uses random data for auto-tuning anyway.
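
For example, assuming the benchmark and scenario names match the published measurement directories, building just the ResNet Offline engine would be make generate_engines RUN_ARGS="--benchmarks=resnet --scenarios=Offline".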

psyhtest commented 4 years ago

@nvpohanh How about calibration? Don't you need real data for that?

nvpohanh commented 4 years ago

@psyhtest Calibration can be shared across GPUs (in most cases). Could you try that?

Also, what's the point of trying INT8 on TX1 and/or TX2? Would FP32 suffice?
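
On the sharing point, a minimal sketch of a calibrator that only serves a pre-generated calibration cache (assuming the TensorRT Python API; the cache filename is a placeholder), so the target device never has to see calibration data:

# An INT8 calibrator that relies entirely on an existing calibration cache.
import tensorrt as trt

class CacheOnlyCalibrator(trt.IInt8EntropyCalibrator2):
    def __init__(self, cache_file="calibration.cache"):  # placeholder filename
        super().__init__()
        self.cache_file = cache_file

    def get_batch_size(self):
        return 1

    def get_batch(self, names):
        return None                      # no live calibration batches

    def read_calibration_cache(self):
        with open(self.cache_file, "rb") as f:
            return f.read()              # TensorRT uses the cache instead of re-calibrating

    def write_calibration_cache(self, cache):
        pass                             # cache already exists; nothing to write

# Usage sketch: builder.int8_mode = True; builder.int8_calibrator = CacheOnlyCalibrator()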

psyhtest commented 4 years ago

@nvpohanh We suspected the same about sharing: the calibration done on a GTX 1080 seems to be quite similar to that done on a TX1.

On TX1 and TX2, we get around a 1.8x speedup with FP16 over FP32. We thought we would get an extra speedup with INT8. Unfortunately, neither device supports INT8, according to this matrix. Given that, FP16 is our best option, which we already support via CK.

nvpohanh commented 4 years ago

Glad that FP16 works and gives you the speedup!

psyhtest commented 4 years ago

Thanks! But I'm now wondering about @Mamtesh11's original question. Will the way NVIDIA constructed optimized TensorRT plans for Xavier work for the older devices with FP32/FP16 support only? (As I understand it, Nano is equivalent to TX1, compute-capability-wise.)

nvpohanh commented 4 years ago

@psyhtest Do you mean generating the TRT plans on Xavier and then running the plans on TX1? I'm afraid that won't work, since TRT requires that you generate plans on the same GPU you run them on. On the other hand, using TRT to generate plans on TX1 etc. should work.

psyhtest commented 4 years ago

@nvpohanh I understand that I need to generate and run plans on the same platform. But, IIRC, you construct one of the graphs (SSD Large?) layer by layer. Do you specify the main data type there explicitly? Because if you do, and that data type is INT8, then that won't work on any pre-Xavier hardware, right?

psyhtest commented 4 years ago

(That's what I meant by "the way NVIDIA constructed optimized TensorRT plans for Xavier".)

nvpohanh commented 4 years ago

We put all the configurable settings in the config files, like this one: https://github.com/mlperf/inference_results_v0.5/blob/master/closed/NVIDIA/measurements/Xavier/resnet/Offline/config.json

Under the hood, the script simply parses the config files and sets the TensorRT settings accordingly. Therefore, to run on older hardware, you just need to make sure the config file has the correct settings, or you can look into the scripts and find the right TensorRT settings.
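
Purely as an illustration (the "precision" field name and file path below are assumptions, not necessarily what the actual config.json and scripts use), the mapping from a config entry to TensorRT builder settings is roughly of this shape:

# Sketch: read a precision setting from a config file and apply it to the TensorRT builder.
import json
import tensorrt as trt

with open("config.json") as f:           # placeholder path
    config = json.load(f)

builder = trt.Builder(trt.Logger(trt.Logger.INFO))
precision = config.get("precision", "fp32")
if precision == "fp16":
    builder.fp16_mode = True             # available on TX1/TX2
elif precision == "int8":
    builder.int8_mode = True             # Xavier only, among the devices discussed here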

We don't currently have plans to provide official config files for the MLPerf benchmarks on older hardware, but feel free to let me know if you run into any issues.

psyhtest commented 4 years ago

Thanks @nvpohanh, will do!

psyhtest commented 4 years ago

Please see @ens-lg4's comment here.

psyhtest commented 4 years ago

@nvpohanh Given the issues with running object detection models on TX1/TX2 that we reported, I'm wondering whether the Xavier AGX binaries are going to work on the upcoming Xavier NX. In particular, are you aware of any required changes, e.g. to the input layout induced by the DLAs?