microsoft / superbenchmark

A validation and profiling tool for AI infrastructure
https://aka.ms/superbench
MIT License

[Bug Report] ONNX export failed on adaptive_avg_pool2d at tensorrt micro bench. #352

Open · LeiWang1999 opened 2 years ago

LeiWang1999 commented 2 years ago

I am currently working in the superbench/superbench:v0.4.0-cuda11.1.1 Docker workspace to run benchmarks.
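For reference, a typical way to enter this workspace (the exact flags are illustrative, from my own setup, not prescribed by superbench):

docker run -it --rm --gpus all superbench/superbench:v0.4.0-cuda11.1.1 bash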

To benchmark different models with TensorRT, I customized superbenchmark/examples/benchmarks/tensorrt_inference_performance.py as below:

# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

"""Micro benchmark example for TensorRT inference performance.

Commands to run:
    python3 examples/benchmarks/tensorrt_inference_performance.py <batch_size> <model> <precision>
"""
import sys
from superbench.benchmarks import BenchmarkRegistry, Platform
from superbench.common.utils import logger

if __name__ == '__main__':
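    # positional args: batch size, model name, precision (e.g. 1 vgg11 fp32)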
    batch = int(sys.argv[1])
    model = sys.argv[2]
    precision = sys.argv[3]
    parameters = '--batch_size {0} --pytorch_models {1} --precision {2} --seq_length 8 --iterations 105'.format(batch, model, precision)

    context = BenchmarkRegistry.create_benchmark_context('tensorrt-inference', platform=Platform.CUDA, parameters=parameters)
    benchmark = BenchmarkRegistry.launch_benchmark(context)
    if benchmark:
        logger.info(
            'benchmark: {}, return code: {}, result: {}'.format(
                benchmark.name, benchmark.return_code, benchmark.result
            )
        )

Execution:

nvprof --log-file benches/TensorRT/vgg11/fp32_batch_1_prof.txt /opt/conda/bin/python /opt/superbench/examples/benchmarks/tensorrt_inference_performance.py 1 vgg11 fp32 | tee benches/TensorRT/vgg11/fp32_batch_1_time.txt

Log:

root@616b67a69ab7:/opt/superbench# nvprof --log-file benches/TensorRT/vgg11/fp32_batch_1_prof.txt /opt/conda/bin/python /opt/superbench/examples/benchmarks/tensorrt_inference_performance.py 1 vgg11 fp32 | tee benches/TensorRT/vgg11/fp32_batch_1_time.txt
/opt/conda/lib/python3.8/site-packages/torch/onnx/utils.py:256: UserWarning: `add_node_names' can be set to True only when 'operator_export_type' is `ONNX`. Since 'operator_export_type' is not set to 'ONNX', `add_node_names` argument will be ignored.
warnings.warn("`{}' can be set to True only when 'operator_export_type' is "
/opt/conda/lib/python3.8/site-packages/torch/onnx/utils.py:256: UserWarning: `do_constant_folding' can be set to True only when 'operator_export_type' is `ONNX`. Since 'operator_export_type' is not set to 'ONNX', `do_constant_folding` argument will be ignored.
warnings.warn("`{}' can be set to True only when 'operator_export_type' is "
/opt/conda/lib/python3.8/site-packages/torch/onnx/symbolic_helper.py:182: UserWarning: ONNX export failed on adaptive_avg_pool2d because input size not accessible not supported
warnings.warn("ONNX export failed on " + op + " because " + msg + " not supported")
[2022-05-06 12:33:25,995 616b67a69ab7:18330][micro_base.py:167][INFO] Execute command - round: 0, benchmark: tensorrt-inference, command: /opt/tensorrt/bin/trtexec --onnx=/root/.cache/torch/hub/onnx/vgg11.onnx --explicitBatch --optShapes=input:1x3x224x224 --workspace=8192 --iterations=105 --percentile=99.
[2022-05-06 12:33:40,844 616b67a69ab7:18330][micro_base.py:176][ERROR] Microbenchmark execution failed - round: 0, benchmark: tensorrt-inference, error message: &&&& RUNNING TensorRT.trtexec # /opt/tensorrt/bin/trtexec --onnx=/root/.cache/torch/hub/onnx/vgg11.onnx --explicitBatch --optShapes=input:1x3x224x224 --workspace=8192 --iterations=105 --percentile=99
[05/06/2022-12:33:26] [I] === Model Options ===
[05/06/2022-12:33:26] [I] Format: ONNX
[05/06/2022-12:33:26] [I] Model: /root/.cache/torch/hub/onnx/vgg11.onnx
[05/06/2022-12:33:26] [I] Output:
[05/06/2022-12:33:26] [I] === Build Options ===
[05/06/2022-12:33:26] [I] Max batch: explicit
[05/06/2022-12:33:26] [I] Workspace: 8192 MiB
[05/06/2022-12:33:26] [I] minTiming: 1
[05/06/2022-12:33:26] [I] avgTiming: 8
[05/06/2022-12:33:26] [I] Precision: FP32
[05/06/2022-12:33:26] [I] Calibration:
[05/06/2022-12:33:26] [I] Refit: Disabled
[05/06/2022-12:33:26] [I] Safe mode: Disabled
[05/06/2022-12:33:26] [I] Save engine:
[05/06/2022-12:33:26] [I] Load engine:
[05/06/2022-12:33:26] [I] Builder Cache: Enabled
[05/06/2022-12:33:26] [I] NVTX verbosity: 0
[05/06/2022-12:33:26] [I] Tactic sources: Using default tactic sources
[05/06/2022-12:33:26] [I] Input(s)s format: fp32:CHW
[05/06/2022-12:33:26] [I] Output(s)s format: fp32:CHW
[05/06/2022-12:33:26] [I] Input build shape: input=1x3x224x224+1x3x224x224+1x3x224x224
[05/06/2022-12:33:26] [I] Input calibration shapes: model
[05/06/2022-12:33:26] [I] === System Options ===
[05/06/2022-12:33:26] [I] Device: 0
[05/06/2022-12:33:26] [I] DLACore:
[05/06/2022-12:33:26] [I] Plugins:
[05/06/2022-12:33:26] [I] === Inference Options ===
[05/06/2022-12:33:26] [I] Batch: Explicit
[05/06/2022-12:33:26] [I] Input inference shape: input=1x3x224x224
[05/06/2022-12:33:26] [I] Iterations: 105
[05/06/2022-12:33:26] [I] Duration: 3s (+ 200ms warm up)
[05/06/2022-12:33:26] [I] Sleep time: 0ms
[05/06/2022-12:33:26] [I] Streams: 1
[05/06/2022-12:33:26] [I] ExposeDMA: Disabled
[05/06/2022-12:33:26] [I] Data transfers: Enabled
[05/06/2022-12:33:26] [I] Spin-wait: Disabled
[05/06/2022-12:33:26] [I] Multithreading: Disabled
[05/06/2022-12:33:26] [I] CUDA Graph: Disabled
[05/06/2022-12:33:26] [I] Separate profiling: Disabled
[05/06/2022-12:33:26] [I] Skip inference: Disabled
[05/06/2022-12:33:26] [I] Inputs:
[05/06/2022-12:33:26] [I] === Reporting Options ===
[05/06/2022-12:33:26] [I] Verbose: Disabled
[05/06/2022-12:33:26] [I] Averages: 10 inferences
[05/06/2022-12:33:26] [I] Percentile: 99
[05/06/2022-12:33:26] [I] Dump refittable layers:Disabled
[05/06/2022-12:33:26] [I] Dump output: Disabled
[05/06/2022-12:33:26] [I] Profile: Disabled
[05/06/2022-12:33:26] [I] Export timing to JSON file:
[05/06/2022-12:33:26] [I] Export output to JSON file:
[05/06/2022-12:33:26] [I] Export profile to JSON file:
[05/06/2022-12:33:26] [I]
[05/06/2022-12:33:26] [I] === Device Information ===
[05/06/2022-12:33:26] [I] Selected Device: NVIDIA Tesla V100-PCIE-16GB
[05/06/2022-12:33:26] [I] Compute Capability: 7.0
[05/06/2022-12:33:26] [I] SMs: 80
[05/06/2022-12:33:26] [I] Compute Clock Rate: 1.38 GHz
[05/06/2022-12:33:26] [I] Device Global Memory: 16160 MiB
[05/06/2022-12:33:26] [I] Shared Memory per SM: 96 KiB
[05/06/2022-12:33:26] [I] Memory Bus Width: 4096 bits (ECC enabled)
[05/06/2022-12:33:26] [I] Memory Clock Rate: 0.877 GHz
[05/06/2022-12:33:26] [I]
----------------------------------------------------------------
Input filename: /root/.cache/torch/hub/onnx/vgg11.onnx
ONNX IR version: 0.0.6
Opset version: 10
Producer name: pytorch
Producer version: 1.8
Domain:
Model version: 0
Doc string:
----------------------------------------------------------------
[05/06/2022-12:33:40] [W] [TRT] /workspace/TensorRT/parsers/onnx/onnx2trt_utils.cpp:218: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[05/06/2022-12:33:40] [I] [TRT] /workspace/TensorRT/parsers/onnx/ModelImporter.cpp:139: No importer registered for op: adaptive_avg_pool2d. Attempting to import as plugin.
[05/06/2022-12:33:40] [I] [TRT] /workspace/TensorRT/parsers/onnx/builtin_op_importers.cpp:3716: Searching for plugin: adaptive_avg_pool2d, plugin_version: 1, plugin_namespace:
[05/06/2022-12:33:40] [E] [TRT] INVALID_ARGUMENT: getPluginCreator could not find plugin adaptive_avg_pool2d version 1
While parsing node number 22 [adaptive_avg_pool2d]:
ERROR: /workspace/TensorRT/parsers/onnx/builtin_op_importers.cpp:3718 In function importFallbackPluginImporter:
[8] Assertion failed: creator && "Plugin not found, are the plugin name, version, and namespace correct?"
[05/06/2022-12:33:40] [E] Failed to parse onnx file
[05/06/2022-12:33:40] [E] Parsing model failed
[05/06/2022-12:33:40] [E] Engine creation failed
[05/06/2022-12:33:40] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec # /opt/tensorrt/bin/trtexec --onnx=/root/.cache/torch/hub/onnx/vgg11.onnx --explicitBatch --optShapes=input:1x3x224x224 --workspace=8192 --iterations=105 --percentile=99
.
[2022-05-06 12:33:40,844 616b67a69ab7:18330][tensorrt_inference_performance.py:23][INFO] benchmark: tensorrt-inference, return code: 32, result: {'return_code': [32]}

It seems that the TensorRT ONNX importer does not support the adaptive_avg_pool2d op?
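To double-check which node trips the parser, the exported model can be inspected directly (a quick sketch, assuming the onnx Python package is installed; the path is the one from the log above):

import onnx

model = onnx.load('/root/.cache/torch/hub/onnx/vgg11.onnx')
op_types = {node.op_type for node in model.graph.node}
# trtexec reports 'No importer registered for op: adaptive_avg_pool2d',
# so the failing export should contain a node with exactly that op type
print('adaptive_avg_pool2d' in op_types)
print(sorted(op_types))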

Please cc.

LeiWang1999 commented 2 years ago

I compared the vgg11 ONNX model generated by superbench (left) with a VGG net manually converted from a .pth checkpoint (right).

I guess the adaptive_avg_pool2d should be converted into a global/fixed average pool (ONNX GlobalAveragePool or AveragePool) so that it can be imported by the TensorRT ONNX importer.
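A minimal sketch of such a workaround (illustrative, not the superbench code path: it assumes the fixed 1x3x224x224 input used above, for which VGG's feature extractor already yields a 7x7 map, so the adaptive pool can be swapped for an equivalent fixed pool before export):

import torch
import torchvision

model = torchvision.models.vgg11().eval()
# for 224x224 inputs the feature map entering avgpool is already 512x7x7,
# so AdaptiveAvgPool2d((7, 7)) is effectively an identity here; a fixed
# AvgPool2d exports to a plain ONNX AveragePool that TensorRT can import
model.avgpool = torch.nn.AvgPool2d(kernel_size=1, stride=1)

torch.onnx.export(
    model,
    torch.randn(1, 3, 224, 224),
    'vgg11.onnx',
    opset_version=10,
    input_names=['input'],
    output_names=['output'],
)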

[image: side-by-side comparison of the two exported ONNX graphs]