microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

onnxruntime-gpu gets warning "Serializing optimized model with Graph Optimization level greater than ORT_ENABLE_EXTENDED and the NchwcTransformer enabled". #13709

Open liuxianyi opened 1 year ago

liuxianyi commented 1 year ago

Describe the issue

I encountered this warning, and I suspect it is the reason my ONNX inference performs poorly on the GPU.

 [W:onnxruntime:, inference_session.cc:1488 Initialize] Serializing optimized model with Graph Optimization level greater than ORT_ENABLE_EXTENDED and the NchwcTransformer enabled. The generated model may contain hardware specific optimizations, and should only be used in the same environment the model was optimized in.

To reproduce

    import onnxruntime as ort
    import psutil

    def create_ort_session(onnx_file, cuda=True):
        # Prefer CUDA with an explicit provider configuration; fall back to CPU.
        providers = [('CUDAExecutionProvider', {
                         'device_id': 0,
                         'arena_extend_strategy': 'kNextPowerOfTwo',
                         'gpu_mem_limit': 2 * 1024 * 1024 * 1024,
                         'cudnn_conv_algo_search': 'EXHAUSTIVE',
                         'do_copy_in_default_stream': True,
                     }),
                     'CPUExecutionProvider'] if cuda else ['CPUExecutionProvider']

        if cuda:
            sess_options = ort.SessionOptions()
            # Optional: store the optimized graph and view it in Netron to verify
            # that the model is fully optimized. Note that this increases session
            # creation time, so enable it for debugging only. Beware that pointing
            # it at the input file overwrites the original model on disk.
            sess_options.optimized_model_filepath = onnx_file
            # Change this value according to the best setting found in the
            # Performance Test Tool results.
            sess_options.intra_op_num_threads = psutil.cpu_count(logical=True)
            ort_sess = ort.InferenceSession(onnx_file, sess_options, providers=providers)
        else:
            ort_sess = ort.InferenceSession(onnx_file, providers=providers)

        print(ort_sess.get_providers())
        # If the output contains CUDAExecutionProvider, the GPU can be used:
        # ['CUDAExecutionProvider', 'CPUExecutionProvider']

        meta = ort_sess.get_modelmeta().custom_metadata_map  # model metadata

        return ort_sess, meta

Urgency

No response

Platform

Linux

OS Version

Ubuntu 22.04

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

onnxruntime-gpu ==1.12.0

ONNX Runtime API

Python

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

CUDA 11.1, cuDNN 8.0.5

tangjicheng1 commented 1 year ago

Maybe your GPU hardware is slower than your CPU hardware. You could try TensorRT.
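
For reference, switching to TensorRT only requires listing it first in the providers list, assuming an onnxruntime-gpu build compiled with TensorRT support; a minimal sketch with a hypothetical model path:

    import onnxruntime as ort

    # Prefer TensorRT, then fall back to CUDA and finally CPU.
    # Requires an ONNX Runtime build with TensorRT support installed.
    providers = [
        'TensorrtExecutionProvider',
        'CUDAExecutionProvider',
        'CPUExecutionProvider',
    ]
    sess = ort.InferenceSession('model.onnx', providers=providers)  # hypothetical path
    print(sess.get_providers())  # shows which providers were actually registered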

yuslepukhin commented 1 year ago

Optimized models cannot switch hardware, because they are optimized for the specific environment they were created in. Read this before using optimized models.
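
In practice, the warning text suggests capping the optimization level at ORT_ENABLE_EXTENDED when serializing, so hardware-specific transforms such as the NchwcTransformer are not baked into the saved graph, and letting the default ORT_ENABLE_ALL level run in memory at inference time. A minimal sketch, with hypothetical model paths:

    import onnxruntime as ort

    # One-time export: save a portable optimized model. Capping the level at
    # ORT_ENABLE_EXTENDED keeps hardware-specific transforms (e.g. the
    # NchwcTransformer) out of the serialized graph, which avoids the warning.
    so = ort.SessionOptions()
    so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_EXTENDED
    so.optimized_model_filepath = 'model.opt.onnx'  # hypothetical output path
    ort.InferenceSession('model.onnx', so,
                         providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])

    # At inference time, load the model without setting optimized_model_filepath;
    # the default ORT_ENABLE_ALL level then applies the remaining hardware-specific
    # optimizations in memory, for the current environment only.
    sess = ort.InferenceSession('model.opt.onnx',
                                providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])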