microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

Session load fails with Detectron2 #11229

Closed brantPTS closed 2 years ago

brantPTS commented 2 years ago

Detectron2 is Facebook AI's featured object detection model and it supports ONNX export, but session load fails with the CUDA execution provider.

See below for steps to reproduce. Thank you.

System info:

- Edition: Windows 10 Pro
- Version: 21H1
- OS build: 19043.1645
- Processor: Intel(R) Core(TM) i7-9700 CPU @ 3.00GHz
- Installed RAM: 32.0 GB
- System type: 64-bit operating system, x64-based processor

GPU: Nvidia GTX 1080 Ti, Nvidia driver version 466.27

Steps to reproduce on Windows 10:

Install Detectron2 on Windows:

- Ensure OpenCV is installed and set the environment variable, e.g. OpenCV_DIR = d:\Local\opencv\build

Then export the Faster R-CNN model to ONNX with tools\deploy\export_model.py. The output log should look like:

```
(Det2) D:\Local\Detectron\detectron2>python .\tools\deploy\export_model.py --config-file ./configs/COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml --output ./outputOnnx --export-method tracing --format onnx MODEL.WEIGHTS detectron2://COCO-Detection/faster_rcnn_R_50_FPN_3x/137849458/model_final_280758.pkl MODEL.DEVICE cuda
[03/22 05:37:48 detectron2]: Command line arguments: Namespace(format='onnx', export_method='tracing', config_file='./configs/COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml', sample_image=None, run_eval=False, output='./outputOnnx', opts=['MODEL.WEIGHTS', 'detectron2://COCO-Detection/faster_rcnn_R_50_FPN_3x/137849458/model_final_280758.pkl', 'MODEL.DEVICE', 'cuda'])
[W C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\jit\python\init.cpp:759] Warning: Use _jit_set_fusion_strategy, bailout depth is deprecated. Setting to (STATIC, 1) (function operator ())
[03/22 05:37:49 d2.data.datasets.coco]: Loaded 5000 images in COCO format from datasets\coco/annotations/instances_val2017.json
[03/22 05:37:49 d2.data.build]: Distribution of instances among all 80 categories:
category #instances category #instances category #instances
person 10777 bicycle 314 car 1918
motorcycle 367 airplane 143 bus 283
train 190 truck 414 boat 424
traffic light 634 fire hydrant 101 stop sign 75
parking meter 60 bench 411 bird 427
cat 202 dog 218 horse 272
sheep 354 cow 372 elephant 252
bear 71 zebra 266 giraffe 232
backpack 371 umbrella 407 handbag 540
tie 252 suitcase 299 frisbee 115
skis 241 snowboard 69 sports ball 260
kite 327 baseball bat 145 baseball gl.. 148
skateboard 179 surfboard 267 tennis racket 225
bottle 1013 wine glass 341 cup 895
fork 215 knife 325 spoon 253
bowl 623 banana 370 apple 236
sandwich 177 orange 285 broccoli 312
carrot 365 hot dog 125 pizza 284
donut 328 cake 310 chair 1771
couch 261 potted plant 342 bed 163
dining table 695 toilet 179 tv 288
laptop 231 mouse 106 remote 283
keyboard 153 cell phone 262 microwave 55
oven 143 toaster 9 sink 225
refrigerator 126 book 1129 clock 267
vase 274 scissors 36 teddy bear 190
hair drier 11 toothbrush 57
total 36335

[03/22 05:37:49 d2.data.dataset_mapper]: [DatasetMapper] Augmentations used in inference: [ResizeShortestEdge(short_edge_length=(800, 800), max_size=1333, sample_style='choice')]
[03/22 05:37:49 d2.data.common]: Serializing 5000 elements to byte tensors and concatenating them all ...
[03/22 05:37:49 d2.data.common]: Serialized dataset takes 19.10 MiB
d:\local\detectron\detectron2\detectron2\structures\image_list.py:79: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert t.shape[:-2] == tensors[0].shape[:-2], t.shape
d:\local\Envs\Det2\lib\site-packages\torch\functional.py:568: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\TensorShape.cpp:2228.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
d:\local\detectron\detectron2\detectron2\structures\boxes.py:148: TracerWarning: torch.as_tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  tensor = torch.as_tensor(tensor, dtype=torch.float32, device=device)
d:\local\detectron\detectron2\detectron2\structures\boxes.py:153: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert tensor.dim() == 2 and tensor.size(-1) == 4, tensor.size()
d:\local\detectron\detectron2\detectron2\modeling\proposal_generator\proposal_utils.py:97: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if not valid_mask.all():
d:\local\detectron\detectron2\detectron2\structures\boxes.py:189: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert torch.isfinite(self.tensor).all(), "Box tensor contains infinite or NaN!"
d:\local\detectron\detectron2\detectron2\structures\boxes.py:190: TracerWarning: Iterating over a tensor might cause the trace to be incorrect. Passing a tensor of different shape won't change the number of iterations executed (and might lead to errors or silently give incorrect results).
  h, w = box_size
d:\local\detectron\detectron2\detectron2\layers\nms.py:15: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert boxes.shape[-1] == 4
d:\local\detectron\detectron2\detectron2\structures\instances.py:74: TracerWarning: Using len to get tensor shape might cause the trace to be incorrect. Recommended usage would be tensor.shape[0]. Passing a tensor of different shape might lead to errors or silently give incorrect results.
  data_len = len(value)
d:\local\detectron\detectron2\detectron2\modeling\poolers.py:211: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert len(box_lists) == x[0].size(
d:\local\detectron\detectron2\detectron2\layers\roi_align.py:55: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert rois.dim() == 2 and rois.size(1) == 5
d:\local\detectron\detectron2\detectron2\modeling\roi_heads\fast_rcnn.py:137: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if not valid_mask.all():
d:\local\detectron\detectron2\detectron2\modeling\roi_heads\fast_rcnn.py:142: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  num_bbox_reg_classes = boxes.shape[1] // 4
d:\local\detectron\detectron2\detectron2\modeling\roi_heads\fast_rcnn.py:154: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if num_bbox_reg_classes == 1:
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.

[repeated warning]

WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
d:\local\Envs\Det2\lib\site-packages\torch\onnx\symbolic_opset9.py:2905: UserWarning: Exporting aten::index operator of advanced indexing in opset 11 is achieved by combination of multiple ONNX operators, including Reshape, Transpose, Concat, and Gather. If indices include negative values, the exported graph will produce incorrect results.
  warnings.warn("Exporting aten::index operator of advanced indexing in opset " +
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.

[repeated warning]

WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
d:\local\Envs\Det2\lib\site-packages\torchvision\ops\_register_onnx_ops.py:31: UserWarning: ROIAlign with aligned=True is not supported in ONNX, but will be supported in opset 16. The workaround is that the user need apply the patch https://github.com/microsoft/onnxruntime/pull/8564 and build ONNXRuntime from source.
  warnings.warn(
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.

[repeated warning]

WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
[03/22 05:38:04 detectron2]: Inputs schema: TupleSchema(schemas=[ListSchema(schemas=[DictSchema(schemas=[IdentitySchema()], sizes=[1], keys=['image'])], sizes=[1])], sizes=[1])
[03/22 05:38:04 detectron2]: Outputs schema: ListSchema(schemas=[DictSchema(schemas=[InstancesSchema(schemas=[TensorWrapSchema(class_name='detectron2.structures.Boxes'), IdentitySchema(), IdentitySchema()], sizes=[1, 1, 1], keys=['pred_boxes', 'pred_classes', 'scores'])], sizes=[4], keys=['instances'])], sizes=[4])

(Det2) D:\Local\Detectron\detectron2>
```

In a C# console application that references ONNX Runtime (ORT) 1.10.0, try to create an ONNX session:

```csharp
Session = new InferenceSession(modelPath, SessionOptions.MakeSessionOptionWithCudaProvider(gpuIndex));
```
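For reference, a minimal Python sketch of the same load step (the model path below is an assumption for wherever export_model.py wrote the .onnx file):

```python
import onnxruntime as ort

# Try to load the exported Detectron2 model with the CUDA EP first,
# falling back to the CPU EP. "outputOnnx/model.onnx" is an assumed path.
sess = ort.InferenceSession(
    "outputOnnx/model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
print([i.name for i in sess.get_inputs()], [o.name for o in sess.get_outputs()])
```

The failure reported here happens at session creation, before any input is fed.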

wangyems commented 2 years ago

https://github.com/onnx/onnx/blob/main/docs/Operators.md#Clip Before opset 13, "Clip" does not support tensor(int64). Can you try a newer opset version?
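A quick way to confirm which opset the exported model actually targets (a sketch, assuming the file name used above):

```python
import onnx

# Print the opset imports of the exported graph; the default "ai.onnx"
# domain entry is the one that must be >= 13 for Clip on int64.
model = onnx.load("outputOnnx/model.onnx")
for imp in model.opset_import:
    print(imp.domain or "ai.onnx", imp.version)
```

When exporting from PyTorch directly, torch.onnx.export accepts an opset_version argument (e.g. opset_version=13).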

brantPTS commented 2 years ago

@wangyems, thank you for your prompt reply. Upgrading the Detectron2 Python script to use opset 13 does solve the load problem.

My new problem is that the detections appear to be totally scrambled - although the number of objects is reasonable for the input image, the classes and locations are invalid. Is there an easy way to inspect the feature maps of an onnx session?

Thank you again for your prompt help.

Best, Brant

wangyems commented 2 years ago

Since an ONNX model is a static graph, you need to do some extra work to inspect intermediate values. There are generally two ways:

  1. Before exporting your model to ONNX, set some intermediate variables as outputs (in the original framework code) and return them as part of the whole model's output. When the model is exported to ONNX, you can simply check those outputs; see the sketch after this list.

or

  2. Leverage the debug_node_input_output_utils. Build ORT with --cmake_extra_defines onnxruntime_DEBUG_NODE_INPUTS_OUTPUTS=1 and then set some environment variables during the run.
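A minimal sketch of option 1, assuming a toy model whose backbone and head submodules produce the tensors of interest (the submodule names are hypothetical, not Detectron2's actual structure):

```python
import torch

class DebugWrapper(torch.nn.Module):
    """Return an intermediate tensor alongside the normal output so it
    becomes a named output of the exported ONNX graph."""
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, x):
        feats = self.model.backbone(x)  # hypothetical submodule
        out = self.model.head(feats)    # hypothetical submodule
        return out, feats               # expose the intermediate tensor

# Export the wrapper instead of the bare model; the input shape is illustrative.
# torch.onnx.export(DebugWrapper(model), torch.randn(1, 3, 800, 800),
#                   "model_debug.onnx", opset_version=13)
```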

More to check: do the ONNX model's raw inputs and outputs match the original framework's? Does the CPU EP generate the same results as the CUDA EP?
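A sketch of the CPU-vs-CUDA check, assuming the tracing export's single image input of shape (3, H, W) and that both EPs return outputs of matching shapes (the input name, shape, and model path are assumptions; check sess.get_inputs() for the real ones):

```python
import numpy as np
import onnxruntime as ort

# A real preprocessed image is preferable to random data, which may
# yield zero detections and empty outputs.
x = np.random.rand(3, 800, 800).astype(np.float32)  # assumed input shape

results = {}
for tag, providers in [("cpu", ["CPUExecutionProvider"]),
                       ("cuda", ["CUDAExecutionProvider", "CPUExecutionProvider"])]:
    sess = ort.InferenceSession("outputOnnx/model.onnx", providers=providers)
    name = sess.get_inputs()[0].name
    results[tag] = sess.run(None, {name: x})

# Large per-output differences point at an EP-specific kernel problem.
for cpu_o, cuda_o in zip(results["cpu"], results["cuda"]):
    print(np.abs(np.asarray(cpu_o) - np.asarray(cuda_o)).max())
```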

brantPTS commented 2 years ago

@wangyems, thank you, that is very helpful guidance. I will certainly scrutinize the input/output data. If that does not work, I'll use the ORT debug_node mode - that sounds extremely useful. Closing this issue.

brantPTS commented 2 years ago

The load issue was resolved.

AidenFather commented 2 years ago

@brantPTS Could you please share details of this? Any suggestion/knowledge sharing would be great! Thank you in advance.

PureTechSystems commented 2 years ago

> @brantPTS Could you please share details of this? Any suggestion/knowledge sharing would be great! Thank you in advance.

@AidenFather, I do not have any updates - the next step would be to scrutinize the layer outputs to see where the PyTorch + Python inference diverges from the ONNX Runtime inference. It's a shame that the D2 ONNX model does not work on ORT.
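For anyone picking this up, a sketch of that comparison at the final outputs, before drilling into individual layers (traced_model, img, and the output ordering are assumptions):

```python
import numpy as np
import onnxruntime as ort
import torch

# 'traced_model' and 'img' (a 3xHxW float32 tensor) stand in for the traced
# Detectron2 model and the same preprocessed input used at export time.
with torch.no_grad():
    torch_outs = traced_model(img)

sess = ort.InferenceSession("outputOnnx/model.onnx",
                            providers=["CPUExecutionProvider"])
ort_outs = sess.run(None, {sess.get_inputs()[0].name: img.numpy()})

# If the final outputs already diverge, work backwards layer by layer using
# the intermediate-output trick suggested earlier in the thread.
for t, o in zip(torch_outs, ort_outs):
    print(np.abs(t.numpy() - np.asarray(o)).max())
```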