microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

Session load fails with Detectron2 #11229

Closed brantPTS closed 2 years ago

brantPTS commented 2 years ago

Detectron2 is Facebook AI's featured object detection model and it supports ONNX export, but session load fails with the CUDA execution provider.

See below for steps to reproduce. Thank you.

System info:

- Edition: Windows 10 Pro
- Version: 21H1
- OS build: 19043.1645
- Processor: Intel(R) Core(TM) i7-9700 CPU @ 3.00GHz
- Installed RAM: 32.0 GB
- System type: 64-bit operating system, x64-based processor

GPU: Nvidia GTX 1080 Ti, Nvidia driver version 466.27

Steps to reproduce on Windows 10:

Install Detectron2 on Windows:

- Ensure OpenCV is installed and set the environment variable, e.g. OpenCV_DIR = d:\Local\opencv\build

Then export the Faster R-CNN model to ONNX with tools\deploy\export_model.py. The output log should look like:

```
(Det2) D:\Local\Detectron\detectron2>python .\tools\deploy\export_model.py --config-file ./configs/COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml --output ./outputOnnx --export-method tracing --format onnx MODEL.WEIGHTS detectron2://COCO-Detection/faster_rcnn_R_50_FPN_3x/137849458/model_final_280758.pkl MODEL.DEVICE cuda
[03/22 05:37:48 detectron2]: Command line arguments: Namespace(format='onnx', export_method='tracing', config_file='./configs/COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml', sample_image=None, run_eval=False, output='./outputOnnx', opts=['MODEL.WEIGHTS', 'detectron2://COCO-Detection/faster_rcnn_R_50_FPN_3x/137849458/model_final_280758.pkl', 'MODEL.DEVICE', 'cuda'])
[W C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\jit\python\init.cpp:759] Warning: Use _jit_set_fusion_strategy, bailout depth is deprecated. Setting to (STATIC, 1) (function operator ())
[03/22 05:37:49 d2.data.datasets.coco]: Loaded 5000 images in COCO format from datasets\coco/annotations/instances_val2017.json
[03/22 05:37:49 d2.data.build]: Distribution of instances among all 80 categories:
category #instances category #instances category #instances
person 10777 bicycle 314 car 1918
motorcycle 367 airplane 143 bus 283
train 190 truck 414 boat 424
traffic light 634 fire hydrant 101 stop sign 75
parking meter 60 bench 411 bird 427
cat 202 dog 218 horse 272
sheep 354 cow 372 elephant 252
bear 71 zebra 266 giraffe 232
backpack 371 umbrella 407 handbag 540
tie 252 suitcase 299 frisbee 115
skis 241 snowboard 69 sports ball 260
kite 327 baseball bat 145 baseball gl.. 148
skateboard 179 surfboard 267 tennis racket 225
bottle 1013 wine glass 341 cup 895
fork 215 knife 325 spoon 253
bowl 623 banana 370 apple 236
sandwich 177 orange 285 broccoli 312
carrot 365 hot dog 125 pizza 284
donut 328 cake 310 chair 1771
couch 261 potted plant 342 bed 163
dining table 695 toilet 179 tv 288
laptop 231 mouse 106 remote 283
keyboard 153 cell phone 262 microwave 55
oven 143 toaster 9 sink 225
refrigerator 126 book 1129 clock 267
vase 274 scissors 36 teddy bear 190
hair drier 11 toothbrush 57
total 36335

[03/22 05:37:49 d2.data.dataset_mapper]: [DatasetMapper] Augmentations used in inference: [ResizeShortestEdge(short_edge_length=(800, 800), max_size=1333, sample_style='choice')]
[03/22 05:37:49 d2.data.common]: Serializing 5000 elements to byte tensors and concatenating them all ...
[03/22 05:37:49 d2.data.common]: Serialized dataset takes 19.10 MiB
d:\local\detectron\detectron2\detectron2\structures\image_list.py:79: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert t.shape[:-2] == tensors[0].shape[:-2], t.shape
d:\local\Envs\Det2\lib\site-packages\torch\functional.py:568: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\TensorShape.cpp:2228.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
d:\local\detectron\detectron2\detectron2\structures\boxes.py:148: TracerWarning: torch.as_tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  tensor = torch.as_tensor(tensor, dtype=torch.float32, device=device)
d:\local\detectron\detectron2\detectron2\structures\boxes.py:153: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert tensor.dim() == 2 and tensor.size(-1) == 4, tensor.size()
d:\local\detectron\detectron2\detectron2\modeling\proposal_generator\proposal_utils.py:97: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if not valid_mask.all():
d:\local\detectron\detectron2\detectron2\structures\boxes.py:189: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert torch.isfinite(self.tensor).all(), "Box tensor contains infinite or NaN!"
d:\local\detectron\detectron2\detectron2\structures\boxes.py:190: TracerWarning: Iterating over a tensor might cause the trace to be incorrect. Passing a tensor of different shape won't change the number of iterations executed (and might lead to errors or silently give incorrect results).
  h, w = box_size
d:\local\detectron\detectron2\detectron2\layers\nms.py:15: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert boxes.shape[-1] == 4
d:\local\detectron\detectron2\detectron2\structures\instances.py:74: TracerWarning: Using len to get tensor shape might cause the trace to be incorrect. Recommended usage would be tensor.shape[0]. Passing a tensor of different shape might lead to errors or silently give incorrect results.
  data_len = len(value)
d:\local\detectron\detectron2\detectron2\modeling\poolers.py:211: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert len(box_lists) == x[0].size(
d:\local\detectron\detectron2\detectron2\layers\roi_align.py:55: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert rois.dim() == 2 and rois.size(1) == 5
d:\local\detectron\detectron2\detectron2\modeling\roi_heads\fast_rcnn.py:137: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if not valid_mask.all():
d:\local\detectron\detectron2\detectron2\modeling\roi_heads\fast_rcnn.py:142: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  num_bbox_reg_classes = boxes.shape[1] // 4
d:\local\detectron\detectron2\detectron2\modeling\roi_heads\fast_rcnn.py:154: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if num_bbox_reg_classes == 1:
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.

[repeated warning]

WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
d:\local\Envs\Det2\lib\site-packages\torch\onnx\symbolic_opset9.py:2905: UserWarning: Exporting aten::index operator of advanced indexing in opset 11 is achieved by combination of multiple ONNX operators, including Reshape, Transpose, Concat, and Gather. If indices include negative values, the exported graph will produce incorrect results.
  warnings.warn("Exporting aten::index operator of advanced indexing in opset " +
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.

[repeated warning]

WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
d:\local\Envs\Det2\lib\site-packages\torchvision\ops\_register_onnx_ops.py:31: UserWarning: ROIAlign with aligned=True is not supported in ONNX, but will be supported in opset 16. The workaround is that the user need apply the patch https://github.com/microsoft/onnxruntime/pull/8564 and build ONNXRuntime from source.
  warnings.warn(
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.

[repeated warning]

WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
[03/22 05:38:04 detectron2]: Inputs schema: TupleSchema(schemas=[ListSchema(schemas=[DictSchema(schemas=[IdentitySchema()], sizes=[1], keys=['image'])], sizes=[1])], sizes=[1])
[03/22 05:38:04 detectron2]: Outputs schema: ListSchema(schemas=[DictSchema(schemas=[InstancesSchema(schemas=[TensorWrapSchema(class_name='detectron2.structures.Boxes'), IdentitySchema(), IdentitySchema()], sizes=[1, 1, 1], keys=['pred_boxes', 'pred_classes', 'scores'])], sizes=[4], keys=['instances'])], sizes=[4])

(Det2) D:\Local\Detectron\detectron2>
```

In a C# console application that references ONNX Runtime (ORT) 1.10.0, try to create an ONNX session:

```csharp
Session = new InferenceSession(modelPath, SessionOptions.MakeSessionOptionWithCudaProvider(gpuIndex));
```
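For reference, a minimal Python sketch of the same load step (the model path below is an assumption for wherever export_model.py wrote the .onnx file):

```python
import onnxruntime as ort

# Try to load the exported Detectron2 model with the CUDA EP first,
# falling back to the CPU EP. "outputOnnx/model.onnx" is an assumed path.
sess = ort.InferenceSession(
    "outputOnnx/model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
print([i.name for i in sess.get_inputs()], [o.name for o in sess.get_outputs()])
```

The failure reported here happens at session creation, before any input is fed.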

wangyems commented 2 years ago

https://github.com/onnx/onnx/blob/main/docs/Operators.md#Clip Before opset 13, "Clip" does not support tensor(int64). Can you try a newer opset version?
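A quick way to confirm which opset the exported model actually targets (a sketch, assuming the file name used above):

```python
import onnx

# Print the opset imports of the exported graph; the default "ai.onnx"
# domain entry is the one that must be >= 13 for Clip on int64.
model = onnx.load("outputOnnx/model.onnx")
for imp in model.opset_import:
    print(imp.domain or "ai.onnx", imp.version)
```

When exporting from PyTorch directly, torch.onnx.export accepts an opset_version argument (e.g. opset_version=13).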

brantPTS commented 2 years ago

@wangyems, thank you for your prompt reply. Upgrading the Detectron2 Python script to use opset 13 does solve the load problem.

My new problem is that the detections appear to be totally scrambled - although the number of objects is reasonable for the input image, the classes and locations are invalid. Is there an easy way to inspect the feature maps of an onnx session?

Thank you again for your prompt help.

Best, Brant

wangyems commented 2 years ago

Since an ONNX model is a static graph, you need to do some extra work to inspect intermediate values. There are generally two ways:

  1. Before exporting your model to ONNX, set some intermediate variables as outputs (in the original framework code) and return them as part of the whole model's output. When the model is exported to ONNX, you can simply check those outputs; see the sketch after this list.

or

  2. Leverage the debug_node_input_output_utils. Build ORT with --cmake_extra_defines onnxruntime_DEBUG_NODE_INPUTS_OUTPUTS=1 and then set some environment variables during the run.
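A minimal sketch of option 1, assuming a toy model whose backbone and head submodules produce the tensors of interest (the submodule names are hypothetical, not Detectron2's actual structure):

```python
import torch

class DebugWrapper(torch.nn.Module):
    """Return an intermediate tensor alongside the normal output so it
    becomes a named output of the exported ONNX graph."""
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, x):
        feats = self.model.backbone(x)  # hypothetical submodule
        out = self.model.head(feats)    # hypothetical submodule
        return out, feats               # expose the intermediate tensor

# Export the wrapper instead of the bare model; the input shape is illustrative.
# torch.onnx.export(DebugWrapper(model), torch.randn(1, 3, 800, 800),
#                   "model_debug.onnx", opset_version=13)
```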

More to check: do the ONNX model's raw inputs and outputs match the original framework's? Does the CPU EP generate the same results as the CUDA EP?
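A sketch of the CPU-vs-CUDA check, assuming the tracing export's single image input of shape (3, H, W) and that both EPs return outputs of matching shapes (the input name, shape, and model path are assumptions; check sess.get_inputs() for the real ones):

```python
import numpy as np
import onnxruntime as ort

# A real preprocessed image is preferable to random data, which may
# yield zero detections and empty outputs.
x = np.random.rand(3, 800, 800).astype(np.float32)  # assumed input shape

results = {}
for tag, providers in [("cpu", ["CPUExecutionProvider"]),
                       ("cuda", ["CUDAExecutionProvider", "CPUExecutionProvider"])]:
    sess = ort.InferenceSession("outputOnnx/model.onnx", providers=providers)
    name = sess.get_inputs()[0].name
    results[tag] = sess.run(None, {name: x})

# Large per-output differences point at an EP-specific kernel problem.
for cpu_o, cuda_o in zip(results["cpu"], results["cuda"]):
    print(np.abs(np.asarray(cpu_o) - np.asarray(cuda_o)).max())
```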

brantPTS commented 2 years ago

@wangyems, thank you, that is very helpful guidance. I will certainly scrutinize the input/output data. If that does not work, I'll use the ORT debug_node mode - that sounds extremely useful. Closing this issue.

brantPTS commented 2 years ago

The load issue was resolved.

AidenFather commented 2 years ago

@brantPTS Could you please share details of this? Any suggestion/knowledge sharing would be great! Thank you in advance.

PureTechSystems commented 2 years ago

> @brantPTS Could you please share details of this? Any suggestion/knowledge sharing would be great! Thank you in advance.

@AidenFather, I do not have any updates - the next step would be to scrutinize the layer outputs to see where the PyTorch + Python inference diverges from the ONNX Runtime inference. It's a shame that the D2 ONNX model does not work on ORT.
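For anyone picking this up, a sketch of that comparison at the final outputs, before drilling into individual layers (traced_model, img, and the output ordering are assumptions):

```python
import numpy as np
import onnxruntime as ort
import torch

# 'traced_model' and 'img' (a 3xHxW float32 tensor) stand in for the traced
# Detectron2 model and the same preprocessed input used at export time.
with torch.no_grad():
    torch_outs = traced_model(img)

sess = ort.InferenceSession("outputOnnx/model.onnx",
                            providers=["CPUExecutionProvider"])
ort_outs = sess.run(None, {sess.get_inputs()[0].name: img.numpy()})

# If the final outputs already diverge, work backwards layer by layer using
# the intermediate-output trick suggested earlier in the thread.
for t, o in zip(torch_outs, ort_outs):
    print(np.abs(t.numpy() - np.asarray(o)).max())
```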