microsoft / DirectML

DirectML is a high-performance, hardware-accelerated DirectX 12 library for machine learning. DirectML provides GPU acceleration for common machine learning tasks across a broad range of supported hardware and drivers, including all DirectX 12-capable GPUs from vendors such as AMD, Intel, NVIDIA, and Qualcomm.
MIT License
2.17k stars 288 forks source link

Outputs of masks are diffrent between onnxruntime and onnxruntime-directml, from onnxruntime-directml==1.15, when using detectron2 Mask r-cnn Model. #614

Open YOODS-Xu opened 1 month ago

YOODS-Xu commented 1 month ago

I found from onnxruntime-directml==1.15, mask outputs are diffrent and incorrect between directml and onnxruntime. I used detectron2 Mask r-cnn. In python onnxruntime-directml==1.14.1 is ok, and in c++ only onnxruntime-directml==1.12.0 is ok for ops version==17.

How can I solve this problem? Did any other meet this issue? Any suggestions are much appreciated.

YOODS-Xu commented 1 month ago

Bags model is trained using bags images, and cartons model is trained using cartons images. The images such bags are diffrent and incorrect. 0013-19-56_67

The images such cartons are same and correct. cv_img230220-101206-195299_90

YOODS-Xu commented 1 month ago
  1. The boxes, labels and scores are same and correct, only masks are diffrent and incorrect.

  2. My source code is as below session = onnxruntime.InferenceSession( "model/detectron2/bag/model.onnx",

    providers=['CPUExecutionProvider'])

    providers=['DmlExecutionProvider'])

    ...

  3. When translated pytorch model to onnx model, there are some TraceWarning as such below. However masks outputs are diffrent even using the same image used for converting model from pytorch to onnx. /home2/ros/venv/detectron2/lib/python3.8/site-packages/detectron2/modeling/proposal_generator/proposal_utils.py:106: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! if not valid_mask.all():

YOODS-Xu commented 1 month ago

below is full TraceWarning:

/home2/ros/venv/detectron2/lib/python3.8/site-packages/detectron2/structures/image_list.py:85: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! assert t.shape[:-2] == tensors[0].shape[:-2], t.shape /home2/ros/venv/detectron2/lib/python3.8/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3526.) return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined] /home2/ros/venv/detectron2/lib/python3.8/site-packages/detectron2/structures/boxes.py:151: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! if tensor.numel() == 0: /home2/ros/venv/detectron2/lib/python3.8/site-packages/detectron2/structures/boxes.py:155: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! assert tensor.dim() == 2 and tensor.size(-1) == 4, tensor.size() /home2/ros/venv/detectron2/lib/python3.8/site-packages/detectron2/structures/boxes.py:151: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! if tensor.numel() == 0: /home2/ros/venv/detectron2/lib/python3.8/site-packages/detectron2/structures/boxes.py:155: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! assert tensor.dim() == 2 and tensor.size(-1) == 4, tensor.size() /home2/ros/venv/detectron2/lib/python3.8/site-packages/detectron2/modeling/proposal_generator/proposal_utils.py:106: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! if not valid_mask.all(): /home2/ros/venv/detectron2/lib/python3.8/site-packages/detectron2/structures/boxes.py:191: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! assert torch.isfinite(self.tensor).all(), "Box tensor contains infinite or NaN!" /home2/ros/venv/detectron2/lib/python3.8/site-packages/detectron2/structures/boxes.py:192: TracerWarning: Iterating over a tensor might cause the trace to be incorrect. Passing a tensor of different shape won't change the number of iterations executed (and might lead to errors or silently give incorrect results). h, w = box_size /home2/ros/venv/detectron2/lib/python3.8/site-packages/detectron2/layers/nms.py:15: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! assert boxes.shape[-1] == 4 /home2/ros/venv/detectron2/lib/python3.8/site-packages/torch/init.py:1404: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! assert condition, message /home2/ros/venv/detectron2/lib/python3.8/site-packages/detectron2/layers/roi_align.py:55: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! assert rois.dim() == 2 and rois.size(1) == 5 /home2/ros/venv/detectron2/lib/python3.8/site-packages/detectron2/modeling/roi_heads/fast_rcnn.py:138: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! if not valid_mask.all(): /home2/ros/venv/detectron2/lib/python3.8/site-packages/detectron2/structures/boxes.py:151: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! if tensor.numel() == 0: /home2/ros/venv/detectron2/lib/python3.8/site-packages/detectron2/structures/boxes.py:155: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! assert tensor.dim() == 2 and tensor.size(-1) == 4, tensor.size() /home2/ros/venv/detectron2/lib/python3.8/site-packages/detectron2/structures/boxes.py:191: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! assert torch.isfinite(self.tensor).all(), "Box tensor contains infinite or NaN!" /home2/ros/venv/detectron2/lib/python3.8/site-packages/detectron2/structures/boxes.py:192: TracerWarning: Iterating over a tensor might cause the trace to be incorrect. Passing a tensor of different shape won't change the number of iterations executed (and might lead to errors or silently give incorrect results). h, w = box_size /home2/ros/venv/detectron2/lib/python3.8/site-packages/detectron2/modeling/roi_heads/fast_rcnn.py:155: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! if num_bbox_reg_classes == 1: /home2/ros/venv/detectron2/lib/python3.8/site-packages/detectron2/layers/nms.py:15: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! assert boxes.shape[-1] == 4 /home2/ros/venv/detectron2/lib/python3.8/site-packages/torch/init.py:1404: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! assert condition, message /home2/ros/venv/detectron2/lib/python3.8/site-packages/detectron2/layers/roi_align.py:55: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! assert rois.dim() == 2 and rois.size(1) == 5 /home2/ros/venv/detectron2/lib/python3.8/site-packages/detectron2/modeling/roi_heads/mask_head.py:139: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! if cls_agnostic_mask: /home2/ros/venv/detectron2/lib/python3.8/site-packages/torch/onnx/symbolic_opset9.py:5856: UserWarning: Exporting aten::index operator of advanced indexing in opset 17 is achieved by combination of multiple ONNX operators, including Reshape, Transpose, Concat, and Gather. If indices include negative values, the exported graph will produce incorrect results. warnings.warn(