voldemortX / pytorch-auto-drive

PytorchAutoDrive: Segmentation models (ERFNet, ENet, DeepLab, FCN...) and Lane detection models (SCNN, RESA, LSTR, LaneATT, BézierLaneNet...) based on PyTorch with fast training, visualization, benchmarking & deployment help
BSD 3-Clause "New" or "Revised" License
837 stars 137 forks

Error when converting the LaneATT pt model to onnx model #100

Open sjtuljw520 opened 2 years ago

sjtuljw520 commented 2 years ago

When I run this command: python tools/to_onnx.py --config=configs/lane_detection/laneatt/resnet34_culane.py --height=360 --width=640 --checkpoint=model/resnet34_laneatt_culane_20220225.pt

Got the error message like this:

    Traceback (most recent call last):
      File "/home/liujianwei/project/code/pytorch-auto-drive-new/tools/to_onnx.py", line 70, in <module>
        pt_to_onnx(net, dummy, onnx_filename, opset_version=op_v)
      File "/home/liujianwei/project/code/pytorch-auto-drive-new/utils/onnx_utils.py", line 55, in pt_to_onnx
        torch.onnx.export(net, dummy, filename, verbose=True, input_names=['input1'], output_names=temp.keys(),
      File "/home/liujianwei/.conda/envs/py39/lib/python3.9/site-packages/torch/onnx/__init__.py", line 316, in export
        return utils.export(model, args, f, export_params, verbose, training,
      File "/home/liujianwei/.conda/envs/py39/lib/python3.9/site-packages/torch/onnx/utils.py", line 107, in export
        _export(model, args, f, export_params, verbose, training, input_names, output_names,
      File "/home/liujianwei/.conda/envs/py39/lib/python3.9/site-packages/torch/onnx/utils.py", line 724, in _export
        _model_to_graph(model, args, verbose, input_names,
      File "/home/liujianwei/.conda/envs/py39/lib/python3.9/site-packages/torch/onnx/utils.py", line 497, in _model_to_graph
        graph = _optimize_graph(graph, operator_export_type,
      File "/home/liujianwei/.conda/envs/py39/lib/python3.9/site-packages/torch/onnx/utils.py", line 216, in _optimize_graph
        graph = torch._C._jit_pass_onnx(graph, operator_export_type)
      File "/home/liujianwei/.conda/envs/py39/lib/python3.9/site-packages/torch/onnx/__init__.py", line 373, in _run_symbolic_function
        return utils._run_symbolic_function(*args, **kwargs)
      File "/home/liujianwei/.conda/envs/py39/lib/python3.9/site-packages/torch/onnx/utils.py", line 1032, in _run_symbolic_function
        return symbolic_fn(g, inputs, **attrs)
      File "/home/liujianwei/.conda/envs/py39/lib/python3.9/site-packages/torch/onnx/symbolic_opset9.py", line 483, in expand_as
        return g.op("Expand", self, shape)
      File "/home/liujianwei/.conda/envs/py39/lib/python3.9/site-packages/torch/onnx/utils.py", line 928, in _graph_op
        torch._C._jit_pass_onnx_node_shape_type_inference(n, _params_dict, opset_version)
    RuntimeError: input_shape_value == reshape_value || input_shape_value == 1 || reshape_value == 1 INTERNAL ASSERT FAILED at "../torch/csrc/jit/passes/onnx/shape_type_inference.cpp":547, please report a bug to PyTorch. ONNX Expand input shape constraint not satisfied.
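For context, the failing internal assert mirrors the ONNX Expand/broadcast rule: each input dimension must either equal the target dimension or be 1. A minimal NumPy illustration of that constraint (not the model code):

```python
import numpy as np

# ONNX Expand follows NumPy-style broadcasting: a dimension may be
# expanded only if it already matches the target size or equals 1.
ok = np.broadcast_to(np.zeros((3, 1)), (3, 5))  # 1 -> 5 is legal
assert ok.shape == (3, 5)

rejected = False
try:
    np.broadcast_to(np.zeros((3, 4)), (3, 5))   # 4 -> 5 is illegal
except ValueError:
    rejected = True  # the same constraint the exporter asserts
assert rejected
```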

voldemortX commented 2 years ago

@sjtuljw520 I will check whether this is supported, but probably tomorrow; I don't have the test environment right now.

voldemortX commented 2 years ago

@sjtuljw520 The development of LaneATT seems to have deviated from the main branch for a long time (quoting @cedricgsh), so conversions were never tested for it. I will mark this as a feature request for now.

Theoretically speaking, LaneATT should support conversions since there are no special ops.

sjtuljw520 commented 2 years ago

Many thanks for replying. Looking forward to this feature being added. @voldemortX

voldemortX commented 2 years ago

@sjtuljw520 If you download anchors from the LaneATT repo and use opset 11 with torch 1.8 (and the corresponding mmcv), the model can be converted to ONNX, but not to TensorRT yet; I'm working on that. However, the nms op is not included in the conversion, so if you use this model on an embedded device, you will have to implement nms yourself.

sjtuljw520 commented 2 years ago

So if the NMS op is included in the model, the conversion cannot work and will encounter an error? @voldemortX

voldemortX commented 2 years ago

> So if the NMS op is included in the model, the conversion cannot work and will encounter an error? @voldemortX

Yes. It is a customized CUDA kernel, which is not supported by the PyTorch ONNX converter. You would need a custom ONNX implementation for it. That kind of support is sophisticated and hasn't been introduced into this framework (I currently don't know how to do it), and I am not aware of any existing reference for line nms ONNX conversion.
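For readers exploring the custom-op route: PyTorch lets you map an op to a node in a custom ONNX domain via `torch.onnx.register_custom_op_symbolic`, leaving the deployment runtime to supply the actual kernel. A rough sketch, where the op and domain names are purely illustrative (not part of this repo):

```python
import torch
import torch.onnx

# Hypothetical symbolic function for a custom line-NMS op. The exporter
# would emit a "custom_domain::LineNMS" node; the target runtime (e.g. a
# TensorRT plugin or an onnxruntime custom op) must then implement it.
def line_nms_symbolic(g, scores, proposals, threshold):
    return g.op("custom_domain::LineNMS", scores, proposals, threshold_f=threshold)

# Map a (hypothetical) registered op "custom_ops::line_nms" to the
# symbolic above for opset 11 exports.
torch.onnx.register_custom_op_symbolic(
    "custom_ops::line_nms", line_nms_symbolic, opset_version=11
)
```

This only solves the export side; the matching runtime implementation is the hard part the thread refers to.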

voldemortX commented 2 years ago

But it looks like an op that is well suited to a customized post-processing implementation, such as a custom SDK function or something similar.
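To make the post-processing idea concrete, here is a rough CPU sketch of greedy line NMS over lane proposals; the distance metric and threshold are illustrative assumptions, not the repo's CUDA kernel:

```python
import numpy as np

def line_nms(scores, lanes, thresh=15.0):
    """Greedy NMS over lane proposals.

    scores: (N,) confidence per proposal
    lanes:  (N, P) x-coordinates sampled at P fixed rows
    thresh: mean x-distance (pixels) below which two lanes count as duplicates
    """
    order = np.argsort(-scores)  # process highest-score proposals first
    keep = []
    for i in order:
        # keep lane i only if it is far enough from every lane kept so far
        if all(np.abs(lanes[i] - lanes[j]).mean() >= thresh for j in keep):
            keep.append(i)
    return keep

scores = np.array([0.9, 0.8, 0.3])
lanes = np.array([[100.0, 110.0, 120.0],
                  [102.0, 112.0, 122.0],   # near-duplicate of lane 0
                  [300.0, 310.0, 320.0]])
print(line_nms(scores, lanes))  # -> [0, 2]: duplicate suppressed, distant lane kept
```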

sjtuljw520 commented 2 years ago

Thank you! I got it. @voldemortX

voldemortX commented 2 years ago

@sjtuljw520 With 4419bba & #102, you should be able to convert LaneATT to ONNX and TensorRT (check the new doc; you will need pytorch 1.8.0 & nvidia-tensorrt 8.4.1.5).

Except for the nms post-processing as discussed earlier.

sjtuljw520 commented 2 years ago

Nice work! 👍 @voldemortX

YoohJH commented 1 year ago

Really appreciate the amazing work, thanks!

File "xxxxxxx/pytorch-auto-drive/utils/models/lane_detection/laneatt.py", line 169, in cut_anchor_features
    rois = img_features[self.cut_zs, self.cut_ys, self.cut_xs].view(n_proposals, n_fmaps, self.featmap_h, 1)
IndexError: index 9 is out of bounds for dimension 1 with size 9

After downloading culane_anchors_freq.pt from https://github.com/lucastabelini/LaneATT/raw/main/culane_anchors_freq.pt, I got the error above.

Using pytorch==1.13, onnx==1.12.0, onnxruntime==1.13.1.

voldemortX commented 1 year ago

@YoohJH Are you also trying to convert LaneATT to onnx? Maybe check the input image height & width first: do they match the CULane 288x800 setting?

YoohJH commented 1 year ago

Thank you for the answer! The error happened when I tried to convert resnet18_laneatt_culane_20220320.pt to .onnx.

The command I used:

python tools/to_onnx.py --config=configs/lane_detection/laneatt/resnet18_culane.py --height=288 --width=800 --checkpoint=resnet18_laneatt_culane_20220320.pt
voldemortX commented 1 year ago

> Thank you for the answer! The error happened when I tried to convert resnet18_laneatt_culane_20220320.pt to .onnx.
>
> The command I used:
>
> python tools/to_onnx.py --config=configs/lane_detection/laneatt/resnet18_culane.py --height=288 --width=800 --checkpoint=resnet18_laneatt_culane_20220320.pt

Thanks for the info, I will try to reproduce this error later today.

voldemortX commented 1 year ago

@YoohJH Can't get to my machine right now, could you try --height=360 --width=640 ?

YoohJH commented 1 year ago

> @YoohJH Can't get to my machine right now, could you try --height=360 --width=640 ?

I've confirmed that the .onnx can be exported without any error using:

python tools/to_onnx.py --config=configs/lane_detection/laneatt/resnet18_culane.py --height=360 --width=640 --checkpoint=resnet18_laneatt_culane_20220320.pt

Maybe the 'resnet18_laneatt_culane_20220320.pt' was named wrong? But there's no laneatt-tusimple in the model_zoo.

voldemortX commented 1 year ago

> > @YoohJH Can't get to my machine right now, could you try --height=360 --width=640 ?
>
> It has been determined that .onnx can be exported without any error using:
>
> python tools/to_onnx.py --config=configs/lane_detection/laneatt/resnet18_culane.py --height=360 --width=640 --checkpoint=resnet18_laneatt_culane_20220320.pt
>
> Maybe the 'resnet18_laneatt_culane_20220320.pt' was named wrong? But there's no laneatt-tusimple in the model_zoo.

It seems the default setting for laneatt is 360p in all datasets...
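The arithmetic behind the IndexError fits this explanation. Assuming the typical ResNet backbone output stride of 32 (an assumption, not checked against the repo), the feature-map height depends directly on the input height, and anchors precomputed for 360p index rows that don't exist at 288p:

```python
# Anchors downloaded from the LaneATT repo were precomputed for its
# 360x640 input. With an assumed backbone stride of 32:
stride = 32
featmap_h_360 = 360 // stride  # 11 rows -> precomputed row indices 0..10
featmap_h_288 = 288 // stride  # 9 rows  -> only indices 0..8 exist

# Indexing a 9-row feature map with anchors built for 11 rows produces
# exactly the reported "index 9 is out of bounds for ... size 9" error.
assert featmap_h_360 == 11 and featmap_h_288 == 9
```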

YoohJH commented 1 year ago

My mistake! Thanks for the patience!