open-mmlab / mmaction2

OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark
https://mmaction2.readthedocs.io
Apache License 2.0

Spatio-Temporal action detection models to onnx #1683

Open abdulazizab2 opened 2 years ago

abdulazizab2 commented 2 years ago

Hello,

I understand that spatio-temporal action detection models cannot be converted to ONNX because of the RoIHead and RoIAlign layers.

1. Is there a workaround for this?
2. Action detection is composed of two stages: person detection, then action detection. Can the person detector (Faster R-CNN or YOLO) be exported to ONNX while the second stage is kept as is?

I am trying to speed up inference by converting to ONNX and then to TensorRT.

hukkai commented 2 years ago

1. Is there a workaround: you can convert the backbone network of the action detection model to ONNX and TRT.
2. Yes, it is possible. In fact, the two stages are relatively separate. All the second stage needs from the first stage can be an npy file (the human detection proposals).
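To illustrate how the two stages can be decoupled through a file, here is a minimal sketch that writes person detections to an npy file and reads them back. The `(N, 5)` layout (`x1, y1, x2, y2, score`), the score threshold, and the helper names `save_proposals` and `load_proposals` are assumptions made for this example; they are not the format that mmaction2 itself expects.

```python
import os
import tempfile

import numpy as np


def save_proposals(path, boxes_xyxy, scores, score_thr=0.5):
    """Keep detections scoring at least `score_thr`; store an (N, 5) array.

    Hypothetical layout: columns are x1, y1, x2, y2, score.
    """
    boxes_xyxy = np.asarray(boxes_xyxy, dtype=np.float32).reshape(-1, 4)
    scores = np.asarray(scores, dtype=np.float32).reshape(-1)
    keep = scores >= score_thr
    proposals = np.concatenate([boxes_xyxy[keep], scores[keep, None]], axis=1)
    np.save(path, proposals)
    return proposals


def load_proposals(path):
    """Load the proposals written by `save_proposals`."""
    return np.load(path)


# Stage 1 (any person detector, possibly running under ONNX/TensorRT)
# writes its boxes; stage 2 only has to read the file.
with tempfile.TemporaryDirectory() as tmp_dir:
    npy_path = os.path.join(tmp_dir, "frame_0001.npy")
    saved = save_proposals(
        npy_path,
        boxes_xyxy=[[10, 20, 110, 220], [0, 0, 5, 5]],
        scores=[0.9, 0.2],
    )
    loaded = load_proposals(npy_path)
```

Because the only contract between the stages is the file, the detector can be swapped or accelerated independently of the action detection model.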

abdulazizab2 commented 2 years ago

Thanks! So, to summarize, the workaround is to convert the backbone (say SlowFast) and the human detector to ONNX and TRT. The spatio-temporal part is the bottleneck, which might need certain plugins and customization. Is that correct?

Cuzny commented 2 years ago

Could you please tell me how to convert SlowFast to ONNX? I ran into this error: TypeError: _bbox_forward() missing 1 required positional argument: 'img_metas'

hukkai commented 2 years ago

@Cuzny Do you want to convert the SlowFast backbone to ONNX, or the entire action detection model?

Cuzny commented 2 years ago

I want to convert the detection model, using this command:

python tools/deployment/pytorch2onnx.py ./configs/detection/ava/slowfast_kinetics_pretrained_r50_8x8x1_20e_ava_rgb.py ./slowfast_kinetics_pretrained_r50_8x8x1_20e_ava_rgb_20201217-ae225e97.pth --shape 1 3 32 256 256 --output-file slowfast.onnx

hukkai commented 2 years ago

@Cuzny As far as I know, directly converting the detection model is not supported due to RoI issues, but it will be supported soon.

Cuzny commented 2 years ago

@hukkai Okay, I understand it is not possible. What should I do if I only want to convert the backbone? The backbone does not have the attribute forward_dummy.

abdulazizab2 commented 2 years ago

> @hukkai Okay, I understand it is not possible. What should I do if I only want to convert the backbone? The backbone does not have the attribute forward_dummy.

Same for me. If you figure it out, please share how you did it.
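One possible way around the missing forward_dummy is to skip the deployment script and call `torch.onnx.export` directly on the backbone module, since a backbone's forward takes a plain tensor. The sketch below is not a confirmed mmaction2 recipe: `TinyTwoPathwayBackbone` is a hypothetical stand-in for a SlowFast-style backbone, and `export_backbone` and the input/output names are made up for illustration. In practice you would pass the backbone submodule of your built model instead of the stand-in.

```python
import torch
import torch.nn as nn


class TinyTwoPathwayBackbone(nn.Module):
    """Hypothetical stand-in for a SlowFast-style backbone.

    Takes a clip of shape (N, C, T, H, W) and returns two feature maps,
    one per pathway. Replace it with the real backbone module.
    """

    def __init__(self, alpha: int = 4):
        super().__init__()
        self.alpha = alpha  # temporal stride between fast and slow inputs
        self.slow = nn.Conv3d(3, 8, kernel_size=3, padding=1)
        self.fast = nn.Conv3d(3, 2, kernel_size=3, padding=1)

    def forward(self, clip):
        slow_in = clip[:, :, ::self.alpha]  # subsample frames for slow path
        return self.slow(slow_in), self.fast(clip)


def export_backbone(backbone, input_shape, target):
    """Trace `backbone` with a random clip and write ONNX to `target`.

    `target` may be a file path or a binary file-like object.
    """
    backbone.eval()  # freeze dropout / batch-norm behaviour before tracing
    dummy_clip = torch.randn(*input_shape)
    torch.onnx.export(
        backbone,
        (dummy_clip,),
        target,
        input_names=["clip"],
        output_names=["slow_feat", "fast_feat"],
    )


# Example call (assumed file name), using the clip shape from this thread:
# export_backbone(TinyTwoPathwayBackbone(), (1, 3, 32, 256, 256), "backbone.onnx")
```

The RoI head stays in PyTorch in this setup; only the feature extractor is exported, which matches the workaround suggested earlier in the thread.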

1005452649 commented 1 year ago

> @Cuzny As far as I know, directly converting the detection model is not supported due to RoI issues, but it will be supported soon.

What is the current status of this issue?