Open abdulazizab2 opened 2 years ago
1. Is there a workaround? Yes: you can convert the backbone network of the action detection model to ONNX and TensorRT.
2. Yes, it is possible. The two stages are relatively separate: all the second stage needs from the first stage can be just an npy file (the human detection proposals).
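To illustrate the hand-off described above, here is a minimal sketch of passing person proposals from the first stage to the second through an npy file. The box layout `[x1, y1, x2, y2, score]`, the helper names `save_proposals` and `load_proposals`, and the 0.5 score threshold are assumptions for illustration, not the format the repository itself requires.

```python
import os
import tempfile

import numpy as np


def save_proposals(path, boxes):
    """Stage 1 side: store person boxes as an (N, 5) float32 array.

    Assumed layout per row: [x1, y1, x2, y2, score].
    """
    arr = np.asarray(boxes, dtype=np.float32).reshape(-1, 5)
    np.save(path, arr)


def load_proposals(path, score_thr=0.5):
    """Stage 2 side: read the boxes back and drop low-confidence ones."""
    arr = np.load(path)
    return arr[arr[:, 4] >= score_thr]


# Hypothetical detections from any person detector (ONNX, TensorRT, PyTorch).
path = os.path.join(tempfile.mkdtemp(), "proposals.npy")
save_proposals(path, [[10, 20, 110, 220, 0.92], [5, 5, 50, 60, 0.30]])

kept = load_proposals(path)
print(kept.shape)
```

Because the file is the only coupling between the stages, the person detector can be exported and accelerated on its own while the action detection stage keeps running in PyTorch.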
Thanks! So to summarize, the workaround is to convert the backbone (say slowfast) + the human detector to ONNX and TRT. The spatiotemporal part is the bottleneck, which might need certain plugins and customization. Is that correct?
Could you please tell me how to convert slowfast to ONNX? I ran into this error: TypeError: _bbox_forward() missing 1 required positional argument: 'img_metas'
@Cuzny Do you want to convert the slowfast backbone to ONNX, or the entire action detection model?
I want to convert the detection model. I am using this command: python tools/deployment/pytorch2onnx.py ./configs/detection/ava/slowfast_kinetics_pretrained_r50_8x8x1_20e_ava_rgb.py ./slowfast_kinetics_pretrained_r50_8x8x1_20e_ava_rgb_20201217-ae225e97.pth --shape 1 3 32 256 256 --output-file slowfast.onnx
@Cuzny As far as I know, directly converting the detection model is not supported due to RoI issues, but it will be supported soon.
@hukkai Okay, I understand it is not possible. What should I do if I only want to convert the backbone? The backbone does not have the attribute forward_dummy.
Same thing for me. If you figured it out please illustrate it.
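One possible way around the missing forward_dummy, offered as an untested sketch rather than the project's own method: wrap the model so that forward() calls only the backbone, then hand the wrapper to torch.onnx.export. The sketch assumes `model` is an already-built detector that exposes its backbone as `model.backbone`; `BackboneOnly` and `export_backbone` are hypothetical names, and the default shape is the one from the command earlier in this thread.

```python
import torch
import torch.nn as nn


class BackboneOnly(nn.Module):
    """Expose only `model.backbone` through forward().

    Assumption: the detector stores its backbone as `model.backbone`.
    The RoI head is never called, so the exporter does not reach it.
    """

    def __init__(self, model):
        super().__init__()
        self.backbone = model.backbone

    def forward(self, clip):
        feats = self.backbone(clip)
        # A backbone may return several feature maps (e.g. two pathways);
        # return them as a tuple so each becomes a separate ONNX output.
        if isinstance(feats, (tuple, list)):
            return tuple(feats)
        return feats


def export_backbone(model, output_file, shape=(1, 3, 32, 256, 256)):
    """Trace the backbone with a random clip and write it to `output_file`."""
    wrapper = BackboneOnly(model).eval()
    dummy = torch.randn(*shape)
    with torch.no_grad():
        torch.onnx.export(wrapper, dummy, output_file, opset_version=11)
```

Calling `export_backbone(model, "slowfast_backbone.onnx")` would then export only the feature extractor; the RoI head and the rest of the second stage would still run in PyTorch on the exported features.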
What is the current situation of this problem?
Hello,
I understand that spatio-temporal action detection models cannot be converted to ONNX because of the RoIHead and RoIAlign layers. 1. Is there a workaround to this? 2. Since action detection is composed of two stages, person detection followed by action detection, can the person detector (Faster R-CNN or YOLO) be exported to ONNX while keeping the second stage as is?
I am trying to speed up inference by converting to ONNX -> TensorRT.