luminxu / ViPNAS

The official repo for the CVPR 2021 paper "ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search".
MIT License

ONNX transfer problem #12

Closed zeckireck closed 2 years ago

zeckireck commented 2 years ago

Hello, our export script does not support ConvTranspose layers with a group count greater than 1. How can we convert the model so that every group equals 1?

luminxu commented 2 years ago

If you do not want to use deconvolutional layers with a group number greater than 1, you can try the backbone searched by ViPNAS (e.g. ViPNAS_ResNet, following s_vipnas_res50_coco_256x192.py) and apply TopdownHeatmapSimpleHead as the keypoint_head, following SimpleBaseline. Make sure that the in_channels of the keypoint_head is consistent with the output channels of the backbone (e.g. 608 for S-ViPNAS-Res50).
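For reference, a minimal sketch of such a config in mmpose style; only the backbone type, head type, and in_channels=608 come from this thread, the remaining fields are illustrative assumptions:

```python
# Hedged config sketch: swap the searched head for TopdownHeatmapSimpleHead.
model = dict(
    type='TopDown',
    backbone=dict(type='ViPNAS_ResNet', depth=50),
    keypoint_head=dict(
        type='TopdownHeatmapSimpleHead',
        in_channels=608,   # must equal the backbone's output channels (S-ViPNAS-Res50)
        out_channels=17,   # e.g. 17 COCO keypoints
    ),
)
```

After this change the model must be retrained, since the pretrained weights were trained with the searched head.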

luminxu commented 2 years ago

Also, if you still want to use ViPNASHeatmapSimpleHead, you can set num_deconv_groups to (1, 1, 1). Whichever option you choose, please retrain the new model and verify whether the performance satisfies your needs.
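A hedged sketch of what that head config might look like; only num_deconv_groups=(1, 1, 1) and the argument names mentioned in this thread come from the source, the other values are illustrative:

```python
# Hedged sketch: keep the searched head type but use ungrouped deconvs.
keypoint_head = dict(
    type='ViPNASHeatmapSimpleHead',
    in_channels=608,                  # S-ViPNAS-Res50 backbone output channels
    out_channels=17,                  # e.g. 17 COCO keypoints
    num_deconv_layers=3,
    num_deconv_filters=(144, 144, 144),  # illustrative values, not verified
    num_deconv_kernels=(4, 4, 4),        # illustrative values, not verified
    num_deconv_groups=(1, 1, 1),      # plain (ungrouped) deconvs for ONNX export
)
```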

zeckireck commented 2 years ago

Do these factors (num_deconv_layers, num_deconv_filters, num_deconv_kernels) need to be modified?

zeckireck commented 2 years ago

Or should I only change num_deconv_groups to (1, 1, 1)?

luminxu commented 2 years ago

You may try changing num_deconv_groups only. Since the result is no longer the keypoint head searched by ViPNAS, you should verify the accuracy under this setting. If necessary, you can also tune the other arguments.

zeckireck commented 2 years ago

Why are there two inference passes here? Can't it infer just once? Would that have a big impact on accuracy and confidence?
https://github.com/luminxu/ViPNAS/blob/56a0630efee9d36595c0f5d3268553f273d35ad3/mmpose/models/detectors/top_down.py#L184

luminxu commented 2 years ago

Flip test is a common test-time technique for more robust prediction, which empirically brings around 1% AP improvement. If you care more about inference speed, you can turn it off in the configuration.
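For example, a sketch of the test-time config with flip test disabled; flip_test is the switch discussed above, the other keys are assumed mmpose-style fields:

```python
# Hedged sketch of the model's test-time settings.
test_cfg = dict(
    flip_test=False,        # run a single forward pass instead of two
    post_process='default',
    shift_heatmap=True,
)
```

With flip_test=True the detector runs the image and its horizontal flip through the network and averages the two heatmaps, which is why two inferences appear in top_down.py.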

zeckireck commented 2 years ago

As for ResNet-50, do you know where this operator is in the code? Our hardware only supports the case where the divisor of a div operator is a constant. [screenshots attached]

luminxu commented 2 years ago

I cannot identify the exact operators from the figures. I am also not sure when the problem happens, e.g., during instantiation (my guess), initialization, or inference. Based on your description, you may check this line.
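As a side note, a common workaround for hardware that only supports constant divisors is to evaluate the division in Python at module construction time, so tracing-based ONNX export folds it into a constant instead of emitting a Div node. A toy squeeze-style attention sketch, assuming names and structure for illustration only (this is not the repo's actual module):

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Toy channel attention with the reduction ratio fixed at init time."""

    def __init__(self, out_channels):
        super().__init__()
        # 16 / out_channels evaluated here is an ordinary Python float,
        # so it never appears as a runtime Div op in a traced graph.
        ratio = 16 / out_channels
        hidden = max(1, int(out_channels * ratio))
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(out_channels, hidden, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, out_channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # Reweight channels; the scale map broadcasts over H and W.
        return x * self.fc(x)

m = ChannelAttention(64)
y = m(torch.randn(1, 64, 8, 8))
```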

zeckireck commented 2 years ago

Is there any way to set out_channels to a fixed value?
https://github.com/luminxu/ViPNAS/blob/main/mmpose/models/backbones/vipnas_resnet.py#L115

zeckireck commented 2 years ago

What is this attention used for? Is it necessary?
https://github.com/luminxu/ViPNAS/blob/main/mmpose/models/backbones/vipnas_resnet.py#L115

luminxu commented 2 years ago

Whether to use the attention module is part of the search space and is searched for better pose estimation performance. Please refer to our paper for more details.

zeckireck commented 2 years ago

https://github.com/luminxu/ViPNAS/blob/main/mmpose/models/backbones/vipnas_resnet.py#L115. Why do we need 16 / out_channels here? Can't we just use 1/16?