grimoire / mmdetection-to-tensorrt

convert mmdetection model to tensorrt, support fp16, int8, batch input, dynamic shape etc.
Apache License 2.0

python inference.py but output [ all zeros result] #4

Closed cefengxu closed 4 years ago

cefengxu commented 4 years ago

Hi,

After building the env in Docker, I ran inference.py with:

demo# python inference.py ../image.jpg ../../mmdet/configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py ../faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth 1.trt
INFO:root:load model from config:../../mmdet/configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py
INFO:root:model warmup
INFO:root:convert model
Warning: Encountered known unsupported method torch.Tensor.new_zeros
Warning: Encountered known unsupported method torch.Tensor.new_tensor
Warning: Encountered known unsupported method torch.Tensor.new_tensor
Warning: Encountered known unsupported method torch.Tensor.new_zeros
INFO:root:convert take time 44.59940958023071 s
TRTModule()
TRTModule()

However, the classification and box outputs are incorrect, as follows:

trt_bbox:
tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        ...,
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]], device='cuda:0')
trt_classfication_result:
tensor([-1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1.,
        -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1.,
        -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1.,
        -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1.,
        -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1.,
        -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1.,
        -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1.,
        -1., -1.], device='cuda:0')

The inference.py is below (I did not change anything):

from mmdet2trt import mmdet2trt
import torch
from argparse import ArgumentParser

from mmdet2trt.apis import inference_detector, init_detector
import cv2,logging

def main():
    parser = ArgumentParser()
    parser.add_argument('img', help='Image file')
    parser.add_argument('config', help='mmdet Config file')
    parser.add_argument('checkpoint', help='mmdet Checkpoint file')
    parser.add_argument('save_path', help='tensorrt model save path')
    parser.add_argument(
        '--device', default='cuda:0', help='Device used for inference')
    parser.add_argument(
        '--score-thr', type=float, default=0.3, help='bbox score threshold')
    parser.add_argument("--fp16", type=bool, default=False, help="enable fp16 inference")
    args = parser.parse_args()

    cfg_path = args.config

    trt_model = mmdet2trt(cfg_path, args.checkpoint,log_level = logging.INFO, fp16_mode=args.fp16, device=args.device)
    print(trt_model)
    torch.save(trt_model.state_dict(), args.save_path)

    trt_model = init_detector(args.save_path)
    print(trt_model)
    image_path = args.img

    result = inference_detector(trt_model, image_path, cfg_path, args.device)
#    print(result)
    num_detections = result[0].item()

    trt_bbox = result[1][0]
    trt_score = result[2][0]
    print('trt_bbox:')
    print(trt_bbox)
    trt_cls = result[3][0]
    print('trt_classfication_result:')
    print(trt_cls)
    image = cv2.imread(image_path)
    input_image_shape = image.shape
    for i in range(num_detections):
        scores = trt_score[i].item()
        classes = int(trt_cls[i].item())
        if scores < args.score_thr:
            continue
        bbox = tuple(trt_bbox[i])
        bbox = tuple(int(v) for v in bbox)

        color = ((classes>>2 &1) *128 + (classes>>5 &1) *128,
                (classes>>1 &1) *128 + (classes>>4 &1) *128,
                (classes>>0 &1) *128 + (classes>>3 &1) *128)
        cv2.rectangle(image, bbox[:2], bbox[2:], color, thickness=5)

    if input_image_shape[0]>1280 or input_image_shape[1]>720:
        scales = min(720/image.shape[0], 1280/image.shape[1])
        image = cv2.resize(image, (0,0), fx=scales, fy=scales)
    cv2.imwrite('image.jpg', image)

if __name__ == '__main__':
    main()
grimoire commented 4 years ago

Hi, I have tested in your environment from the last issue, and the error does exist. Thanks for the bug report; I will fix it this weekend. Please allow me some time.

By the way, can you share the Dockerfile that you used?

cefengxu commented 4 years ago

> Hi, I have tested in your environment from the last issue, and the error does exist. Thanks for the bug report; I will fix it this weekend. Please allow me some time.
>
> By the way, can you share the Dockerfile that you used?

Sorry, for the last issue I ran on my local computer with conda instead of Docker.

grimoire commented 4 years ago

Hi, I guess I have fixed the bug. The error is caused by a non-contiguous input tensor. Skipping the copy is a nice way to accelerate PyTorch, but it broke my repo. I have updated torch2trt/amirstan_plugin and this repo. Please pull the latest code and try again to see if it works.

cefengxu commented 4 years ago

cool~, I will try ASAP

cefengxu commented 4 years ago

> Hi, I guess I have fixed the bug. The error is caused by a non-contiguous input tensor. Skipping the copy is a nice way to accelerate PyTorch, but it broke my repo. I have updated torch2trt/amirstan_plugin and this repo. Please pull the latest code and try again to see if it works.

Hi, I pulled the latest repos (amirstan_plugin, torch2trt_dynamic, mmdetection-to-tensorrt), rebuilt, and reinstalled. But the output of inference.py is the same as before: all results are zero.

BTW, I am using torch 1.5.0 and torchvision 0.6.0, and there is a bug in delta_xywh_bbox_coder.py, shown below:

File "/home/cefengxu/pyProjects/grimoire/mmdetection-to-tensorrt/mmdet2trt/core/bbox/coder/delta_xywh_bbox_coder.py", line 34, in delta2bbox_custom_func
    scores = scores.view(1,-1, num_classes)
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

So I changed scores.view(1, -1, num_classes) to scores.contiguous().view(1, -1, num_classes) to fix this.
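
For reference, here is a minimal standalone reproduction of that error and both fixes (the num_classes value is just illustrative):

import torch

num_classes = 81  # illustrative value; the real one comes from the model config
scores = torch.randn(num_classes, 1000).t()  # transpose: a view with swapped strides
print(scores.is_contiguous())  # False: memory no longer matches row-major order

try:
    scores.view(1, -1, num_classes)  # raises the RuntimeError quoted above
except RuntimeError as err:
    print(err)

# Both fixes work: copy into contiguous memory first, or use reshape(),
# which only copies when it has to.
fixed = scores.contiguous().view(1, -1, num_classes)
also_fixed = scores.reshape(1, -1, num_classes)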

grimoire commented 4 years ago

Hi, I have tested on PyTorch 1.5 and 1.6 (on Ubuntu 18.04). Everything is fine apart from the non-contiguous bug you mentioned.

Since you changed the PyTorch version, could you provide the environment details again?

Here is a model I created on a 2080 Ti; please try running inference with it. https://drive.google.com/file/d/16bKyAj4bWgcem6iatMLf6etUJL83Tm_i/view?usp=sharing

And if you are using a 2080 Ti, would you mind sharing the created engine (1.trt in your case) with me?

cefengxu commented 4 years ago

Hi, the compute capability of the model you shared (7.5) is higher than mine (6.5), so I cannot load it for testing. The model I built in my environment is below, but I do not have a 2080 Ti:

https://drive.google.com/file/d/1hLjhzB-J0aLpbPPgWdQTsx1vH7D374Xl/view?usp=sharing

Alternatively, maybe I can debug via an IDE. Or tell me which info you want to look at, and I can run inference.py in my environment and print it for you.

grimoire commented 4 years ago

Ok, I will check the model you provided later.

Could you check the value and shape of the input tensor in inference.py? Is it contiguous? You can also pass return_warp_model to mmdet2trt like below:

trt_model, torch_model = mmdet2trt(cfg_path, model_path, opt_shape_param=opt_shape_param, fp16_mode=False, max_workspace_size=1<<30, log_level=logging.INFO, return_warp_model=True)

This will give you both the TensorRT model and a PyTorch model (a warp of the mmdetection detector). Try returning some intermediate results in forward() of mmdet2trt/models/detectors/two_stage.py, mmdet2trt/models/dense_heads/rpn_head.py, and mmdet2trt/models/roi_heads/standard_roi_head.py, and see if there is a large gap between the results of the TensorRT model and the PyTorch model.

for example:

    def forward(self, x):
        model = self.model
        rpn_head = self.rpn_head_warper

        # backbone
        feat = model.extract_feat(x)
        return feat
        rois = rpn_head(feat, x)

        result = self.roi_head_warper(feat, rois)
        return result

This forces the model to return the backbone features. Check the return value of both trt_model and torch_model.
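
For the comparison step, a minimal sketch might look like this (assuming trt_model and torch_model are the pair returned by mmdet2trt with return_warp_model=True, and that forward() has been truncated to return the backbone features as above):

import torch

x = torch.randn(1, 3, 320, 320).cuda()  # any input of a supported shape

with torch.no_grad():
    feats_trt = trt_model(x)
    feats_torch = torch_model(x)

# With the truncated forward(), both return the FPN feature maps;
# print the worst-case absolute difference per level.
for level, (a, b) in enumerate(zip(feats_trt, feats_torch)):
    print(level, (a - b).abs().max().item())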

cefengxu commented 4 years ago

@grimoire I followed your steps and printed feat and the ROIs from trt_model and torch_model respectively.

The feat outputs are identical, but the ROIs differ.

My test code is as follows:

opt_shape_param=[
    [
        [1,3,320,320], # min shape
        [1,3,1280,1280], # optimize shape
        [1,3,1344,1344], # max shape
    ]
]
max_workspace_size=1<<30 # some module need large workspace, add workspace size when OOM.
trt_model, torch_model = mmdet2trt(cfg_path, weight_path, opt_shape_param=opt_shape_param, fp16_mode=False, max_workspace_size=1<<30, log_level=logging.DEBUG, return_warp_model=True)
x = torch.ones([1,3,320,320])
x = x.cuda()

y1 = trt_model(x)
y2 = torch_model(x)

Output from trt_model:

tensor([[  0.0000,   0.0000, 320.0000,  48.9743],
        [  0.0000,   0.0000, 320.0000,  73.2971],
        [  0.0000,   0.0000, 199.6483,  35.7175],
        ...,
        [  0.0000, 168.7068,  30.9094, 192.4857],
        [  0.0000, 136.8175,  30.7680, 160.5707],
        [307.3548, 235.0510, 320.0000, 277.6660]], device='cuda:0')

Output from torch_model:

tensor([[  0.0000,   0.0000, 320.0000,  48.9742],
        [  0.0000,   0.0000, 320.0000,  73.2970],
        [  0.0000,   0.0000, 199.6483,  35.7174],
        ...,
        [230.9487,   0.0000, 249.8271,  14.0494],
        [  0.0000, 168.7068,  30.9094, 192.4857],
        [  0.0000, 136.8175,  30.7680, 160.5708]], device='cuda:0',
       grad_fn=<IndexBackward>)

The log is shown below:

INFO:root:load model from config:/home/cefengxu/cETOOL/mmdetection/configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py
DEBUG:root:find module type:<class 'mmdet.models.detectors.faster_rcnn.FasterRCNN'>
DEBUG:root:find module type:<class 'mmdet.models.dense_heads.rpn_head.RPNHead'>
DEBUG:root:find module type:<class 'mmdet.core.anchor.anchor_generator.AnchorGenerator'>
DEBUG:root:find module type:<class 'mmdet.core.bbox.coder.delta_xywh_bbox_coder.DeltaXYWHBBoxCoder'>
DEBUG:root:find module type:<class 'mmdet.models.roi_heads.standard_roi_head.StandardRoIHead'>
DEBUG:root:find module type:<class 'mmdet.models.roi_heads.roi_extractors.single_level_roi_extractor.SingleRoIExtractor'>
INFO:root:model warmup
. forward @ two_stage
. forward @ rpn_head
. forward @ standard roi head
INFO:root:convert model
. forward @ two_stage
. forward @ rpn_head
. forward @ standard roi head
Warning: Encountered known unsupported method torch.Tensor.new_zeros
DEBUG:root:negative index of view/reshape might cause overflow!
DEBUG:root:negative index of view/reshape might cause overflow!
Warning: Encountered known unsupported method torch.Tensor.new_tensor
Warning: Encountered known unsupported method torch.Tensor.new_tensor
DEBUG:root:negative index of view/reshape might cause overflow!
Warning: Encountered known unsupported method torch.Tensor.new_zeros
-----y1-----
INFO:root:convert take time 34.885846853256226 s
(tensor([0], device='cuda:0', dtype=torch.int32), tensor([[[0., 0., 0., 0.],
         [0., 0., 0., 0.],
         [0., 0., 0., 0.],
         [0., 0., 0., 0.],
cefengxu commented 4 years ago

BTW, as mentioned in the previous comment, feat and the ROIs are not zeros, but after self.roi_head_warper(feat, rois) in forward() in two_stage.py, the result becomes all zeros.

grimoire commented 4 years ago

Hi. The difference in the rpn_head outputs is not a big deal; low-score proposals can have different topk and NMS results under TensorRT. I have tested on my side and get the same result as you.

It seems the feature extractor and rpn_head work.

Feeding a dummy input such as x = torch.ones([1,3,320,320]) to the network will give you zero results, because the mmdetection config file sets test_cfg['rcnn']['score_thr']=0.05, and any prediction with a score lower than that is filtered out (I set the values to zero instead of removing them to keep the graph shape fixed).
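
A minimal sketch of that filtering idea (the names are illustrative, not the repo's actual code): zero out low-score entries instead of removing them, so the output shape stays fixed for TensorRT.

import torch

score_thr = 0.05             # test_cfg['rcnn']['score_thr'] in the mmdetection config
scores = torch.rand(100)     # illustrative per-detection scores
bboxes = torch.rand(100, 4)  # illustrative boxes

keep = (scores >= score_thr).float()
# Removing rows (e.g. scores[scores >= score_thr]) would make the output shape
# data-dependent, which a static TensorRT graph cannot express; zeroing the
# filtered entries keeps every output shape fixed.
scores = scores * keep
bboxes = bboxes * keep.unsqueeze(-1)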

Try using real image data as input and see whether the PyTorch model and the TensorRT model give you the right predictions. Also try outputting intermediate results in mmdet2trt/models/roi_heads/standard_roi_head.py to see if roi_head works.

And BTW, I can't download the model you provided; could you open up access?

cefengxu commented 4 years ago

Please try again:

https://drive.google.com/file/d/1hLjhzB-J0aLpbPPgWdQTsx1vH7D374Xl/view?usp=sharing

cefengxu commented 4 years ago

I tried returning the torch_model from mmdet2trt() directly:

def mmdet2trt(  config, 
                checkpoint,
                device="cuda:0",
                fp16_mode=False,
                max_workspace_size=1<<25,
                opt_shape_param=None,
                log_level = logging.WARN,
                return_warp_model = False):

    device = torch.device(device)

    logging.basicConfig(level=log_level)

    logging.info("load model from config:{}".format(config))
    torch_model = init_detector(config, checkpoint=checkpoint, device=device)

    return  torch_model  # return the torch model init. via mmdet directly

and then used the mmdet code below, which gives the right predictions:

result = inference_detector(torch_model, img)
show_result_pyplot(torch_model, img, result, score_thr=0.3)

However, when I use the warp_model from mmdet2trt() and call

result = inference_detector(torch_model, img)

the error output is as follows:

INFO:root:convert take time 34.86320662498474 s
Traceback (most recent call last):
  File "/home/cefengxu/pyProjects/grimoire/mmdetection-to-tensorrt/test_mmdet2trt.py", line 27, in <module>
    result = inference_detector(torch_model, img)
  File "/home/cefengxu/cETOOL/mmdetection/mmdet/apis/inference.py", line 86, in inference_detector
    cfg = model.cfg
  File "/home/cefengxu/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 594, in __getattr__
    type(self).__name__, name))
AttributeError: 'TwoStageDetectorWarper' object has no attribute 'cfg'
grimoire commented 4 years ago

Errr... I mean do inference on the warp_model with mmdet2trt.inference_detector(...) and see whether the result is zero or not.

The mmdetection model is converted to the warp model (a PyTorch model with a different implementation) first, then to the TensorRT model. I want to know whether the warp_model works as I expect, and whether it gives the same (near enough) results as the TensorRT model.
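
For example, a quick check along those lines (reusing inference_detector from mmdet2trt.apis as in the inference.py above; the path variables are placeholders):

from mmdet2trt import mmdet2trt
from mmdet2trt.apis import inference_detector

# return_warp_model=True yields both the TensorRT module and the intermediate
# PyTorch "warp" model it was converted from.
trt_model, warp_model = mmdet2trt(cfg_path, checkpoint_path, return_warp_model=True)

result_warp = inference_detector(warp_model, image_path, cfg_path, 'cuda:0')
result_trt = inference_detector(trt_model, image_path, cfg_path, 'cuda:0')
# The two results should match closely if the conversion is sound.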

cefengxu commented 4 years ago

Got the right result using the warp_model with demo.jpg from mmdetection: num_detections = 57, 'chair' and 'car'. So I guess maybe something happens inside torch2trt?

Do you want to take a look at the log (log_level=trt.Logger.VERBOSE) from torch2trt()?

grimoire commented 4 years ago

Nope, I rarely read it.

Since the warp_model is OK, either the input is different or the conversion failed.

Check whether the input tensors are the same, and try adding .contiguous() to the input tensor. Also add intermediate results in mmdet2trt/models/roi_heads/standard_roi_head.py and see whether there is any gap between the warp model and TensorRT (with real image data).

for example:


    def forward(self, feat ,proposals):
        zeros = proposals.new_zeros([proposals.shape[0], 1])
        rois = torch.cat([zeros, proposals], dim=1)

        roi_feats = self.bbox_roi_extractor(
            feat[:len(self.bbox_roi_extractor.featmap_strides)], rois)
        return roi_feats   # check the roi_feats
        if self.shared_head is not None:
            roi_feats = self.shared_head(roi_feats)
        # rcnn
        cls_score, bbox_pred = self.bbox_head(roi_feats)

        if isinstance(cls_score, list):
            cls_score = sum(cls_score) / float(len(cls_score))
        scores = F.softmax(cls_score, dim=1)
        bboxes = delta2bbox(proposals, bbox_pred, self.bbox_head.bbox_coder.means,
                    self.bbox_head.bbox_coder.stds)

        num_bboxes = bboxes.shape[0]
        scores = scores.unsqueeze(0)
        bboxes = bboxes.view(1, num_bboxes, -1, 4)
        bboxes_ext = bboxes.new_zeros((1,num_bboxes, 1, 4))
        bboxes = torch.cat([bboxes, bboxes_ext], 2)
        num_detections, det_boxes, det_scores, det_classes = self.rcnn_nms(scores, bboxes, num_bboxes, self.test_cfg.max_per_img)

        return num_detections, det_boxes, det_scores, det_classes
cefengxu commented 4 years ago

Got it ~!!!

Updating the code in mmdet2trt.apis.inference_detector to add .contiguous() to the input tensor:

# tensor = data['img'][0].unsqueeze(0).to(device)
tensor = data['img'][0].unsqueeze(0).contiguous()
tensor = tensor.to(device)

and then the warp_model and the trt_model output the same correct predictions! Without .contiguous(), only the warp_model gives the right predictions!

Why?

Maybe some tensors do not occupy one whole block of memory but are composed of separate data blocks, while the TensorRT operations depend on the tensor being one whole block of memory, so contiguous() has to be used?

grimoire commented 4 years ago

Bingo!

Just like you said. PyTorch uses "stride" to manage tensor memory (you can read it via tensor.stride()). Some operations such as permute() don't have to touch the underlying memory block (a memory copy might take a lot of time); they just update the strides. This can leave the tensor non-contiguous. I can feed the memory block to TensorRT, but not the layout information.
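
A tiny demonstration of that stride behavior (generic PyTorch, not repo code):

import torch

t = torch.randn(2, 3, 4)
print(t.stride())            # (12, 4, 1): row-major, contiguous
p = t.permute(2, 0, 1)       # no data is copied, only shape and strides change
print(p.shape, p.stride())   # torch.Size([4, 2, 3]) (1, 12, 4)
print(p.is_contiguous())     # False: memory order no longer matches the shape
c = p.contiguous()           # real copy into a fresh row-major block
print(c.stride())            # (6, 3, 1)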

I added bindings[idx] = inputs[i].contiguous().data_ptr() at torch2trt.torch2trt line 387 to fix this; I don't know why it doesn't work on your side.

grimoire commented 4 years ago

This blog post details the mechanism and the implementation: http://blog.ezyang.com/2019/05/pytorch-internals/

cefengxu commented 4 years ago

Errr... I see: torch2trt_dynamic did not update successfully, so my torch2trt code is actually still bindings[idx] = inputs[i].data_ptr(). BTW, how do I update torch2trt_dynamic? Just run the command below again? sudo python setup.py install
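
One generic way to check which copy of the converter Python actually imports (the module name here is taken from the discussion above; adjust it if your install differs):

import torch2trt
print(torch2trt.__file__)  # the path shows whether the freshly built copy is in use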

grimoire commented 4 years ago
git pull
python setup.py install

If you want to do some development on the repo, you can also use python setup.py develop, which links the source tree in place so a git pull takes effect without reinstalling.

cefengxu commented 4 years ago

Thanks~ At least inference.py runs now.