Hi, I have tested on your environment from the last issue, and the error does exist. Thanks for the bug report; I will fix it this weekend. Please allow me some time.
By the way, can you share the Dockerfile that you used?
Sorry, for the last issue I ran it on my local computer with conda instead of Docker.
Hi, I guess I have fixed the bug. The error is caused by a non-contiguous input tensor. Non-contiguous tensors are a good trick for accelerating PyTorch, but they broke my repo. I have updated torch2trt/amirstan_plugin and this repo; please pull the latest code and try again to see if it works.
Cool~ I will try it ASAP.
Hi, I pulled the latest repos of amirstan_plugin, torch2trt_dynamic, and mmdetection-to-tensorrt, then rebuilt and reinstalled. But the output of inference.py is the same as before: all results are zero.
BTW, I use torch 1.5.0 and torchvision 0.6.0, and there is a bug, shown below, in delta_xywh_bbox_coder.py:
File "/home/cefengxu/pyProjects/grimoire/mmdetection-to-tensorrt/mmdet2trt/core/bbox/coder/delta_xywh_bbox_coder.py", line 34, in delta2bbox_custom_func
scores = scores.view(1,-1, num_classes)
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
So I changed scores.view(1, -1, num_classes) to scores.contiguous().view(1, -1, num_classes) to fix this case.
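For reference, a minimal standalone sketch of the same behavior (plain PyTorch, not code from this repo; the shapes are just an illustration):

    import torch

    t = torch.randn(2, 3, 4).permute(1, 0, 2)   # permute() leaves the tensor non-contiguous
    print(t.is_contiguous())                     # False
    # t.view(1, -1, 4)                           # raises the same "view size is not compatible" RuntimeError
    a = t.contiguous().view(1, -1, 4)            # works: copy into contiguous memory first
    b = t.reshape(1, -1, 4)                      # works: reshape() copies only when it has to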
Hi, I have tested on PyTorch 1.5 and 1.6; everything is fine besides the non-contiguous bug you mentioned (on Ubuntu 18.04).
Since you changed the PyTorch version, could you provide the environment details again?
Here is the model I created on a 2080 Ti, please try inference with it: https://drive.google.com/file/d/16bKyAj4bWgcem6iatMLf6etUJL83Tm_i/view?usp=sharing
And if you are using a 2080 Ti, would you mind sharing the created engine (1.trt in your case) with me?
Hi, the compute capability you built for (7.5) is higher than mine (6.5), so I cannot load the model for testing. The model I built from my environment is below, but I do not have a 2080 Ti:
https://drive.google.com/file/d/1hLjhzB-J0aLpbPPgWdQTsx1vH7D374Xl/view?usp=sharing
Alternatively, maybe I can debug via an IDE. Tell me which info you want to look at, and I can run inference.py and print it in my environment for you.
Ok, I will check the model you provided later.
Could you check the value and shape of the input tensor in inference.py? Is it contiguous? You can also pass return_warp_model to mmdet2trt like below:
trt_model, torch_model = mmdet2trt(cfg_path, model_path, opt_shape_param=opt_shape_param, fp16_mode=False, max_workspace_size=1<<30, log_level=logging.INFO, return_warp_model=True)
This will give you both the TensorRT model and the PyTorch model (the warp of the mmdetection detector). Try returning some intermediate results in forward() of mmdet2trt/models/detectors/two_stage.py, mmdet2trt/models/dense_heads/rpn_head.py and mmdet2trt/models/roi_heads/standard_roi_head.py. See if there is a large gap between the results of the TensorRT model and the PyTorch model.
for example:
def forward(self, x):
    model = self.model
    rpn_head = self.rpn_head_warper

    # backbone
    feat = model.extract_feat(x)
    return feat

    rois = rpn_head(feat, x)
    result = self.roi_head_warper(feat, rois)
    return result
This forces the model to return the backbone feature. Check the return values of both trt_model and torch_model.
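For example, a quick way to compare the two returned feature sets (assuming both return a tuple of feature maps with matching shapes) might be:

    feat_trt = trt_model(x)
    feat_torch = torch_model(x)
    for f_trt, f_torch in zip(feat_trt, feat_torch):
        # a near-zero max absolute difference means the backbone converted correctly
        print(f_trt.shape, (f_trt - f_torch).abs().max().item())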
@grimoire I followed your steps and output feat and the ROIs via trt_model and torch_model respectively.
The feat outputs from the two are the same, but the ROIs are different.
My test code is as follows:

opt_shape_param = [
    [
        [1, 3, 320, 320],    # min shape
        [1, 3, 1280, 1280],  # optimize shape
        [1, 3, 1344, 1344],  # max shape
    ]
]
max_workspace_size = 1 << 30  # some modules need a large workspace, increase it on OOM
trt_model, torch_model = mmdet2trt(cfg_path, weight_path, opt_shape_param=opt_shape_param, fp16_mode=False, max_workspace_size=max_workspace_size, log_level=logging.DEBUG, return_warp_model=True)
x = torch.ones([1, 3, 320, 320])
x = x.cuda()
y1 = trt_model(x)
y2 = torch_model(x)
output from trt_model
tensor([[ 0.0000, 0.0000, 320.0000, 48.9743],
[ 0.0000, 0.0000, 320.0000, 73.2971],
[ 0.0000, 0.0000, 199.6483, 35.7175],
...,
[ 0.0000, 168.7068, 30.9094, 192.4857],
[ 0.0000, 136.8175, 30.7680, 160.5707],
[307.3548, 235.0510, 320.0000, 277.6660]], device='cuda:0')
output from torch_model
tensor([[ 0.0000, 0.0000, 320.0000, 48.9742],
[ 0.0000, 0.0000, 320.0000, 73.2970],
[ 0.0000, 0.0000, 199.6483, 35.7174],
...,
[230.9487, 0.0000, 249.8271, 14.0494],
[ 0.0000, 168.7068, 30.9094, 192.4857],
[ 0.0000, 136.8175, 30.7680, 160.5708]], device='cuda:0',
grad_fn=<IndexBackward>)
The log is shown below:
INFO:root:load model from config:/home/cefengxu/cETOOL/mmdetection/configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py
DEBUG:root:find module type:<class 'mmdet.models.detectors.faster_rcnn.FasterRCNN'>
DEBUG:root:find module type:<class 'mmdet.models.dense_heads.rpn_head.RPNHead'>
DEBUG:root:find module type:<class 'mmdet.core.anchor.anchor_generator.AnchorGenerator'>
DEBUG:root:find module type:<class 'mmdet.core.bbox.coder.delta_xywh_bbox_coder.DeltaXYWHBBoxCoder'>
DEBUG:root:find module type:<class 'mmdet.models.roi_heads.standard_roi_head.StandardRoIHead'>
DEBUG:root:find module type:<class 'mmdet.models.roi_heads.roi_extractors.single_level_roi_extractor.SingleRoIExtractor'>
INFO:root:model warmup
. forward @ two_stage
. forward @ rpn_head
. forward @ standard roi head
INFO:root:convert model
. forward @ two_stage
. forward @ rpn_head
. forward @ standard roi head
Warning: Encountered known unsupported method torch.Tensor.new_zeros
DEBUG:root:negative index of view/reshape might cause overflow!
DEBUG:root:negative index of view/reshape might cause overflow!
Warning: Encountered known unsupported method torch.Tensor.new_tensor
Warning: Encountered known unsupported method torch.Tensor.new_tensor
DEBUG:root:negative index of view/reshape might cause overflow!
Warning: Encountered known unsupported method torch.Tensor.new_zeros
-----y1-----
INFO:root:convert take time 34.885846853256226 s
(tensor([0], device='cuda:0', dtype=torch.int32), tensor([[[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.],
BTW, as mentioned in the previous comment, the feat and ROIs are not zeros, but after going through self.roi_head_warper(feat, rois) in forward() of two_stage.py, the result becomes all zeros.
Hi. The difference in the rpn_head outputs is not a big deal; low-score proposals can have different topk and NMS results under TensorRT. I have tested on my side and get the same result as you.
It seems the feature extractor and rpn_head work.
Feeding a dummy input such as x = torch.ones([1,3,320,320]) to the network will give you zero results, because in the mmdetection config file test_cfg['rcnn']['score_thr'] = 0.05, and any prediction with a score lower than this is filtered out (I set those values to zero instead of removing them to keep the graph fixed).
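For reference, the rcnn test settings in a typical faster_rcnn config look roughly like this (exact values depend on your config file; the rpn part is omitted here):

    test_cfg = dict(
        rcnn=dict(
            score_thr=0.05,  # predictions scoring below this are zeroed out in the converted graph
            nms=dict(type='nms', iou_threshold=0.5),
            max_per_img=100))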
Try using real image data as input and see if the PyTorch model and the TensorRT model give you the right predictions. Also try outputting intermediate results in mmdet2trt/models/roi_heads/standard_roi_head.py to see if the roi_head works.
And BTW, I can't download the model you provided; could you open up access?
I tried returning the torch_model from mmdet2trt() directly:
def mmdet2trt(config,
              checkpoint,
              device="cuda:0",
              fp16_mode=False,
              max_workspace_size=1<<25,
              opt_shape_param=None,
              log_level=logging.WARN,
              return_warp_model=False):
    device = torch.device(device)
    logging.basicConfig(level=log_level)
    logging.info("load model from config:{}".format(config))
    torch_model = init_detector(config, checkpoint=checkpoint, device=device)
    return torch_model  # return the torch model initialized via mmdet directly
and then used the mmdet code below, which gets the right predictions:
result = inference_detector(torch_model, img)
show_result_pyplot(torch_model, img, result, score_thr=0.3)
However, when I use the warp_model from mmdet2trt() and call
result = inference_detector(torch_model, img)
an error is output as shown below:
INFO:root:convert take time 34.86320662498474 s
Traceback (most recent call last):
File "/home/cefengxu/pyProjects/grimoire/mmdetection-to-tensorrt/test_mmdet2trt.py", line 27, in <module>
result = inference_detector(torch_model, img)
File "/home/cefengxu/cETOOL/mmdetection/mmdet/apis/inference.py", line 86, in inference_detector
cfg = model.cfg
File "/home/cefengxu/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 594, in __getattr__
type(self).__name__, name))
AttributeError: 'TwoStageDetectorWarper' object has no attribute 'cfg'
Errr... I mean do inference on the warp_model with mmdet2trt.inference_detector(...), and see whether the result is zero or not.
The mmdetection model is converted to the warp model (a PyTorch model with a different implementation) first, then converted to the TensorRT model. I want to know whether the warp_model works as I expect, and whether the warp_model gives the same (or near enough) results as the TensorRT model.
I got the right result using the warp_model with demo.jpg from mmdetection: num_detections = 57, 'chair' and 'car'. So I guess maybe something happens in torch2trt??
Do you want to take a look at the log (log_level=trt.Logger.VERBOSE) from torch2trt()?
Nope, I rarely read it.
So the warp_model is OK. Then either the input is different or the conversion failed.
Check whether the input tensors are the same, and try adding .contiguous() to the input tensor.
Add intermediate results in mmdet2trt/models/roi_heads/standard_roi_head.py and see if there is any gap between the warp model and TensorRT (with real image data).
for example:
def forward(self, feat, proposals):
    zeros = proposals.new_zeros([proposals.shape[0], 1])
    rois = torch.cat([zeros, proposals], dim=1)
    roi_feats = self.bbox_roi_extractor(
        feat[:len(self.bbox_roi_extractor.featmap_strides)], rois)
    return roi_feats  # check the roi_feats

    if self.shared_head is not None:
        roi_feats = self.shared_head(roi_feats)

    # rcnn
    cls_score, bbox_pred = self.bbox_head(roi_feats)
    if isinstance(cls_score, list):
        cls_score = sum(cls_score) / float(len(cls_score))
    scores = F.softmax(cls_score, dim=1)
    bboxes = delta2bbox(proposals, bbox_pred, self.bbox_head.bbox_coder.means,
                        self.bbox_head.bbox_coder.stds)
    num_bboxes = bboxes.shape[0]
    scores = scores.unsqueeze(0)
    bboxes = bboxes.view(1, num_bboxes, -1, 4)
    bboxes_ext = bboxes.new_zeros((1, num_bboxes, 1, 4))
    bboxes = torch.cat([bboxes, bboxes_ext], 2)
    num_detections, det_boxes, det_scores, det_classes = self.rcnn_nms(
        scores, bboxes, num_bboxes, self.test_cfg.max_per_img)
    return num_detections, det_boxes, det_scores, det_classes
Got it ~!!!
Updating the code in mmdet2trt.apis.inference_detector, adding .contiguous() to the input tensor:
# tensor = data['img'][0].unsqueeze(0).to(device)
tensor = data['img'][0].unsqueeze(0).contiguous()
tensor = tensor.to(device)
and then the warp_model and the trt_model output the same correct predictions! Without .contiguous(), only the warp_model outputs the correct predictions~!
Why????
Maybe some tensors do not occupy one whole block of memory but are composed of separate data blocks, while the TRT operations on tensors depend on a whole contiguous block, so contiguous() has to be used???
Bingo!
Just like you said. PyTorch uses "stride" to manage tensor memory (you can access the stride via tensor.stride()); some operations such as permute() don't have to change the real memory block (a memory copy might take a lot of time), they just update the stride. This can leave the memory non-contiguous. I can feed the memory block to TensorRT, but not the layout information.
I added bindings[idx] = inputs[i].contiguous().data_ptr() at torch2trt.torch2trt line 387 to fix this; I don't know why it doesn't work on your side.
This blog post details the mechanism and implementation: http://blog.ezyang.com/2019/05/pytorch-internals/
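A minimal sketch of the stride behavior (plain PyTorch; the shapes are arbitrary):

    import torch

    x = torch.randn(1, 3, 4, 4)
    print(x.stride(), x.is_contiguous())   # (48, 16, 4, 1) True
    y = x.permute(0, 2, 3, 1)              # only the strides change, no memory copy
    print(y.stride(), y.is_contiguous())   # (48, 4, 1, 16) False
    z = y.contiguous()                     # copies the data into a fresh contiguous block
    print(z.stride(), z.is_contiguous())   # (48, 12, 3, 1) True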
Errr... I see. It's because my torch2trt_dynamic did not update successfully; my torch2trt code is actually bindings[idx] = inputs[i].data_ptr().
BTW, how do I update torch2trt_dynamic? Just run the command below again?
sudo python setup.py install
git pull
python setup.py install
If you want to do some development on the repo, you can also use python setup.py develop.
Thanks~ at least inference.py can run now.
Hi:
After building the env in Docker, I ran inference.py with:
However, the output classification and boxes are incorrect, as follows:
The inference.py is below (I did not change anything):