Inference from jwyang's checkpoint

Mahad-M commented 4 years ago

I am trying to run detection with a model that I trained using jwyang's repository, but I now need to run it on CPU, which that repo does not support. I have changed the anchor sizes and anchor scales to match the ones I trained with, but I am still getting size mismatch errors.

Called with args:
Namespace(add_params=[], class_agnostic=False, cuda=False, dataset='voc_2007_trainval', epoch=50, image_dir='images/', load_dir='models', mGPU=False, mode='detect', net='resnet101', session=1, vis=True)
Current device: CPU
Using config:
GENERAL: {'MAX_IMG_RATIO': 2.0, 'MAX_IMG_SIZE': 1000, 'MIN_IMG_RATIO': 0.5, 'MIN_IMG_SIZE': 600, 'POOLING_MODE': 'pool', 'POOLING_SIZE': 7}
TEST: {'NMS': 0.3, 'RPN_NMS_THRESHOLD': 0.7, 'RPN_POST_NMS_TOP': 300, 'RPN_PRE_NMS_TOP': 6000}
RPN: {'ANCHOR_SCALES': [2, 4, 8, 16, 32], 'ANCHOR_RATIOS': [0.5, 1, 2, 4, 8], 'FEATURE_STRIDE': 16}
/home/mahad/frcnn_cpu2/faster-rcnn-pytorch/data/images/
Loading classes for image dataset...
WARNING! Cannot find "devkit_path" in additional parameters. Try to use default path (./data/VOCdevkit)...
Used image config: {'color_mode': 'BGR', 'range': 255, 'mean': [102.9801, 115.9465, 122.7717], 'std': [1.0, 1.0, 1.0]}
Loaded classes for PascalVoc 2007 trainval dataset.
Loading image dataset...
Used image config: {'color_mode': 'BGR', 'range': 255, 'mean': [102.9801, 115.9465, 122.7717], 'std': [1.0, 1.0, 1.0]}
Loaded Detection dataset.
Preparing image data...

Done.
Output directory: /home/mahad/frcnn_cpu2/faster-rcnn-pytorch/data/images/result
Loading model from /home/mahad/frcnn_cpu2/faster-rcnn-pytorch/data/models/resnet101/voc_2007/frcnn_1_50.pth
Traceback (most recent call last):
  File "run.py", line 147, in <module>
    detect(dataset=args.dataset, net=args.net, class_agnostic=args.class_agnostic,
  File "/home/mahad/frcnn_cpu2/faster-rcnn-pytorch/script/detect.py", line 67, in detect
    faster_rcnn.load_state_dict(checkpoint['model'])
  File "/home/mahad/anaconda3/envs/dl38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 829, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Resnet:
    size mismatch for RCNN_rpn.RPN_Conv.weight: copying a param with shape torch.Size([512, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]).
    size mismatch for RCNN_rpn.RPN_Conv.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
    size mismatch for RCNN_rpn.RPN_cls_score.weight: copying a param with shape torch.Size([50, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([50, 1024, 1, 1]).
    size mismatch for RCNN_rpn.RPN_bbox_pred.weight: copying a param with shape torch.Size([100, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([100, 1024, 1, 1]).
    size mismatch for RCNN_cls_score.weight: copying a param with shape torch.Size([2, 2048]) from checkpoint, the shape in current model is torch.Size([21, 2048]).
    size mismatch for RCNN_cls_score.bias: copying a param with shape torch.Size([2]) from checkpoint, the shape in current model is torch.Size([21]).
    size mismatch for RCNN_bbox_pred.weight: copying a param with shape torch.Size([8, 2048]) from checkpoint, the shape in current model is torch.Size([84, 2048]).
    size mismatch for RCNN_bbox_pred.bias: copying a param with shape torch.Size([8]) from checkpoint, the shape in current model is torch.Size([84]).

Do I have to train again with this repository, or is it possible to get rid of these errors? Thanks.

loolzaaa commented 4 years ago

You can't remove these errors as things stand, because:

1. You are trying to load a checkpoint trained with 2 classes (RCNN_cls_score.weight ---> torch.Size([2, 2048])) into a 21-class model.
2. This repo uses torchvision resnet50 models, so the layer3 output has 1024 channels (RCNN_Base = RPN = out_depth) (link). I don't understand why your checkpoint has a 512-d dimension there.

Are you sure that you trained ResNet50 before, and not VGG16?
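
The shapes in the error message can be decoded with a little arithmetic. The following is a minimal sketch, assuming the usual Faster R-CNN head layout (2 objectness scores and 4 box offsets per anchor in the RPN, and 4 box offsets per class in a non-class-agnostic head). The function name `decode_checkpoint_shapes` is hypothetical and is not part of either repository; the shapes are copied from the traceback above.

```python
def decode_checkpoint_shapes(shapes):
    """Infer anchor count, RPN width and class count from parameter shapes.

    `shapes` maps parameter names to shape tuples. The layout (2 scores and
    4 offsets per anchor, 4 offsets per class) is an assumption.
    """
    num_classes = shapes["RCNN_cls_score.weight"][0]
    return {
        "anchors_from_cls": shapes["RCNN_rpn.RPN_cls_score.weight"][0] // 2,
        "anchors_from_bbox": shapes["RCNN_rpn.RPN_bbox_pred.weight"][0] // 4,
        "rpn_hidden": shapes["RCNN_rpn.RPN_Conv.weight"][0],
        "num_classes": num_classes,
        "bbox_outputs_per_class": shapes["RCNN_bbox_pred.weight"][0] // num_classes,
    }


# Checkpoint shapes as reported in the traceback.
checkpoint_shapes = {
    "RCNN_rpn.RPN_Conv.weight": (512, 1024, 3, 3),
    "RCNN_rpn.RPN_cls_score.weight": (50, 512, 1, 1),
    "RCNN_rpn.RPN_bbox_pred.weight": (100, 512, 1, 1),
    "RCNN_cls_score.weight": (2, 2048),
    "RCNN_bbox_pred.weight": (8, 2048),
}

info = decode_checkpoint_shapes(checkpoint_shapes)

# 5 anchor scales x 5 anchor ratios from the printed RPN config.
anchors_from_config = 5 * 5
print(info, anchors_from_config)
```

Under these assumptions the anchor settings already agree (25 anchors on both sides), so the remaining differences are the RPN hidden width (512 vs. 1024) and the number of classes (2 vs. 21).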

Mahad-M commented 4 years ago

It is a resnet101 checkpoint that I have trained and yes there are two classes in my dataset.

loolzaaa commented 4 years ago

Oh, I see. Look here: this is my implementation of the RPN network, and here is jwyang's implementation. He uses the pretrained model depth only for the input channels of the RPN conv layer. I use it for both the input and output channels.

You can try to change this:

self.RPN_Conv = nn.Conv2d(in_depth, in_depth, 3, 1, 1, bias=True)
...
self.RPN_cls_score = nn.Conv2d(in_depth, self.nc_score_out, 1, 1, 0)
...
self.RPN_bbox_pred = nn.Conv2d(in_depth, self.nc_bbox_out, 1, 1, 0)

to this:

self.RPN_Conv = nn.Conv2d(in_depth, 512, 3, 1, 1, bias=True)
...
self.RPN_cls_score = nn.Conv2d(512, self.nc_score_out, 1, 1, 0)
...
self.RPN_bbox_pred = nn.Conv2d(512, self.nc_bbox_out, 1, 1, 0)
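
To see why this change lines the model up with the checkpoint, the expected parameter shapes can be worked out by hand. This is a rough sketch rather than code from either repository: `rpn_head_shapes` is a hypothetical helper, and it assumes `nc_score_out` is 2 per anchor and `nc_bbox_out` is 4 per anchor, which is consistent with the 50 and 100 output channels in the traceback for 25 anchors.

```python
def rpn_head_shapes(in_depth, hidden, num_anchors):
    """Expected parameter shapes of the three RPN layers.

    Conv2d weights are stored as (out_channels, in_channels, kH, kW).
    """
    return {
        "RPN_Conv.weight": (hidden, in_depth, 3, 3),
        "RPN_Conv.bias": (hidden,),
        "RPN_cls_score.weight": (2 * num_anchors, hidden, 1, 1),
        "RPN_bbox_pred.weight": (4 * num_anchors, hidden, 1, 1),
    }


# Original layers: hidden width equals the backbone depth (1024).
before = rpn_head_shapes(in_depth=1024, hidden=1024, num_anchors=25)
# Modified layers: hidden width fixed to 512, as in the checkpoint.
after = rpn_head_shapes(in_depth=1024, hidden=512, num_anchors=25)

for name in before:
    print(name, before[name], "->", after[name])
```

With a hidden width of 512, every RPN shape matches the "from checkpoint" side of the error message.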

As for the two classes in your dataset: you need to add your dataset to the library, so that the script creates the final layers for two classes.

After that, you can use your checkpoint.
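
Before calling `load_state_dict` again, it can help to compare shapes directly. A minimal sketch, using a hypothetical helper `shape_mismatches` that works on plain dictionaries of name to shape tuple; with PyTorch, such a dictionary can be built as `{k: tuple(v.shape) for k, v in state_dict.items()}`. The example shapes are the final-layer weights from the traceback.

```python
def shape_mismatches(checkpoint_shapes, model_shapes):
    """Return (name, checkpoint_shape, model_shape) for every disagreement.

    A parameter that is missing on one side is reported with None.
    """
    problems = []
    for name in sorted(set(checkpoint_shapes) | set(model_shapes)):
        ck = checkpoint_shapes.get(name)
        md = model_shapes.get(name)
        if ck != md:
            problems.append((name, ck, md))
    return problems


checkpoint = {
    "RCNN_cls_score.weight": (2, 2048),
    "RCNN_bbox_pred.weight": (8, 2048),
}
# Model built for the 21-class VOC configuration.
model_21 = {
    "RCNN_cls_score.weight": (21, 2048),
    "RCNN_bbox_pred.weight": (84, 2048),
}
# Model built for a 2-class dataset.
model_2 = {
    "RCNN_cls_score.weight": (2, 2048),
    "RCNN_bbox_pred.weight": (8, 2048),
}

remaining_21 = shape_mismatches(checkpoint, model_21)
remaining_2 = shape_mismatches(checkpoint, model_2)
print(remaining_21)
print(remaining_2)
```

An empty list means every parameter shape agrees, so the checkpoint should load without size mismatch errors.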

Mahad-M commented 4 years ago

Thanks a lot bud! Got it running :)

Mahad-M commented 4 years ago

I have the model up and running. When I run inference with jwyang's code I get good results, but when I use this repo on the CPU the results are not the same: the scores are very low, and it looks as if the model has not been trained properly. I am attaching both outputs. The one with green boxes is the output of this repo, while the red boxes are from jwyang's inference. The model and the threshold are the same in both cases. Any help would be much appreciated.

[Attached images: 82092117_det, jwyang]

loolzaaa commented 4 years ago

Looks like a CPU NMS error. Have you tried running inference with this checkpoint in CUDA mode of this repo?
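
One way to test the NMS suspicion is to run the same boxes and scores through a small reference implementation and compare the kept indices with what the CPU path returns. The sketch below is a plain greedy NMS in pure Python, not the NMS code of this repo or of jwyang's. It assumes boxes are `(x1, y1, x2, y2)` and uses continuous coordinates, without the `+ 1` that some legacy Faster R-CNN code adds to widths and heights, so results right at the threshold can differ slightly.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, threshold):
    """Greedy NMS: keep the highest-scoring box, drop overlapping ones."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep box i only if it overlaps no already-kept box too strongly.
        if all(iou(boxes[i], boxes[j]) <= threshold for j in keep):
            keep.append(i)
    return keep


# Illustrative data only; 0.3 is the TEST NMS value from the printed config.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
keep = nms(boxes, scores, threshold=0.3)
print(keep)
```

If the reference keeps a very different set of boxes than the CPU path for the same inputs, that points at the NMS implementation. If they agree, the difference is more likely elsewhere in the pipeline.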

loolzaaa / faster-rcnn-pytorch

Inference from jwyang's checkpoint #5