Is this an mmdetection fault or an undersampling problem? I mean, maybe you have little or no training data with cars on the right side of the images. Did you use any data augmentation? Do you observe the same problem on the training images?
I train with CenterNet and have added plenty of data augmentation, including random flipping, but the problem still happens. I haven't visualized the training set, but it presumably shows the same problem. What is stranger is that when I flip the image and feed it to the network, some of the targets on the right are detected. I find this very strange. How can I solve it?
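(For anyone who wants to reproduce this original-versus-flipped comparison: a minimal sketch, assuming the mmdetection 2.x inference API; the config, checkpoint, and image paths below are placeholders.)

```python
import mmcv
from mmdet.apis import init_detector, inference_detector

# Placeholder paths - substitute your own config, checkpoint and image.
config_file = 'configs/centernet/centernet_uav.py'
checkpoint_file = 'work_dirs/centernet_uav/latest.pth'
img_path = 'demo/uav_frame.jpg'

model = init_detector(config_file, checkpoint_file, device='cuda:0')

img = mmcv.imread(img_path)
flipped = mmcv.imflip(img, direction='horizontal')

# Run the detector on the original image and on a horizontally flipped copy.
result_orig = inference_detector(model, img)
result_flip = inference_detector(model, flipped)

# Save both visualizations so the left/right asymmetry can be compared directly.
model.show_result(img, result_orig, score_thr=0.3, out_file='pred_orig.jpg')
model.show_result(flipped, result_flip, score_thr=0.3, out_file='pred_flip.jpg')
```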
If you are doing everything right, I do not know how to solve this problem. The only thing I would ask is: if you have another image in the training set that looks like this one, or that has cars on the right side, can you evaluate it with your trained model to see whether the predictions show the same problem?
This is a video-sequence dataset, and I split it randomly into a training set and a test set, so the training and test images are very similar and the same problem exists on the training set. I visualized the GT boxes and they look fine; the image annotations seem normal. I now suspect the data augmentation is the cause. The augmentation configuration I used is exactly the same as the CenterNet config reproduced by mmdetection, as shown below:
```python
_base_ = [
    '../_base_/datasets/UAV_coco.py',
    '../_base_/schedules/schedule_1x.py',
    '../_base_/default_runtime.py'
]

model = dict(
    type='CenterNet_v1',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=True,
        style='pytorch',
        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
    neck=dict(
        type='CTResNetNeck',
        in_channel=2048,
        num_deconv_filters=(256, 128, 64),
        num_deconv_kernels=(4, 4, 4),
        use_dcn=True),
    bbox_head=dict(
        type='_CenterNetHead',
        num_classes=3,
        in_channel=64,
        feat_channel=64,
        loss_center_heatmap=dict(type='GaussianFocalLoss', loss_weight=1.0),
        loss_wh=dict(type='L1Loss', loss_weight=0.1),
        loss_offset=dict(type='L1Loss', loss_weight=1.0)),
    train_cfg=None,
    test_cfg=dict(topk=100, local_maximum_kernel=3, max_per_img=100))

img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)

train_pipeline = [
    dict(type='LoadImageFromFile', to_float32=True, color_type='color'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(
        type='PhotoMetricDistortion',
        brightness_delta=32,
        contrast_range=(0.5, 1.5),
        saturation_range=(0.5, 1.5),
        hue_delta=18),
    dict(
        type='RandomCenterCropPad',
        crop_size=(640, 640),
        ratios=(0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3),
        mean=[0, 0, 0],
        std=[1, 1, 1],
        to_rgb=True,
        test_pad_mode=None),
    dict(type='Resize', img_scale=(512, 512), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
]

test_pipeline = [
    dict(type='LoadImageFromFile', to_float32=True),
    dict(
        type='MultiScaleFlipAug',
        scale_factor=1.0,
        flip=True,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(
                type='RandomCenterCropPad',
                ratios=None,
                border=None,
                mean=[0, 0, 0],
                std=[1, 1, 1],
                to_rgb=True,
                test_mode=True,
                test_pad_mode=['logical_or', 31],
                test_pad_add_pix=1),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='DefaultFormatBundle'),
            dict(
                type='Collect',
                meta_keys=('filename', 'ori_shape', 'img_shape', 'pad_shape',
                           'scale_factor', 'flip', 'flip_direction',
                           'img_norm_cfg', 'border'),
                keys=['img'])
        ])
]

data_root = '../data/UAV/'
dataset_type = 'UAVDataset'
data = dict(
    samples_per_gpu=16,
    workers_per_gpu=4,
    train=dict(
        _delete_=True,
        type='RepeatDataset',
        times=3,
        dataset=dict(
            type=dataset_type,
            ann_file=data_root + 'train-5000.json',
            img_prefix=data_root + 'UAV/',
            pipeline=train_pipeline)),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'val_5000.json',
        img_prefix=data_root + 'UAV/',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=data_root + 'val_5000.json',
        img_prefix=data_root + 'UAV/',
        pipeline=test_pipeline))

optimizer = dict(type='SGD', lr=0.002, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(_delete_=True, grad_clip=dict(max_norm=35, norm_type=2))

lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=1000,
    warmup_ratio=1.0 / 1000,
    step=[12, 16])  # the real step is [185, 245]
runner = dict(max_epochs=20)  # the real epoch is 28*5=140
evaluation = dict(interval=1, metric='bbox')
checkpoint_config = dict(interval=2)
```
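If it helps to rule the augmentation in or out, here is a rough sketch (assuming mmdetection 2.x; the config path is a placeholder) that builds the training dataset from this config and draws the transformed gt_bboxes back onto the augmented images, so you can see whether right-side boxes survive the pipeline:

```python
import mmcv
import numpy as np
from mmcv import Config
from mmdet.datasets import build_dataset

cfg = Config.fromfile('configs/centernet/centernet_uav.py')  # placeholder path
dataset = build_dataset(cfg.data.train)

for i in range(5):
    sample = dataset[i]
    # After DefaultFormatBundle the image is a normalized CHW tensor inside a
    # DataContainer; convert it back to an HWC uint8 array for drawing.
    img = sample['img'].data.numpy().transpose(1, 2, 0)
    mean = np.array(cfg.img_norm_cfg['mean'])
    std = np.array(cfg.img_norm_cfg['std'])
    img = np.clip(img * std + mean, 0, 255).astype(np.uint8)
    img = np.ascontiguousarray(img)  # channel order may look swapped; fine for box checks

    # gt_bboxes holds the boxes after cropping/flipping/resizing.
    bboxes = sample['gt_bboxes'].data.numpy()
    mmcv.imshow_bboxes(img, bboxes, show=False, out_file=f'aug_sample_{i}.jpg')
```

Recent mmdetection versions also ship a `tools/misc/browse_dataset.py` helper that does much the same thing.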
**It is worth noting that every image in this dataset has the same resolution, 1024x540. Most images carry a large number of annotations, sometimes hundreds, while others have only a few. For augmentation I used RandomCenterCropPad with a 512x512 crop size, and I wondered whether the problem comes from there. It is difficult for me to pin down the root cause. I visualized the predicted feature map and did not find that the activations on the right side of the image were suppressed, yet there are no detections there at all.
During downsampling, the convolution kernel slides over every position of the image, so in theory this problem should not occur unless the kernel somehow fails to extract features from the right side. Yet flipping the image gives better results, and sometimes the opposite happens: some targets on the left cannot be detected. Does this mean overfitting? It feels very strange; I have never encountered such a result before. I did not change any CenterNet settings, only used a different dataset, and I am confident my procedure is correct because I have worked with mmdetection a lot before. The visualizations of the predicted feature map and the detection results are as follows:**
It is hard for me to understand what causes such a problem or from what angle to attack it. I hope to get your reply as soon as possible.
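(As an aside, for anyone who wants to reproduce the feature-map check mentioned above: a rough sketch, assuming mmdetection 2.x and that the bbox head's first output is the per-class center heatmap, one tensor per feature level as in the stock CenterNetHead; all paths are placeholders.)

```python
import mmcv
import numpy as np
from mmdet.apis import init_detector, inference_detector

model = init_detector('configs/centernet/centernet_uav.py',   # placeholder
                      'work_dirs/centernet_uav/latest.pth',   # placeholder
                      device='cuda:0')

heatmaps = []

def grab_heatmap(module, inputs, outputs):
    # Assumes outputs[0] is the list of center heatmaps (one per level),
    # each of shape (N, num_classes, H, W).
    heatmaps.append(outputs[0][0].detach().cpu())

hook = model.bbox_head.register_forward_hook(grab_heatmap)
inference_detector(model, 'demo/uav_frame.jpg')                # placeholder image
hook.remove()

# Collapse the class dimension and save a grayscale image for inspection.
hm = heatmaps[-1][0].max(dim=0)[0].numpy()
mmcv.imwrite((np.clip(hm, 0, 1) * 255).astype(np.uint8), 'center_heatmap.png')
```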
Sending the image directly to the detector, the results are as follows:
When I flip the image and send it to the detector, the results are as follows. Horizontal flip:
Vertical flip:
I think this result is very strange. What does everyone make of it?
Same here when I was training FCOS. Only targets on the left were detected.
I solved this problem later. Although it was a long time ago, I suggest you check the data augmentation method and, very importantly, whether the width and height labels in the dataset annotations are swapped, then try again.
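For anyone hitting this later, a quick sanity check along those lines (a sketch, assuming a COCO-style annotation file; the paths below are placeholders):

```python
import json
from PIL import Image

ann_path = '../data/UAV/train-5000.json'   # placeholder
img_prefix = '../data/UAV/UAV/'            # placeholder

coco = json.load(open(ann_path))
images = {img['id']: img for img in coco['images']}

# 1) Does the width/height recorded in the json match the actual image file?
#    Swapped values are easy to spot on non-square images such as 1024x540.
for info in coco['images']:
    w, h = Image.open(img_prefix + info['file_name']).size
    if (info['width'], info['height']) != (w, h):
        print(f"size mismatch for {info['file_name']}: "
              f"json says {info['width']}x{info['height']}, file is {w}x{h}")

# 2) Do any boxes fall outside the recorded image size? With swapped width and
#    height, boxes on the right half of a wide image exceed the bounds and may
#    be clipped or dropped by the pipeline.
for ann in coco['annotations']:
    info = images[ann['image_id']]
    x, y, bw, bh = ann['bbox']
    if x + bw > info['width'] or y + bh > info['height']:
        print(f"box {ann['id']} exceeds the bounds of {info['file_name']}")
```

If check (2) fires mostly for boxes with large x values, that would match the "only the left half is detected" symptom.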
> I solved this problem later. Although it was a long time ago, I suggest you check the data augmentation method and, very importantly, whether the width and height labels in the dataset annotations are swapped, then try again.
Could you explain the solution in detail? Although it has been a long time, I have run into the same problem! I sincerely hope to get a reply. Thanks.
> Same here when I was training FCOS. Only targets on the left were detected.
Could you explain the solution in detail? Although it has been a long time, I have encountered the same problem with FCOS! I sincerely hope to get a reply. Thanks.
I used CenterNet to train on the UAV dataset, but found that targets in one half of each image are not detected at all, while targets in the other half are detected very well, even though the targets themselves look almost identical. I checked the code carefully but did not find the problem. How should I go about debugging such an issue? A picture of the test results is shown below; almost all test results have this problem:
Can anyone help me?