The official implementation for ICCV'23 paper "Small Object Detection via Coarse-to-fine Proposal Generation and Imitation Learning"
[Reimplementation] #5

Closed chnu-cpl closed 9 months ago

chnu-cpl commented 9 months ago


💬 Describe the reimplementation questions

I ran into a problem while trying to implement the RoI part of your model and the feature imitation branch. The error below occurs when I embed your model into my own model. Could you please suggest a solution?


sys.platform: linux
Python: 3.7.12 | packaged by conda-forge | (default, Oct 26 2021, 06:08:21) [GCC 9.4.0]
CUDA available: True
GPU 0: NVIDIA GeForce RTX 3090
CUDA_HOME: /usr
NVCC: Cuda compilation tools, release 11.5, V11.5.119
GCC: gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
PyTorch: 1.10.0+cu113
PyTorch compiling details: PyTorch built with:

TorchVision: 0.11.1+cu113
OpenCV: 4.7.0
MMCV: 1.6.1
MMCV Compiler: GCC 9.3
MMCV CUDA Compiler: 11.3
MMDetection: 2.13.0+

2023-09-18 08:53:58,694 - mmdet - INFO - Distributed training: False
2023-09-18 08:53:58,828 - mmdet - INFO - Config:
dataset_type = 'AITODDataset'
data_root = '/home/cpl/dataset/AI-TOD/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize', img_scale=(800, 800), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=True),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(800, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img'])
        ])
]
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type='AITODDataset',
        ann_file='/home/cpl/dataset/AI-TOD/annotations/aitod_trainval_v1.json',
        img_prefix='/home/cpl/dataset/AI-TOD/trainval/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations', with_bbox=True),
            dict(type='Resize', img_scale=(800, 800), keep_ratio=True),
            dict(type='RandomFlip', flip_ratio=0.5),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
        ]),
    val=dict(
        type='AITODDataset',
        ann_file='/home/cpl/dataset/AI-TOD/annotations/aitod_test_v1.json',
        img_prefix='/home/cpl/dataset/AI-TOD/test/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(800, 800),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ]),
    test=dict(
        type='AITODDataset',
        ann_file='/home/cpl/dataset/AI-TOD/annotations/aitod_test_v1.json',
        img_prefix='/home/cpl/dataset/AI-TOD/test/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(800, 800),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ]))
evaluation = dict(interval=12, metric='bbox')
optimizer = dict(type='SGD', lr=0.005, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=None)
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=5000,
    warmup_ratio=0.001,
    step=[8, 11])
runner = dict(type='EpochBasedRunner', max_epochs=12)
checkpoint_config = dict(interval=4)
log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')])
custom_hooks = [dict(type='NumClassCheckHook')]
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
opencv_num_threads = 0
mp_start_method = 'fork'
auto_scale_lr = dict(enable=False, base_batch_size=16)
model = dict(
    type='FasterRCNN',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=True,
        style='pytorch',
        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        num_outs=5),
    rpn_head=dict(
        type='RPNHead',
        in_channels=256,
        feat_channels=256,
        anchor_generator=dict(
            type='AnchorGenerator',
            scales=[8],
            ratios=[0.5, 1.0, 2.0],
            strides=[4, 8, 16, 32, 64]),
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder',
            target_means=[0.0, 0.0, 0.0, 0.0],
            target_stds=[1.0, 1.0, 1.0, 1.0]),
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
    roi_head=dict(
        type='FIRoIHead',
        num_gpus=1,
        temperature=0.6,
        contrast_loss_weights=0.5,
        num_con_queue=256,
        con_sampler_cfg=dict(num=128, pos_fraction=[0.5, 0.25, 0.125]),
        con_queue_dir='./work_dirs/roi_feats/cfinet',
        ins_quality_assess_cfg=dict(
            cls_score=0.05, hq_score=0.65, lq_score=0.25, hq_pro_counts_thr=8),
        bbox_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32]),
        bbox_head=dict(
            type='Shared2FCBBoxHead',
            in_channels=256,
            fc_out_channels=1024,
            roi_feat_size=7,
            num_classes=8,
            bbox_coder=dict(
                type='DeltaXYWHBBoxCoder',
                target_means=[0.0, 0.0, 0.0, 0.0],
                target_stds=[0.1, 0.1, 0.2, 0.2]),
            reg_class_agnostic=False,
            loss_cls=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
            loss_bbox=dict(type='L1Loss', loss_weight=1.0))),
    train_cfg=dict(
        rpn=dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.7,
                neg_iou_thr=0.3,
                min_pos_iou=0.3,
                match_low_quality=True,
                ignore_iof_thr=-1,
                gpu_assign_thr=512),
            sampler=dict(
                type='RandomSampler',
                num=256,
                pos_fraction=0.5,
                neg_pos_ub=-1,
                add_gt_as_proposals=False),
            allowed_border=-1,
            pos_weight=-1,
            debug=False),
        rpn_proposal=dict(
            nms_pre=3000,
            max_per_img=3000,
            nms=dict(type='nms', iou_threshold=0.7),
            min_bbox_size=0),
        rcnn=dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.5,
                neg_iou_thr=0.5,
                min_pos_iou=0.5,
                match_low_quality=False,
                ignore_iof_thr=-1,
                gpu_assign_thr=512),
            sampler=dict(
                type='RandomSampler',
                num=512,
                pos_fraction=0.25,
                neg_pos_ub=-1,
                add_gt_as_proposals=True),
            pos_weight=-1,
            debug=False)),
    test_cfg=dict(
        rpn=dict(
            nms_pre=3000,
            max_per_img=3000,
            nms=dict(type='nms', iou_threshold=0.7),
            min_bbox_size=0),
        rcnn=dict(
            score_thr=0.05,
            nms=dict(type='nms', iou_threshold=0.5),
            max_per_img=3000)))
work_dir = 'work-dir/debug'
gpu_ids = range(0, 1)

2023-09-18 08:54:05,169 - mmdet - INFO - workflow: [('train', 1)], max: 12 epochs
2023-09-18 08:54:05,169 - mmdet - INFO - Checkpoints will be saved to /home/cpl/object_detection/mmdet-rfla/work-dir/debug by HardDiskBackend.
/home/cpl/anaconda3/envs/cpl/lib/python3.7/site-packages/mmdet/core/anchor/ UserWarning: grid_anchors would be deprecated soon. Please use grid_priors
  warnings.warn('grid_anchors would be deprecated soon. '
/home/cpl/anaconda3/envs/cpl/lib/python3.7/site-packages/mmdet/core/anchor/ UserWarning: single_level_grid_anchors would be deprecated soon. Please use single_level_grid_priors
  'single_level_grid_anchors would be deprecated soon.
Traceback (most recent call last):
  File "tools/", line 188, in
    main()
  File "tools/", line 184, in main
    meta=meta)
  File "/home/cpl/anaconda3/envs/cpl/lib/python3.7/site-packages/mmdet/apis/", line 170, in train_detector
    , cfg.workflow)
  File "/home/cpl/anaconda3/envs/cpl/lib/python3.7/site-packages/mmcv/runner/", line 136, in run
    epoch_runner(data_loaders[i], kwargs)
  File "/home/cpl/anaconda3/envs/cpl/lib/python3.7/site-packages/mmcv/runner/", line 53, in train
    self.run_iter(data_batch, train_mode=True, kwargs)
  File "/home/cpl/anaconda3/envs/cpl/lib/python3.7/site-packages/mmcv/runner/", line 32, in run_iter
    kwargs)
  File "/home/cpl/anaconda3/envs/cpl/lib/python3.7/site-packages/mmcv/parallel/", line 77, in train_step
    return self.module.train_step(inputs[0], kwargs[0])
  File "/home/cpl/anaconda3/envs/cpl/lib/python3.7/site-packages/mmdet/models/detectors/", line 237, in train_step
    losses = self(data)
  File "/home/cpl/anaconda3/envs/cpl/lib/python3.7/site-packages/torch/nn/modules/", line 1102, in _call_impl
    return forward_call(input, kwargs)
  File "/home/cpl/anaconda3/envs/cpl/lib/python3.7/site-packages/mmcv/runner/", line 116, in new_func
    return old_func(*args, kwargs)
  File "/home/cpl/anaconda3/envs/cpl/lib/python3.7/site-packages/mmdet/models/detectors/", line 171, in forward
    return self.forward_train(img, img_metas, kwargs)
  File "/home/cpl/anaconda3/envs/cpl/lib/python3.7/site-packages/mmdet/models/detectors/", line 148, in forward_train
    *kwargs)
  File "/home/cpl/anaconda3/envs/cpl/lib/python3.7/site-packages/mmdet/models/roi_heads/", line 210, in forward_train
    gt_bboxes, gt_labels, img_metas)
  File "/home/cpl/anaconda3/envs/cpl/lib/python3.7/site-packages/mmdet/models/roi_heads/", line 409, in _bbox_forward_train
    iq_loss_weights[iq_signs == j] = weight
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
terminate called after throwing an instance of 'c10::CUDAError'
  what(): CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Exception raised from create_event_internal at ../c10/cuda/CUDACachingAllocator.cpp:1211 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7ff091039d62 in /home/cpl/anaconda3/envs/cpl/lib/python3.7/site-packages/torch/lib/
frame #1: + 0x1c5f3 (0x7ff09141c5f3 in /home/cpl/anaconda3/envs/cpl/lib/python3.7/site-packages/torch/lib/
frame #2: c10::cuda::CUDACachingAllocator::raw_delete(void) + 0x1a2 (0x7ff09141d002 in /home/cpl/anaconda3/envs/cpl/lib/python3.7/site-packages/torch/lib/
frame #3: c10::TensorImpl::release_resources() + 0xa4 (0x7ff091023314 in /home/cpl/anaconda3/envs/cpl/lib/python3.7/site-packages/torch/lib/
frame #4: + 0x29a829 (0x7fefe7c9a829 in /home/cpl/anaconda3/envs/cpl/lib/python3.7/site-packages/torch/lib/
frame #5: + 0xae89a9 (0x7fefe84e89a9 in /home/cpl/anaconda3/envs/cpl/lib/python3.7/site-packages/torch/lib/
frame #6: THPVariable_subclass_dealloc(_object) + 0x2b9 (0x7fefe84e8cc9 in /home/cpl/anaconda3/envs/cpl/lib/python3.7/site-packages/torch/lib/

frame #24: + 0x29d90 (0x7ff0b0629d90 in /lib/x86_64-linux-gnu/
frame #25: __libc_start_main + 0x80 (0x7ff0b0629e40 in /lib/x86_64-linux-gnu/
Aborted (core dumped)

### Expected results

_No response_

### Additional information

_No response_

chnu-cpl commented 9 months ago

Based on this tweet, when I added os.environ['CUDA_LAUNCH_BLOCKING'] = "1" under import os in, the following error message was displayed.
[image]
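For readers reproducing this debugging step, here is a minimal sketch of enabling synchronous CUDA error reporting. It assumes the variable is set at the very top of the training entry script, before PyTorch is imported or any CUDA work starts. The helper `cuda_launch_blocking_enabled` is a hypothetical name used only for illustration.

```python
import os

# Assumption: this runs before `import torch` (or any other CUDA initialisation),
# so that kernel launches become synchronous and the Python traceback points
# at the operation that actually failed.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"


def cuda_launch_blocking_enabled() -> bool:
    """Return True if the debugging flag is visible to this process."""
    # The key must be spelled exactly; a key with stray spaces would set a
    # different, unused variable.
    return os.environ.get("CUDA_LAUNCH_BLOCKING") == "1"


print(cuda_launch_blocking_enabled())
```

The same effect can usually be obtained by exporting the variable in the shell before launching the training script, which avoids editing the code.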

shaunyuan22 commented 9 months ago

Empirically, this error is triggered by an index error. Based on your error message, there are two potential causes:

  1. the index iq_signs == j exceeds the length of iq_loss_weights;
  2. the sizes of anchor_feats and contrast_feats do not match after the projection by fc_proj.
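The first cause can be checked before the failing assignment runs. The sketch below mirrors the line `iq_loss_weights[iq_signs == j] = weight` from the traceback using plain Python lists, so that a length mismatch surfaces as a readable `ValueError` instead of a device-side assert. The function name `assign_quality_weights` and the argument `level_weights` are hypothetical; in the real code the inputs would be tensors, and this is not the repository's implementation.

```python
from typing import List, Sequence


def assign_quality_weights(
    iq_signs: Sequence[int],
    iq_loss_weights: List[float],
    level_weights: Sequence[float],
) -> List[float]:
    """Set the weight of every entry whose sign equals j to level_weights[j]."""
    # A boolean mask built from iq_signs can only index iq_loss_weights
    # if both have the same length.
    if len(iq_signs) != len(iq_loss_weights):
        raise ValueError(
            f"iq_signs has {len(iq_signs)} entries but "
            f"iq_loss_weights has {len(iq_loss_weights)}"
        )
    out = list(iq_loss_weights)
    for j, weight in enumerate(level_weights):
        for i, sign in enumerate(iq_signs):
            if sign == j:
                out[i] = weight
    return out


print(assign_quality_weights([0, 1, 1], [1.0, 1.0, 1.0], [0.5, 2.0]))
```

Running the equivalent check on CPU copies of the tensors, just before the assignment, is one way to confirm or rule out this cause.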
chnu-cpl commented 9 months ago

Thank you for your response. Following your suggestions, I investigated the two potential causes mentioned above separately. If my approach is correct, there do not seem to be any issues; the results are displayed below. Additionally, I have not made any changes to the entire file. What else could be causing the problem?



shaunyuan22 commented 9 months ago

May I ask whether this error occurs at the beginning of training or after a period of training? And does the error still occur with these two outputs? Their sizes match perfectly, so they should not trigger the error.

chnu-cpl commented 9 months ago

During the debugging mentioned above, the error occurred right at the beginning of training. After I switched to a different dataset, the error sometimes occurred after 50 rounds of training and in other cases after 100 rounds.
[image]

chnu-cpl commented 9 months ago

Hello, additionally, it seems I have found an error. In the file, in lines 403 and 404, the comments suggest that "anchor_feature" should be (num_gts, 256, 1, 1), and "contrast_feature" should be (num_gts, self.con_sample_num, 256, 1, 1). I'm not sure if this is related to the error I mentioned earlier.
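The shapes quoted from those comments can be turned into an explicit check. The following is a minimal sketch that compares shape tuples only (in practice one would pass `tuple(tensor.shape)`); the function name `check_imitation_shapes` is hypothetical, and the value 128 for `con_sample_num` is only an illustrative assumption.

```python
from typing import Sequence


def check_imitation_shapes(
    anchor_shape: Sequence[int],
    contrast_shape: Sequence[int],
    con_sample_num: int,
) -> None:
    """Raise ValueError unless the shapes follow the documented layout.

    anchor:   (num_gts, C, 1, 1)
    contrast: (num_gts, con_sample_num, C, 1, 1)
    """
    if len(anchor_shape) != 4:
        raise ValueError(f"anchor must be 4-D, got {tuple(anchor_shape)}")
    num_gts, channels = anchor_shape[0], anchor_shape[1]
    expected = (num_gts, con_sample_num, channels) + tuple(anchor_shape[2:])
    if tuple(contrast_shape) != expected:
        raise ValueError(
            f"contrast shape {tuple(contrast_shape)} != expected {expected}"
        )


# Passes silently: three ground-truth boxes, 128 contrast samples each.
check_imitation_shapes((3, 256, 1, 1), (3, 128, 256, 1, 1), con_sample_num=128)
```

A failure here would point at the first dimension (the number of ground-truth boxes) or the sample dimension as the source of the mismatch.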


shaunyuan22 commented 9 months ago

Hey, you are absolutely right: the mismatch between the sizes of anchor_feats and contrast_feats triggers the error. Moreover, I've downloaded the AI-TOD dataset and found that the annotations are dense. A single image may contain more than 300 instances, which far exceeds the number of positive samples allowed by bbox_sampler; this is the root cause of the earlier error. We will update our code to handle this dense case as soon as possible.
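The arithmetic behind this can be sketched with the values from the config posted above (`rcnn` sampler with `num=512`, `pos_fraction=0.25`). The sketch assumes that bbox_sampler refers to this sampler and that the positive cap is computed as `int(num * pos_fraction)`; the helper name `max_positive_samples` is hypothetical.

```python
def max_positive_samples(num: int, pos_fraction: float) -> int:
    """Upper bound on positive RoIs kept per image by a fraction-based sampler."""
    return int(num * pos_fraction)


cap = max_positive_samples(num=512, pos_fraction=0.25)
num_instances = 300  # dense image, as described for AI-TOD above

# With more instances than the cap, some ground-truth boxes necessarily end up
# without any sampled positive RoI.
print(cap, num_instances > cap)
```

Under these assumptions the cap is 128, so an image with 300 instances leaves at least 172 ground-truth boxes without a sampled positive.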

shaunyuan22 commented 9 months ago

The code has been updated.

chnu-cpl commented 9 months ago

Wow, thank you for helping me resolve this issue. My model is running smoothly now. However, I have one more question: what is the loss_con that gets printed, and why is it always 0? Is there something wrong with my setup?
[image]

shaunyuan22 commented 9 months ago

Glad that helped :) The loss_con term is the loss of the feature imitation head. It is zero because no exemplar features are stored at the beginning of training, so this is perfectly normal. Empirically, after about 2000 iterations this term becomes non-zero, which means the feature imitation head is participating in the overall optimization.
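The behaviour described here can be illustrated with a toy sketch: a loss that depends on stored exemplar features is zero for as long as the store is empty. Everything below is hypothetical, including the names `ExemplarQueue` and `imitation_loss`, the assumption that the queue size corresponds to `num_con_queue=256`, and the squared-distance placeholder, which stands in for the actual loss used by the feature imitation head.

```python
from collections import deque
from typing import Deque, List, Sequence


class ExemplarQueue:
    """Fixed-size store of exemplar feature vectors (oldest entries are dropped)."""

    def __init__(self, max_size: int = 256) -> None:
        self.feats: Deque[List[float]] = deque(maxlen=max_size)

    def push(self, feat: Sequence[float]) -> None:
        self.feats.append(list(feat))

    def __len__(self) -> int:
        return len(self.feats)


def imitation_loss(anchor: Sequence[float], queue: ExemplarQueue) -> float:
    """Placeholder loss: mean squared distance to the stored exemplars."""
    if len(queue) == 0:
        # No exemplars yet, so there is nothing to imitate and the term is zero.
        return 0.0
    total = 0.0
    for exemplar in queue.feats:
        total += sum((a - e) ** 2 for a, e in zip(anchor, exemplar))
    return total / len(queue)


queue = ExemplarQueue()
early = imitation_loss([1.0, 1.0], queue)   # empty queue at the start of training
queue.push([0.0, 0.0])
later = imitation_loss([1.0, 1.0], queue)   # non-zero once an exemplar is stored
print(early, later)
```

In this toy version the loss becomes non-zero as soon as one exemplar is stored; in the real model, the maintainer reports that this happens after roughly 2000 iterations.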

chnu-cpl commented 9 months ago

Thank you again for your response, and I wish you all the best.