mAP remained 0 across 50 epochs on deformable detr

Mohamed-Alshafai commented 1 year ago

Checklist

I have searched related issues but cannot get the expected help.
The issue has not been fixed in the latest version.

Describe the issue

I'm trying to train the Deformable DETR model with both with_box_refine and as_two_stage set to True. I'm using the model config that ships with the library, on a custom dataset in COCO format. I cross-checked the categories in my annotation files, and they are in the same order in all of them. I trained for 50 epochs straight, and every validation step reported an mAP of 0 for every metric: "coco/bbox_mAP: 0.0000 coco/bbox_mAP_50: 0.0000 coco/bbox_mAP_75: 0.0000 coco/bbox_mAP_s: 0.0000 coco/bbox_mAP_m: 0.0000 coco/bbox_mAP_l: 0.0000"
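To double-check the class order, the label mapping can be reproduced in plain Python. As far as I understand mmdet 3.x's CocoDataset, it looks up the ids of the categories whose names appear in metainfo classes (pycocotools' getCatIds keeps the order of the categories in the JSON file) and numbers them 0..N-1. build_cat2label below is a hypothetical, simplified re-implementation of that idea, not mmdet's own code:

```python
from typing import Dict, List, Sequence


def build_cat2label(categories: List[dict], classes: Sequence[str]) -> Dict[int, int]:
    """Hypothetical helper: map COCO category ids to contiguous training labels.

    Simplified sketch: categories whose name is in `classes` are kept in the
    order they appear in the annotation file and numbered 0..N-1.
    """
    known = {c["name"] for c in categories}
    missing = [name for name in classes if name not in known]
    if missing:
        # Deliberate choice of this sketch: surface typos in class names early.
        raise ValueError(f"classes not found in annotation file: {missing}")
    wanted = set(classes)
    cat_ids = [c["id"] for c in categories if c["name"] in wanted]
    return {cat_id: label for label, cat_id in enumerate(cat_ids)}


# A subset of the categories from the annotation file quoted below.
categories = [
    {"id": 0, "name": "Animals"},
    {"id": 1, "name": "Rat"},
    {"id": 10, "name": "Goat"},
]
print(build_cat2label(categories, ["Animals", "Rat", "Goat"]))
# {0: 0, 1: 1, 10: 2}
```

Under this assumption, the labels follow the file's category order rather than the order of the classes list, so an order mismatch would not raise an error by itself. Raising on unknown names is only this sketch's choice, meant to catch typos in the long species names such as "Ammomanes deserti - -AD-".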

Below is my dataset configuration file:

# dataset settings
dataset_type = 'CocoDataset'
data_root = "/home/wwfteam3/CombinedImages/coco/"

# Example to use different file client
# Method 1: simply set the data root and let the file I/O module
# automatically infer from prefix (not support LMDB and Memcache yet)

# data_root = 's3://openmmlab/datasets/detection/coco/'

# Method 2: Use `backend_args`, `file_client_args` in versions before 3.0.0rc6
# backend_args = dict(
#     backend='petrel',
#     path_mapping=dict({
#         './data/': 's3://openmmlab/datasets/detection/',
#         'data/': 's3://openmmlab/datasets/detection/'
#     }))
backend_args = None

classes = ["Animals", "Rat", "Ammomanes deserti - -AD-", "Ammoperdix heyi - -AH-", "Camel", "Cat", "Dog", "Donkey", "Emberiza striolata - -ES-", "Fox", "Goat", "Oenanthe albonigra - -OA-", "Sheep", "Spilopelia senegalensis - -SS-"]

train_pipeline = [
    dict(type='LoadImageFromFile', backend_args=backend_args),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize', scale=(1333,800), keep_ratio=True),
    dict(type='RandomFlip', prob=0.5),
    dict(type='PackDetInputs')
]
test_pipeline = [
    dict(type='LoadImageFromFile', backend_args=backend_args),
    dict(type='Resize', scale=(1333,800), keep_ratio=True),
    # If you don't have a gt annotation, delete the pipeline
    dict(type='LoadAnnotations', with_bbox=True),
    dict(
        type='PackDetInputs',
        meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
                   'scale_factor'))
]

train_dataloader = dict(
    batch_size=1,
    num_workers=1,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=True),
    batch_sampler=dict(type='AspectRatioBatchSampler'),
    dataset=dict(
        type=dataset_type,
        metainfo=dict(classes=classes),
        data_root=data_root,
        ann_file='annotations/instances_train_annotations.coco.json',
        data_prefix=dict(img='train/'),
        filter_cfg=dict(filter_empty_gt=True, min_size=32),
        pipeline=train_pipeline,
        backend_args=backend_args))   
val_dataloader = dict(
    batch_size=1,
    num_workers=1,
    persistent_workers=True,
    drop_last=False,
    sampler=dict(type='DefaultSampler', shuffle=False),
    dataset=dict(
        type=dataset_type,
        metainfo=dict(classes=classes),
        data_root=data_root,
        ann_file='annotations/instances_valid_annotations.coco.json',
        data_prefix=dict(img='valid/'),
        test_mode=True,
        pipeline=test_pipeline,
        backend_args=backend_args))
test_dataloader = dict(
    batch_size=1,
    num_workers=1,
    persistent_workers=True,
    drop_last=False,
    sampler=dict(type='DefaultSampler', shuffle=False),
    dataset=dict(
        type=dataset_type,
        metainfo=dict(classes=classes),
        data_root=data_root,
        ann_file=data_root + 'annotations/instances_test_annotations.coco.json',
        data_prefix=dict(img='test/'),
        test_mode=True,
        pipeline=test_pipeline))

val_evaluator = dict(
    type='CocoMetric',
    ann_file=data_root + 'annotations/instances_valid_annotations.coco.json',
    metric='bbox',
    format_only=False,
    backend_args=backend_args)
test_evaluator = dict(
    type='CocoMetric',
    metric='bbox',
    format_only=True,
    ann_file=data_root + 'annotations/instances_test_annotations.coco.json',
    outfile_prefix='./work_dirs/wwf_detection/test')

#test_dataloader = val_dataloader
#test_dataloader = dict(
#    batch_size=1,
#    num_workers=2,
#    persistent_workers=True,
#    drop_last=False,
#    sampler=dict(type='DefaultSampler', shuffle=False),
#    dataset=dict(
#        type=dataset_type,
#        data_root=data_root,
#        ann_file='annotations/instances_test_annotations.coco.json',
#        data_prefix=dict(img='test/'),
#        test_mode=True,
#        pipeline=test_pipeline,
#        backend_args=backend_args))

#test_evaluator = val_evaluator
#test_evaluator = dict(
#    type='CocoMetric',
#    ann_file=data_root + 'annotations/instances_test_annotations.coco.json',
#    metric='bbox',
#    format_only=False,
#    backend_args=backend_args)

# inference on test dataset and
# format the output results for submission.
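For reference, here is the arithmetic of a keep_ratio=True resize. To my understanding, this is how mmcv's rescale_size treats a (w, h) tuple scale: the long edge is capped at max(scale) and the short edge at min(scale). keep_ratio_size is a hypothetical helper written for this illustration, and 1920x1440 is the size of the sample image in the annotation file below.

```python
from typing import Tuple


def keep_ratio_size(w: int, h: int, scale: Tuple[int, int]) -> Tuple[int, int]:
    """Hypothetical helper: output (width, height) of a keep-ratio resize.

    The image is scaled by a single factor so that its long edge is at most
    max(scale) and its short edge is at most min(scale), then rounded.
    """
    max_long, max_short = max(scale), min(scale)
    factor = min(max_long / max(w, h), max_short / min(w, h))
    return int(w * factor + 0.5), int(h * factor + 0.5)


# Test-time Resize(scale=(1333, 800), keep_ratio=True) on a 1920x1440 image.
print(keep_ratio_size(1920, 1440, (1333, 800)))  # (1067, 800)
```

With that factor (about 0.556), the sample 616x274 box would shrink to roughly 342x152 at test time. The smallest training scale in the Deformable DETR pipeline, (480, 1333), would give 640x480 for the same image.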

Samples from my JSON annotation file:

"categories": [{"id": 0, "name": "Animals", "supercategory": "none"}, {"id": 1, "name": "Rat", "supercategory": "Animals"}, {"id": 2, "name": "Ammomanes deserti - -AD-", "supercategory": "Animals"}, {"id": 3, "name": "Ammoperdix heyi - -AH-", "supercategory": "Animals"}, {"id": 4, "name": "Camel", "supercategory": "Animals"}, {"id": 5, "name": "Cat", "supercategory": "Animals"}, {"id": 6, "name": "Dog", "supercategory": "Animals"}, {"id": 7, "name": "Donkey", "supercategory": "Animals"}, {"id": 8, "name": "Emberiza striolata - -ES-", "supercategory": "Animals"}, {"id": 9, "name": "Fox", "supercategory": "Animals"}, {"id": 10, "name": "Goat", "supercategory": "Animals"}, {"id": 11, "name": "Oenanthe albonigra - -OA-", "supercategory": "Animals"}, {"id": 12, "name": "Sheep", "supercategory": "Animals"}, {"id": 13, "name": "Spilopelia senegalensis - -SS-", "supercategory": "Animals"}],
"images": [{"id": 0, "license": 1, "file_name": "GHALILAH_11060264_JPG.rf.007b383714b6414071a74a98ef39c79e.jpg", "height": 1440, "width": 1920, "date_captured": "2023-04-15T19:44:35+00:00"},..],
"annotations": [{"id": 0, "image_id": 0, "category_id": 10, "bbox": [872, 1068, 616, 274], "area": 168784, "segmentation": [], "iscrowd": 0},..]

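Beyond loading the file with pycocotools, a per-class count can reveal categories with no boxes, images with no boxes, or boxes that fall outside their image. summarize_coco and summarize_file are hypothetical helpers that use only the standard library; COCO bboxes are [x, y, width, height] in pixels, as in the sample above.

```python
import json
from typing import Dict


def summarize_coco(coco: dict) -> Dict[str, object]:
    """Hypothetical helper: per-class box counts plus a few sanity checks."""
    id_to_name = {c["id"]: c["name"] for c in coco["categories"]}
    images = {img["id"]: img for img in coco["images"]}
    per_class = {name: 0 for name in id_to_name.values()}
    images_with_boxes = set()
    bad_boxes, unknown_category = [], []
    for ann in coco["annotations"]:
        if ann["category_id"] not in id_to_name:
            unknown_category.append(ann["id"])
            continue
        per_class[id_to_name[ann["category_id"]]] += 1
        images_with_boxes.add(ann["image_id"])
        x, y, w, h = ann["bbox"]  # COCO format: top-left corner, width, height
        img = images.get(ann["image_id"])
        outside = img is not None and (x + w > img["width"] or y + h > img["height"])
        if w <= 0 or h <= 0 or x < 0 or y < 0 or outside:
            bad_boxes.append(ann["id"])
    return {
        "per_class": per_class,
        "empty_classes": sorted(n for n, k in per_class.items() if k == 0),
        "images_without_boxes": sorted(set(images) - images_with_boxes),
        "bad_boxes": bad_boxes,
        "unknown_category": unknown_category,
    }


def summarize_file(path: str) -> Dict[str, object]:
    """Load a COCO-style JSON file from `path` and summarize it."""
    with open(path) as f:
        return summarize_coco(json.load(f))
```

For example, summarize_file('annotations/instances_valid_annotations.coco.json'), run from the data root, would summarize the validation split.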
Reproduction

What command or script did you run?

python tools/train.py "/home/wwfteam3/openmmlab/mmdetection/configs/deformable_detr/deformable-detr_r50_16xb2-50e_wwf.py"

What config did you run?

_base_ = [
    '../_base_/datasets/wwf_detection.py', '../_base_/default_runtime.py'
] #changed to work with wwf dataset
model = dict(
    type='DeformableDETR',
    num_queries=300,
    num_feature_levels=4,
    with_box_refine=True, # edited for wwf    old -> False
    as_two_stage=True, # edited for wwf       old -> False
    data_preprocessor=dict(
        type='DetDataPreprocessor',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        bgr_to_rgb=True,
        pad_size_divisor=1),
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=False),
        with_cp=True,
        norm_eval=True,
        style='pytorch',
        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
    neck=dict(
        type='ChannelMapper',
        in_channels=[512, 1024, 2048],
        kernel_size=1,
        out_channels=256,
        act_cfg=None,
        norm_cfg=dict(type='GN', num_groups=32),
        num_outs=4),
    encoder=dict(  # DeformableDetrTransformerEncoder
        num_layers=6,
        layer_cfg=dict(  # DeformableDetrTransformerEncoderLayer
            self_attn_cfg=dict(  # MultiScaleDeformableAttention
                embed_dims=256,
                batch_first=True),
            ffn_cfg=dict(
                embed_dims=256, feedforward_channels=1024, ffn_drop=0.1))),
    decoder=dict(  # DeformableDetrTransformerDecoder
        num_layers=6,
        return_intermediate=True,
        layer_cfg=dict(  # DeformableDetrTransformerDecoderLayer
            self_attn_cfg=dict(  # MultiheadAttention
                embed_dims=256,
                num_heads=8,
                dropout=0.1,
                batch_first=True),
            cross_attn_cfg=dict(  # MultiScaleDeformableAttention
                embed_dims=256,
                batch_first=True),
            ffn_cfg=dict(
                embed_dims=256, feedforward_channels=1024, ffn_drop=0.1)),
        post_norm_cfg=None),
    positional_encoding=dict(num_feats=128, normalize=True, offset=-0.5),
    bbox_head=dict(
        type='DeformableDETRHead',
        num_classes=14, # edited to match WWF dataset     old -> 80
        sync_cls_avg_factor=True,
        loss_cls=dict(
            type='FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=2.0),
        loss_bbox=dict(type='L1Loss', loss_weight=5.0),
        loss_iou=dict(type='GIoULoss', loss_weight=2.0)),
    # training and testing settings
    train_cfg=dict(
        assigner=dict(
            type='HungarianAssigner',
            match_costs=[
                dict(type='FocalLossCost', weight=2.0),
                dict(type='BBoxL1Cost', weight=5.0, box_format='xywh'),
                dict(type='IoUCost', iou_mode='giou', weight=2.0)
            ])),
    test_cfg=dict(max_per_img=100))

# train_pipeline, NOTE the img_scale and the Pad's size_divisor is different
# from the default setting in mmdet.
train_pipeline = [
    dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='RandomFlip', prob=0.5),
    dict(
        type='RandomChoice',
        transforms=[
            [
                dict(
                    type='RandomChoiceResize',
                    scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333),
                            (608, 1333), (640, 1333), (672, 1333), (704, 1333),
                            (736, 1333), (768, 1333), (800, 1333)],
                    keep_ratio=True)
            ],
            [
                dict(
                    type='RandomChoiceResize',
                    # The aspect ratio of all images in the train dataset is < 7
                    # (follows the original implementation)
                    scales=[(400, 4200), (500, 4200), (600, 4200)],
                    keep_ratio=True),
                dict(
                    type='RandomCrop',
                    crop_type='absolute_range',
                    crop_size=(384, 600),
                    allow_negative_crop=True),
                dict(
                    type='RandomChoiceResize',
                    scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333),
                            (608, 1333), (640, 1333), (672, 1333), (704, 1333),
                            (736, 1333), (768, 1333), (800, 1333)],
                    keep_ratio=True)
            ]
        ]),
    dict(type='PackDetInputs')
]

#load_from = "/home/wwfteam3/openmmlab/mmdetection/work_dirs/deformable-detr-refine-twostage_r50_16xb2-50e_coco/r50_deformable_detr_plus_iterative_bbox_refinement_plus_plus_two_stage-checkpoint.pth" # transfer learning from pretrained model

train_dataloader = dict(
    dataset=dict(
        filter_cfg=dict(filter_empty_gt=False), pipeline=train_pipeline))

# optimizer
optim_wrapper = dict(
    type='OptimWrapper',
    optimizer=dict(type='AdamW', lr=0.0002, weight_decay=0.0001),
    clip_grad=dict(max_norm=0.1, norm_type=2),
    paramwise_cfg=dict(
        custom_keys={
            'backbone': dict(lr_mult=0.1),
            'sampling_offsets': dict(lr_mult=0.1),
            'reference_points': dict(lr_mult=0.1)
        }))

# learning policy
max_epochs = 10
train_cfg = dict(
    type='EpochBasedTrainLoop', max_epochs=max_epochs, val_interval=1)
val_cfg = dict(type='ValLoop')
test_cfg = dict(type='TestLoop')

param_scheduler = [
    dict(
        type='MultiStepLR',
        begin=0,
        end=max_epochs,
        by_epoch=True,
        milestones=[40],
        gamma=0.1)
]

# NOTE: `auto_scale_lr` is for automatically scaling LR,
# USER SHOULD NOT CHANGE ITS VALUES.
# base_batch_size = (16 GPUs) x (2 samples per GPU)
auto_scale_lr = dict(base_batch_size=2) # edited to match machine hardware       old -> 32
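The schedule values above can be sanity-checked with a few lines of arithmetic. multistep_lr uses the closed form of a MultiStepLR schedule (the LR is multiplied by gamma once for every milestone already passed), and linear_scaled_lr applies the linear scaling rule that, as far as I know, auto_scale_lr uses when it is enabled (for example with the --auto-scale-lr flag of tools/train.py). Both helpers are hypothetical illustrations, not mmengine functions.

```python
from bisect import bisect_right
from typing import Sequence


def multistep_lr(base_lr: float, epoch: int, milestones: Sequence[int], gamma: float) -> float:
    """LR in effect during 0-based `epoch` under a MultiStepLR schedule."""
    # bisect_right counts how many milestones are <= epoch.
    return base_lr * gamma ** bisect_right(sorted(milestones), epoch)


def linear_scaled_lr(base_lr: float, base_batch_size: int, num_gpus: int,
                     samples_per_gpu: int) -> float:
    """Scale the LR by (actual total batch size) / (reference batch size)."""
    return base_lr * (num_gpus * samples_per_gpu) / base_batch_size


# max_epochs = 10 with milestones=[40]: epochs 0..9 never reach the milestone.
print([multistep_lr(2e-4, e, [40], 0.1) for e in (0, 9)])  # [0.0002, 0.0002]
```

With a 50-epoch run and milestones=[40], the drop to 0.1x would apply from epoch index 40 on. For the scaling rule, 1 GPU x 1 image against the original reference batch of 32 would give 0.0002 x 1/32 = 6.25e-06, while the edited base_batch_size=2 gives 1e-04. Note also that the 2.0000e-05 in the log equals 0.0002 x 0.1, which matches both a decayed base LR and the backbone's lr_mult=0.1, so the logged value alone may not tell which parameter group it comes from.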

Did you make any modifications on the code or config? Did you understand what you have modified?

I changed the path to the COCO-formatted dataset, the training/validation/testing configuration, and the number of classes in the model's bbox_head.
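Because the model config inherits the dataset config through _base_, a key redefined in the child is merged into the base rather than replacing the whole dict. merge_cfg below is a simplified, hypothetical sketch of that behavior (nested dicts merge key by key, other values are replaced, and _delete_=True replaces a dict wholesale), not mmengine's implementation. Applied to the train_dataloader override, it shows which filter settings end up in effect:

```python
import copy


def merge_cfg(base: dict, child: dict) -> dict:
    """Simplified sketch of config inheritance: merge `child` into `base`."""
    merged = copy.deepcopy(base)
    for key, value in child.items():
        value = copy.deepcopy(value)
        # `_delete_=True` in a child dict means "replace, do not merge".
        replace = isinstance(value, dict) and value.pop("_delete_", False)
        if isinstance(value, dict) and isinstance(merged.get(key), dict) and not replace:
            merged[key] = merge_cfg(merged[key], value)  # recurse into nested dicts
        else:
            merged[key] = value  # lists, strings, numbers (or _delete_) replace
    return merged


base = dict(dataset=dict(
    filter_cfg=dict(filter_empty_gt=True, min_size=32),
    pipeline=['base train_pipeline']))
child = dict(dataset=dict(
    filter_cfg=dict(filter_empty_gt=False),
    pipeline=['Deformable DETR train_pipeline']))
print(merge_cfg(base, child)['dataset']['filter_cfg'])
# {'filter_empty_gt': False, 'min_size': 32}
```

Under this merge rule, the effective filter_cfg keeps min_size=32 from the dataset config while filter_empty_gt becomes False, and the pipeline list is replaced entirely.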
What dataset did you use?

A custom dataset in COCO format. I tested it with pycocotools.COCO: it loads the data and matches the COCO dataset format documented on the website.

Environment

Ubuntu machine with CUDA 10.1 and the correct Python modules installed.

Please run python mmdet/utils/collect_env.py to collect necessary environment information and paste it here.

(openmmlab) wwfteam3@AI-Lab-15:~/openmmlab/mmdetection/tools$ python collect_env.py
sys.platform: linux
Python: 3.8.16 (default, Mar 2 2023, 03:21:46) [GCC 11.2.0]
CUDA available: True
numpy_random_seed: 2147483648
GPU 0: Quadro RTX 4000
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 10.1, V10.1.24
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.7.1
PyTorch compiling details: PyTorch built with:
- GCC 7.3
- C++ Version: 201402
- Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v1.6.0 (Git Hash 5ef631a030a6f73131c77892041042805a06064f)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- NNPACK is enabled
- CPU capability usage: AVX2
- CUDA Runtime 10.1
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
- CuDNN 7.6.3
- Magma 2.5.2
- Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

TorchVision: 0.8.0a0
OpenCV: 4.7.0
MMEngine: 0.7.2
MMDetection: 3.0.0+ecac3a7

You may add additional information that may be helpful for locating the problem, such as:
1. How you installed PyTorch [e.g., pip, conda, source]: conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=10.1 -c pytorch
2. Other environment variables that may be related (such as $PATH, $LD_LIBRARY_PATH, $PYTHONPATH, etc.)

Results

If applicable, paste the related results here, e.g., what you expect and what you get.


2023/04/22 20:30:13 - mmengine - INFO - Epoch(train) [50][10100/10302]  lr: 2.0000e-05  eta: 0:01:03  time: 0.3051  data_time: 0.0034  memory: 3384  grad_norm: 89.2519  loss: 21.6675  loss_cls: 1.8000  loss_bbox: 0.6145  loss_iou: 0.7015  d0.loss_cls: 1.8219  d0.loss_bbox: 0.6559  d0.loss_iou: 0.7406  d1.loss_cls: 1.8319  d1.loss_bbox: 0.6173  d1.loss_iou: 0.7402  d2.loss_cls: 1.8154  d2.loss_bbox: 0.6085  d2.loss_iou: 0.6973  d3.loss_cls: 1.8089  d3.loss_bbox: 0.5832  d3.loss_iou: 0.7047  d4.loss_cls: 1.8078  d4.loss_bbox: 0.6006  d4.loss_iou: 0.7142  enc_loss_cls: 1.9641  enc_loss_bbox: 0.3495  enc_loss_iou: 0.4896
2023/04/22 20:30:29 - mmengine - INFO - Epoch(train) [50][10150/10302]  lr: 2.0000e-05  eta: 0:00:47  time: 0.3265  data_time: 0.0035  memory: 3827  grad_norm: 105.7978  loss: 22.3560  loss_cls: 1.8665  loss_bbox: 0.6654  loss_iou: 0.7125  d0.loss_cls: 1.8988  d0.loss_bbox: 0.6226  d0.loss_iou: 0.6976  d1.loss_cls: 1.9094  d1.loss_bbox: 0.6233  d1.loss_iou: 0.7689  d2.loss_cls: 1.8869  d2.loss_bbox: 0.6238  d2.loss_iou: 0.7114  d3.loss_cls: 1.8731  d3.loss_bbox: 0.6281  d3.loss_iou: 0.7310  d4.loss_cls: 1.8831  d4.loss_bbox: 0.6532  d4.loss_iou: 0.7087  enc_loss_cls: 2.0747  enc_loss_bbox: 0.3500  enc_loss_iou: 0.4669
2023/04/22 20:30:46 - mmengine - INFO - Epoch(train) [50][10200/10302]  lr: 2.0000e-05  eta: 0:00:31  time: 0.3281  data_time: 0.0034  memory: 3827  grad_norm: 106.4789  loss: 20.2432  loss_cls: 1.6865  loss_bbox: 0.6162  loss_iou: 0.6370  d0.loss_cls: 1.7019  d0.loss_bbox: 0.5865  d0.loss_iou: 0.6678  d1.loss_cls: 1.6985  d1.loss_bbox: 0.6257  d1.loss_iou: 0.6712  d2.loss_cls: 1.6888  d2.loss_bbox: 0.5633  d2.loss_iou: 0.6274  d3.loss_cls: 1.6825  d3.loss_bbox: 0.5907  d3.loss_iou: 0.6349  d4.loss_cls: 1.6898  d4.loss_bbox: 0.6126  d4.loss_iou: 0.6301  enc_loss_cls: 1.8878  enc_loss_bbox: 0.3029  enc_loss_iou: 0.4410
2023/04/22 20:30:46 - mmengine - INFO - Exp name: deformable-detr_r50_16xb2-50e_wwf_20230420_180243
2023/04/22 20:31:01 - mmengine - INFO - Epoch(train) [50][10250/10302]  lr: 2.0000e-05  eta: 0:00:16  time: 0.3091  data_time: 0.0034  memory: 3418  grad_norm: 108.6929  loss: 22.4026  loss_cls: 1.9388  loss_bbox: 0.6035  loss_iou: 0.7125  d0.loss_cls: 1.9742  d0.loss_bbox: 0.6285  d0.loss_iou: 0.6927  d1.loss_cls: 1.9696  d1.loss_bbox: 0.5888  d1.loss_iou: 0.6799  d2.loss_cls: 1.9468  d2.loss_bbox: 0.5897  d2.loss_iou: 0.6890  d3.loss_cls: 1.9294  d3.loss_bbox: 0.5840  d3.loss_iou: 0.7257  d4.loss_cls: 1.9516  d4.loss_bbox: 0.6044  d4.loss_iou: 0.7106  enc_loss_cls: 2.1390  enc_loss_bbox: 0.3216  enc_loss_iou: 0.4221
2023/04/22 20:31:17 - mmengine - INFO - Epoch(train) [50][10300/10302]  lr: 2.0000e-05  eta: 0:00:00  time: 0.3244  data_time: 0.0034  memory: 3384  grad_norm: 103.4770  loss: 21.1133  loss_cls: 1.7463  loss_bbox: 0.5742  loss_iou: 0.7009  d0.loss_cls: 1.7491  d0.loss_bbox: 0.6057  d0.loss_iou: 0.7272  d1.loss_cls: 1.7564  d1.loss_bbox: 0.6029  d1.loss_iou: 0.7628  d2.loss_cls: 1.7396  d2.loss_bbox: 0.5731  d2.loss_iou: 0.7108  d3.loss_cls: 1.7597  d3.loss_bbox: 0.5533  d3.loss_iou: 0.7120  d4.loss_cls: 1.7399  d4.loss_bbox: 0.5690  d4.loss_iou: 0.7080  enc_loss_cls: 1.9559  enc_loss_bbox: 0.3487  enc_loss_iou: 0.5176
2023/04/22 20:31:18 - mmengine - INFO - Exp name: deformable-detr_r50_16xb2-50e_wwf_20230420_180243
2023/04/22 20:31:18 - mmengine - INFO - Saving checkpoint at 50 epochs
2023/04/22 20:31:27 - mmengine - INFO - Epoch(val) [50][  50/3377]    eta: 0:06:29  time: 0.1171  data_time: 0.0016  memory: 1826  
2023/04/22 20:31:33 - mmengine - INFO - Epoch(val) [50][ 100/3377]    eta: 0:06:21  time: 0.1160  data_time: 0.0012  memory: 860  
2023/04/22 20:31:38 - mmengine - INFO - Epoch(val) [50][ 150/3377]    eta: 0:06:16  time: 0.1172  data_time: 0.0012  memory: 860  
2023/04/22 20:31:44 - mmengine - INFO - Epoch(val) [50][ 200/3377]    eta: 0:06:12  time: 0.1184  data_time: 0.0012  memory: 860  
2023/04/22 20:31:50 - mmengine - INFO - Epoch(val) [50][ 250/3377]    eta: 0:06:06  time: 0.1181  data_time: 0.0012  memory: 860  
2023/04/22 20:31:56 - mmengine - INFO - Epoch(val) [50][ 300/3377]    eta: 0:06:01  time: 0.1179  data_time: 0.0012  memory: 860  
2023/04/22 20:32:02 - mmengine - INFO - Epoch(val) [50][ 350/3377]    eta: 0:05:55  time: 0.1181  data_time: 0.0012  memory: 860  
2023/04/22 20:32:08 - mmengine - INFO - Epoch(val) [50][ 400/3377]    eta: 0:05:50  time: 0.1182  data_time: 0.0011  memory: 860  
2023/04/22 20:32:14 - mmengine - INFO - Epoch(val) [50][ 450/3377]    eta: 0:05:44  time: 0.1198  data_time: 0.0011  memory: 860  
2023/04/22 20:32:20 - mmengine - INFO - Epoch(val) [50][ 500/3377]    eta: 0:05:39  time: 0.1189  data_time: 0.0011  memory: 860  
2023/04/22 20:32:26 - mmengine - INFO - Epoch(val) [50][ 550/3377]    eta: 0:05:33  time: 0.1196  data_time: 0.0011  memory: 860  
2023/04/22 20:32:32 - mmengine - INFO - Epoch(val) [50][ 600/3377]    eta: 0:05:28  time: 0.1189  data_time: 0.0011  memory: 860  
2023/04/22 20:32:38 - mmengine - INFO - Epoch(val) [50][ 650/3377]    eta: 0:05:22  time: 0.1206  data_time: 0.0011  memory: 860  
2023/04/22 20:32:44 - mmengine - INFO - Epoch(val) [50][ 700/3377]    eta: 0:05:16  time: 0.1187  data_time: 0.0011  memory: 860  
2023/04/22 20:32:50 - mmengine - INFO - Epoch(val) [50][ 750/3377]    eta: 0:05:11  time: 0.1200  data_time: 0.0011  memory: 860  
2023/04/22 20:32:56 - mmengine - INFO - Epoch(val) [50][ 800/3377]    eta: 0:05:05  time: 0.1193  data_time: 0.0011  memory: 860  
2023/04/22 20:33:02 - mmengine - INFO - Epoch(val) [50][ 850/3377]    eta: 0:04:59  time: 0.1190  data_time: 0.0011  memory: 860  
2023/04/22 20:33:08 - mmengine - INFO - Epoch(val) [50][ 900/3377]    eta: 0:04:53  time: 0.1197  data_time: 0.0011  memory: 860  
2023/04/22 20:33:14 - mmengine - INFO - Epoch(val) [50][ 950/3377]    eta: 0:04:48  time: 0.1195  data_time: 0.0011  memory: 860  
2023/04/22 20:33:20 - mmengine - INFO - Epoch(val) [50][1000/3377]    eta: 0:04:42  time: 0.1198  data_time: 0.0011  memory: 860  
2023/04/22 20:33:26 - mmengine - INFO - Epoch(val) [50][1050/3377]    eta: 0:04:36  time: 0.1199  data_time: 0.0011  memory: 860  
2023/04/22 20:33:32 - mmengine - INFO - Epoch(val) [50][1100/3377]    eta: 0:04:30  time: 0.1190  data_time: 0.0011  memory: 860  
2023/04/22 20:33:38 - mmengine - INFO - Epoch(val) [50][1150/3377]    eta: 0:04:24  time: 0.1212  data_time: 0.0011  memory: 860  
2023/04/22 20:33:44 - mmengine - INFO - Epoch(val) [50][1200/3377]    eta: 0:04:18  time: 0.1190  data_time: 0.0011  memory: 860  
2023/04/22 20:33:50 - mmengine - INFO - Epoch(val) [50][1250/3377]    eta: 0:04:12  time: 0.1192  data_time: 0.0011  memory: 860  
2023/04/22 20:33:56 - mmengine - INFO - Epoch(val) [50][1300/3377]    eta: 0:04:07  time: 0.1205  data_time: 0.0011  memory: 860  
2023/04/22 20:34:02 - mmengine - INFO - Epoch(val) [50][1350/3377]    eta: 0:04:01  time: 0.1211  data_time: 0.0011  memory: 860  
2023/04/22 20:34:08 - mmengine - INFO - Epoch(val) [50][1400/3377]    eta: 0:03:55  time: 0.1197  data_time: 0.0011  memory: 860  
2023/04/22 20:34:14 - mmengine - INFO - Epoch(val) [50][1450/3377]    eta: 0:03:49  time: 0.1194  data_time: 0.0012  memory: 860  
2023/04/22 20:34:20 - mmengine - INFO - Epoch(val) [50][1500/3377]    eta: 0:03:43  time: 0.1205  data_time: 0.0011  memory: 860  
2023/04/22 20:34:26 - mmengine - INFO - Epoch(val) [50][1550/3377]    eta: 0:03:37  time: 0.1197  data_time: 0.0011  memory: 860  
2023/04/22 20:34:32 - mmengine - INFO - Epoch(val) [50][1600/3377]    eta: 0:03:31  time: 0.1195  data_time: 0.0011  memory: 860  
2023/04/22 20:34:38 - mmengine - INFO - Epoch(val) [50][1650/3377]    eta: 0:03:25  time: 0.1201  data_time: 0.0011  memory: 860  
2023/04/22 20:34:44 - mmengine - INFO - Epoch(val) [50][1700/3377]    eta: 0:03:19  time: 0.1202  data_time: 0.0011  memory: 860  
2023/04/22 20:34:50 - mmengine - INFO - Epoch(val) [50][1750/3377]    eta: 0:03:13  time: 0.1183  data_time: 0.0011  memory: 860  
2023/04/22 20:34:56 - mmengine - INFO - Epoch(val) [50][1800/3377]    eta: 0:03:07  time: 0.1195  data_time: 0.0011  memory: 860  
2023/04/22 20:35:02 - mmengine - INFO - Epoch(val) [50][1850/3377]    eta: 0:03:02  time: 0.1196  data_time: 0.0011  memory: 860  
2023/04/22 20:35:08 - mmengine - INFO - Epoch(val) [50][1900/3377]    eta: 0:02:56  time: 0.1195  data_time: 0.0011  memory: 860  
2023/04/22 20:35:14 - mmengine - INFO - Epoch(val) [50][1950/3377]    eta: 0:02:50  time: 0.1188  data_time: 0.0011  memory: 860  
2023/04/22 20:35:20 - mmengine - INFO - Epoch(val) [50][2000/3377]    eta: 0:02:44  time: 0.1191  data_time: 0.0011  memory: 860  
2023/04/22 20:35:26 - mmengine - INFO - Epoch(val) [50][2050/3377]    eta: 0:02:38  time: 0.1192  data_time: 0.0011  memory: 860  
2023/04/22 20:35:32 - mmengine - INFO - Epoch(val) [50][2100/3377]    eta: 0:02:32  time: 0.1189  data_time: 0.0011  memory: 860  
2023/04/22 20:35:38 - mmengine - INFO - Epoch(val) [50][2150/3377]    eta: 0:02:26  time: 0.1194  data_time: 0.0011  memory: 860  
2023/04/22 20:35:44 - mmengine - INFO - Epoch(val) [50][2200/3377]    eta: 0:02:20  time: 0.1199  data_time: 0.0016  memory: 860  
2023/04/22 20:35:50 - mmengine - INFO - Epoch(val) [50][2250/3377]    eta: 0:02:14  time: 0.1207  data_time: 0.0011  memory: 860  
2023/04/22 20:35:56 - mmengine - INFO - Epoch(val) [50][2300/3377]    eta: 0:02:08  time: 0.1205  data_time: 0.0012  memory: 860  
2023/04/22 20:36:02 - mmengine - INFO - Epoch(val) [50][2350/3377]    eta: 0:02:02  time: 0.1197  data_time: 0.0011  memory: 860  
2023/04/22 20:36:08 - mmengine - INFO - Epoch(val) [50][2400/3377]    eta: 0:01:56  time: 0.1193  data_time: 0.0011  memory: 860  
2023/04/22 20:36:14 - mmengine - INFO - Epoch(val) [50][2450/3377]    eta: 0:01:50  time: 0.1205  data_time: 0.0011  memory: 860  
2023/04/22 20:36:20 - mmengine - INFO - Epoch(val) [50][2500/3377]    eta: 0:01:44  time: 0.1186  data_time: 0.0011  memory: 860  
2023/04/22 20:36:26 - mmengine - INFO - Epoch(val) [50][2550/3377]    eta: 0:01:38  time: 0.1195  data_time: 0.0012  memory: 860  
2023/04/22 20:36:32 - mmengine - INFO - Epoch(val) [50][2600/3377]    eta: 0:01:32  time: 0.1208  data_time: 0.0011  memory: 860  
2023/04/22 20:36:38 - mmengine - INFO - Epoch(val) [50][2650/3377]    eta: 0:01:26  time: 0.1186  data_time: 0.0012  memory: 860  
2023/04/22 20:36:44 - mmengine - INFO - Epoch(val) [50][2700/3377]    eta: 0:01:20  time: 0.1195  data_time: 0.0011  memory: 860  
2023/04/22 20:36:50 - mmengine - INFO - Epoch(val) [50][2750/3377]    eta: 0:01:14  time: 0.1199  data_time: 0.0011  memory: 860  
2023/04/22 20:36:56 - mmengine - INFO - Epoch(val) [50][2800/3377]    eta: 0:01:08  time: 0.1190  data_time: 0.0011  memory: 860  
2023/04/22 20:37:02 - mmengine - INFO - Epoch(val) [50][2850/3377]    eta: 0:01:02  time: 0.1209  data_time: 0.0011  memory: 860  
2023/04/22 20:37:08 - mmengine - INFO - Epoch(val) [50][2900/3377]    eta: 0:00:56  time: 0.1202  data_time: 0.0011  memory: 860  
2023/04/22 20:37:14 - mmengine - INFO - Epoch(val) [50][2950/3377]    eta: 0:00:50  time: 0.1201  data_time: 0.0011  memory: 860  
2023/04/22 20:37:20 - mmengine - INFO - Epoch(val) [50][3000/3377]    eta: 0:00:45  time: 0.1188  data_time: 0.0011  memory: 860  
2023/04/22 20:37:26 - mmengine - INFO - Epoch(val) [50][3050/3377]    eta: 0:00:39  time: 0.1187  data_time: 0.0011  memory: 860  
2023/04/22 20:37:32 - mmengine - INFO - Epoch(val) [50][3100/3377]    eta: 0:00:33  time: 0.1192  data_time: 0.0011  memory: 860  
2023/04/22 20:37:38 - mmengine - INFO - Epoch(val) [50][3150/3377]    eta: 0:00:27  time: 0.1193  data_time: 0.0011  memory: 860  
2023/04/22 20:37:44 - mmengine - INFO - Epoch(val) [50][3200/3377]    eta: 0:00:21  time: 0.1202  data_time: 0.0011  memory: 860  
2023/04/22 20:37:50 - mmengine - INFO - Epoch(val) [50][3250/3377]    eta: 0:00:15  time: 0.1189  data_time: 0.0011  memory: 860  
2023/04/22 20:37:56 - mmengine - INFO - Epoch(val) [50][3300/3377]    eta: 0:00:09  time: 0.1186  data_time: 0.0011  memory: 860  
2023/04/22 20:38:02 - mmengine - INFO - Epoch(val) [50][3350/3377]    eta: 0:00:03  time: 0.1193  data_time: 0.0011  memory: 860  
2023/04/22 20:38:08 - mmengine - INFO - Evaluating bbox...
2023/04/22 20:38:18 - mmengine - INFO - bbox_mAP_copypaste: 0.000 0.000 0.000 0.000 0.000 0.000
2023/04/22 20:38:18 - mmengine - INFO - Epoch(val) [50][3377/3377]    coco/bbox_mAP: 0.0000  coco/bbox_mAP_50: 0.0000  coco/bbox_mAP_75: 0.0000  coco/bbox_mAP_s: 0.0000  coco/bbox_mAP_m: 0.0000  coco/bbox_mAP_l: 0.0000  data_time: 0.0011  time: 0.1200
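Since loss_cls stays around 1.7 to 1.9 in these last iterations, it may help to extract the loss values from the whole text log and check whether they drop at all across epochs. parse_train_line and series are hypothetical, standard-library-only helpers for lines in the format shown above; the parser skips eta, whose h:mm:ss value is not a single number.

```python
import re
from typing import Dict, Iterable, List

# "key: number" pairs such as "loss_cls: 1.8000", "d0.loss_bbox: 0.6559" or "lr: 2.0000e-05".
_PAIR = re.compile(r"([\w.]+): (-?\d+(?:\.\d+)?(?:e[-+]?\d+)?)")


def parse_train_line(line: str) -> Dict[str, float]:
    """Return the numeric fields of one 'Epoch(train)' log line ({} otherwise)."""
    if "Epoch(train)" not in line:
        return {}
    return {k: float(v) for k, v in _PAIR.findall(line) if k != "eta"}


def series(lines: Iterable[str], key: str) -> List[float]:
    """Collect one field (e.g. 'loss_cls') across all training log lines."""
    values = []
    for line in lines:
        fields = parse_train_line(line)
        if key in fields:
            values.append(fields[key])
    return values
```

For example, opening the text log that mmengine writes into the run's work directory and calling series(f, 'loss_cls') gives the classification loss over the whole run.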

sipie800 commented 1 year ago

The same thing is happening here.

FPPMXB commented 11 months ago

Have you solved this problem? I ran into the same problem with DETR; it seems that all DETR-series models behave this way.

Alaric423 commented 11 months ago

Have you solved this problem? I ran into the same problem with DETR; it seems that all DETR-series models behave this way.

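I've been running the Deformable DETR model on the Pascal VOC dataset, and it shows normal results. I used pretrained weights, but the results aren't very good; the mAP is somewhat low.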

FPPMXB commented 11 months ago

Not yet, it's been really painful. Feel free to add me on QQ to discuss it: 2364174831

---Original--- Date: Sun, Dec 10, 2023 20:35; Subject: Re: [open-mmlab/mmdetection] mAP remained 0 across 50 epochs on deformable detr (Issue #10208)


Mohamed-Alshafai commented 11 months ago

Have you solved this problem? I ran into the same problem with DETR; it seems that all DETR-series models behave this way.

Sadly, I haven't had any luck with it.

iclaramuntCELSOSPV commented 7 months ago

I faced the same issue when training a DINO model (DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection). My mAP also stayed at or near 0 throughout all epochs. I did notice, though, that the mAP seemed to improve after the scheduled LR decay.

My guess is that the learning rate was so high that the model diverged instead of converging, and once the LR decayed it was too small for the model to learn much at all. I noticed you're using an LR of 0.0002, so try reducing it. In my case, going back to the default value fixed the issue (changing it from 0.00025 to 0.0001).
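Applied to the config in the original post, that suggestion is a one-value change. Below is a sketch of the optim_wrapper with the base LR lowered to 0.0001, the value reported above to work for DINO; whether it suits this Deformable DETR setup is an untested assumption. All other settings are kept as in the original config.

```python
# Hypothetical override for deformable-detr_r50_16xb2-50e_wwf.py:
# same optimizer settings as the original config, only the base LR is lowered.
optim_wrapper = dict(
    type='OptimWrapper',
    optimizer=dict(type='AdamW', lr=0.0001, weight_decay=0.0001),  # was 0.0002
    clip_grad=dict(max_norm=0.1, norm_type=2),
    paramwise_cfg=dict(
        custom_keys={
            'backbone': dict(lr_mult=0.1),
            'sampling_offsets': dict(lr_mult=0.1),
            'reference_points': dict(lr_mult=0.1)
        }))
```

Since mmengine merges dicts on inheritance, a child config whose _base_ points at this file could probably express the same change as optim_wrapper = dict(optimizer=dict(lr=0.0001)).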

Hope this solves the issue for you as well.
