megvii-research / LGD

Official Implementation of the detection self-distillation framework LGD.

Performance drops in some other detectors #2

Open · Kizna1ver opened this issue 1 year ago

Kizna1ver commented 1 year ago

Hi, thanks for the great work. Have you ever tried LGD with more advanced detectors, such as TOOD or DDOD? I reimplemented LGD in MMDetection and plugged it into TOOD and DDOD, but it yields lower performance than the baselines (DDOD mAP drops from 41.7 to 38.7, and TOOD mAP drops from 42.3 to 38.7) in the R50-FPN 1x single-scale (1xss) setting. BTW, the code in your repo also contains an ATSS detector; have you tried ATSS with LGD? I didn't see that experiment in your paper. It would be appreciated if you could provide more experiment info :)
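For reference, the wiring I have in mind is roughly the following. This is only a minimal PyTorch sketch of an LGD-style setup, assuming a student that returns (FPN features, loss dict) and a label-based teacher trained jointly with it; the module names, the plain MSE imitation loss, and the call signatures are hypothetical placeholders, not the actual LGD or MMDetection API:

import torch.nn as nn
import torch.nn.functional as F

class LGDStyleDistiller(nn.Module):
    """Hypothetical wrapper: student detector + label-based teacher, trained jointly."""

    def __init__(self, student, teacher, adapter, distill_weight=1.0):
        super().__init__()
        self.student = student   # e.g. an ATSS/TOOD/DDOD detector (placeholder)
        self.teacher = teacher   # label-encoding "DynamicTeacher" (placeholder)
        self.adapter = adapter   # e.g. a few convs on student features (placeholder)
        self.distill_weight = distill_weight

    def forward(self, images, targets):
        # Student forward pass: multi-level features plus the usual detection losses.
        feats, losses = self.student(images, targets)
        # Teacher builds its features from the labels (and, in the student-guided
        # interaction pattern, from the student features as well).
        teacher_feats = self.teacher(feats, targets)
        # Feature imitation between adapted student features and teacher features;
        # plain MSE stands in for whatever distillation loss LGD actually uses.
        losses["loss_distill"] = self.distill_weight * sum(
            F.mse_loss(self.adapter(f), t) for f, t in zip(feats, teacher_feats)
        )
        return losses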

Kizna1ver commented 1 year ago

I also tried the ATSSCT distillator in your repo with the cfg files below. The result shows that LGD improves mAP to 39.89 over the 39.42 baseline in the 1xss R50 setting. The baseline mAP looks normal, but the improvement from LGD seems smaller than expected. Is this result as expected? I want to build some follow-up improvements on top of LGD, and this result matters a lot for that work. It would be appreciated if you could provide more experiment info. Thanks! @zhangpzh

Here are the cfg files:

atss_R_50_1xSS_prD30K_prS10K_bs16.yaml

_BASE_: "../../oss_baseline/Base-RetinaNet_1xss_bs16.yaml"
MODEL:
  META_ARCHITECTURE: 'DistillatorATSS'
  WEIGHTS: "detectron2://ImageNetPretrained/MSRA/R-50.pkl"
  MASK_ON: False
  RESNETS:
    DEPTH: 50
  DISTILLATOR:
    TEACHER:
        META_ARCH: 'DynamicTeacher'
        SOLVER:
            OPTIMIZER: 'SGD'
            BASE_LR: 0.01
            MOMENTUM: 0.9
            WEIGHT_DECAY: 1e-4
            LR_SCHEDULER_NAME: "WarmupMultiStepLR"
            STEPS: (60000, 80000)
            GAMMA: 0.1
            WARMUP_FACTOR: 1e-3
            WARMUP_ITERS: 1000
            WARMUP_METHOD: "linear"
        INTERACT_PATTERN: 'stuGuided'
        DETACH_APPEARANCE_EMBED: False
        ADD_CONTEXT_BOX: True
    STUDENT:
        META_ARCH: 'ATSSCT'
        SOLVER:
            OPTIMIZER: 'SGD'
            BASE_LR: 0.01
            MOMENTUM: 0.9
            WEIGHT_DECAY: 1e-4
            LR_SCHEDULER_NAME: "WarmupMultiStepLR"
            STEPS: (60000, 80000)
            GAMMA: 0.1
            WARMUP_FACTOR: 1e-3
            WARMUP_ITERS: 1000
            WARMUP_METHOD: "linear"
    ADAPTER:
        META_ARCH: 'SequentialConvs'
    PRE_NONDISTILL_ITERS: 30000
    POST_NONDISTILL_ITERS: 0
    PRE_FREEZE_STUDENT_BACKBONE_ITERS: 10000
    LAMBDA: 1.0
    EVAL_TEACHER: True
INPUT:
  MIN_SIZE_TRAIN: (800,)
SOLVER:
  STEPS: (60000, 80000)
  MAX_ITER: 90000
# OUTPUT_DIR: 'outputs/RetinaNet/retinanet_R_50_1xSS_stuGuided_addCtxBox=YES_detachAppearanceEmbed=NO_preNondistillIters=30k_preFreezeStudentBackboneIters=10k/'
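For what it's worth, my reading of the schedule keys above is that PRE_NONDISTILL_ITERS: 30000 keeps the distillation loss switched off for the first 30k iterations, and PRE_FREEZE_STUDENT_BACKBONE_ITERS: 10000 keeps the student backbone frozen for the first 10k. I am inferring this purely from the key names, so the real trainer may differ; a rough sketch of that gating:

# Hypothetical gating implied by the DISTILLATOR schedule keys above;
# the actual LGD trainer may implement this differently.
PRE_NONDISTILL_ITERS = 30_000        # DISTILLATOR.PRE_NONDISTILL_ITERS
PRE_FREEZE_BACKBONE_ITERS = 10_000   # DISTILLATOR.PRE_FREEZE_STUDENT_BACKBONE_ITERS
LAMBDA = 1.0                         # DISTILLATOR.LAMBDA

def total_loss(iteration, det_loss, distill_loss):
    # Pure detection loss during the warm-up phase, then add weighted distillation.
    if iteration < PRE_NONDISTILL_ITERS:
        return det_loss
    return det_loss + LAMBDA * distill_loss

def student_backbone_trainable(iteration):
    # Student backbone stays frozen for the first 10k iterations.
    return iteration >= PRE_FREEZE_BACKBONE_ITERS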

Base-RetinaNet_1xss_bs16.yaml

_BASE_: "./bs32_schedule1x.yaml"
MODEL:
  META_ARCHITECTURE: "RetinaNet"
  # TODO: weights and depth
  WEIGHTS: "detectron2://ImageNetPretrained/MSRA/R-50.pkl"
  BACKBONE:
    NAME: "build_retinanet_resnet_fpn_backbone"
  RESNETS:
    # NORM: "SyncBN"
    OUT_FEATURES: ["res3", "res4", "res5"]
  ANCHOR_GENERATOR:
    SIZES: !!python/object/apply:eval ["[[x, x * 2**(1.0/3), x * 2**(2.0/3) ] for x in [32, 64, 128, 256, 512 ]]"]
  FPN:
    # NORM: "SyncBN"
    IN_FEATURES: ["res3", "res4", "res5"]
  RETINANET:
    IOU_THRESHOLDS: [0.4, 0.5]
    IOU_LABELS: [0, -1, 1]
    SMOOTH_L1_LOSS_BETA: 0.0
DATASETS:
  TRAIN: ("coco_2017_train_oss",)
  TEST: ("coco_2017_val_oss",)
SOLVER:
  IMS_PER_BATCH: 16
  BASE_LR: 0.01  # Note that RetinaNet uses a different default learning rate
  STEPS: (60000, 80000)
  MAX_ITER: 90000
  CLIP_GRADIENTS: {"ENABLED": True}
  CHECKPOINT_PERIOD: 10000
  # warmup
  WARMUP_FACTOR: 1e-3
  WARMUP_ITERS:  1000
  WARMUP_METHOD: "linear"
INPUT:
  MIN_SIZE_TRAIN: (800,)
VERSION: 2
TEST:
  EVAL_PERIOD: 10000
OSS_PREFIX: '/data/oss_bucket_0/'
# OUTPUT_DIR: '' # specified by jobname in mdl args
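As a side note for anyone reading the base config: the !!python/object/apply:eval line under ANCHOR_GENERATOR.SIZES is the standard detectron2 trick for computing RetinaNet's three anchor scales per FPN level at config-load time. Evaluated in plain Python, it expands to:

# Expansion of the ANCHOR_GENERATOR.SIZES eval expression: three anchor
# scales (2^0, 2^(1/3), 2^(2/3) multiples) for each FPN level.
sizes = [[x, x * 2 ** (1.0 / 3), x * 2 ** (2.0 / 3)] for x in [32, 64, 128, 256, 512]]
for level in sizes:
    print([round(s, 1) for s in level])
# [32, 40.3, 50.8]
# [64, 80.6, 101.6]
# [128, 161.3, 203.2]
# [256, 322.5, 406.4]
# [512, 645.1, 812.7]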