youngwanLEE / CenterMask

[CVPR 2020] CenterMask : Real-Time Anchor-Free Instance Segmentation
https://arxiv.org/abs/1911.06667

NaN loss in training (but not when I use the MobileNet backbone) #59

Closed omerbrandis closed 3 years ago

omerbrandis commented 3 years ago

Hi,

I'm trying to train from SCRATCH on my own dataset (24 samples, 4 classes). When I use the MobileNetV2 (mnv2) backbone, the training "works". When I use others (ResNet-50, VoVNet), I get a NaN loss from the first iteration. Any ideas?

I tried copying the config files from config/centermask (for example centermask_V_19_eSE_FPN_lite_res600_ms_bs16_4x.yaml) and changed the following:

    INPUT:
      MIN_SIZE_RANGE_TRAIN: (720, 1280)
      MAX_SIZE_TRAIN: 1280
      MIN_SIZE_TEST: 720
      MAX_SIZE_TEST: 1280
      PIXEL_MEAN: [0, 0, 0]  # disabling this functionality
    DATALOADER:
      SIZE_DIVISIBILITY: 32
      NUM_WORKERS: 1
    SOLVER:
      BASE_LR: 0.001
      WEIGHT_DECAY: 0.0001
      STEPS: (0, 60000, 80000)
      MAX_ITER: 90000
      IMS_PER_BATCH: 2
      WARMUP_METHOD: "constant"
      CHECKPOINT_PERIOD: 1000
      TEST_PERIOD: 1000
    TEST:
      IMS_PER_BATCH: 1

I also added NUM_CLASSES: 4 to FCOS, RETINANET, and ROI_BOX_HEAD, and removed the WEIGHT parameter.
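
For anyone debugging a similar config, here is a minimal sketch of the learning rate that SOLVER settings like these would produce. It assumes a warmup multi-step schedule of the kind used in maskrcnn-benchmark style codebases, which may not match this repo exactly. The function name effective_lr is hypothetical, and the defaults gamma=0.1, warmup_factor=1/3 and warmup_iters=500 are assumptions that do not come from this thread, so check them against your own defaults.

```python
from bisect import bisect_right


def effective_lr(iteration, base_lr=0.001, steps=(0, 60000, 80000),
                 gamma=0.1, warmup_factor=1.0 / 3, warmup_iters=500,
                 warmup_method="constant"):
    """Learning rate of an assumed warmup multi-step schedule.

    gamma, warmup_factor and warmup_iters are assumed defaults,
    not values taken from the thread.
    """
    factor = 1.0
    if iteration < warmup_iters:
        if warmup_method == "constant":
            # Constant warmup: scale the LR by a fixed factor.
            factor = warmup_factor
        elif warmup_method == "linear":
            # Linear warmup: ramp from warmup_factor up to 1.
            alpha = iteration / warmup_iters
            factor = warmup_factor * (1 - alpha) + alpha
        else:
            raise ValueError(f"unknown warmup method: {warmup_method}")
    # Every milestone that has been reached multiplies the LR by gamma.
    decays = bisect_right(sorted(steps), iteration)
    return base_lr * factor * gamma ** decays


if __name__ == "__main__":
    for it in (0, 499, 500, 60000, 80000):
        print(it, effective_lr(it))
    # Same base LR, but no milestone at 0 and no warmup.
    print(effective_lr(0, steps=(60000, 80000), warmup_iters=0))
```

Under these assumptions, a milestone at 0 in STEPS applies gamma from the very first iteration, and the constant warmup scales the rate again, so the rate actually used early in training can be far below BASE_LR. Setting warmup_iters=0 removes the warmup factor, which makes the value easier to reason about.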

omerbrandis commented 3 years ago

I solved the problem with the following changes: set WARMUP_ITERS = 0, in order to gain simpler control over the LR; lowered the LR (BASE_LR: 0.001); and provided relevant values for INPUT.PIXEL_MEAN and INPUT.PIXEL_STD.

:-)
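
In case it helps others, here is a minimal sketch of one way to get "relevant values" for PIXEL_MEAN and PIXEL_STD: compute the per-channel mean and standard deviation over every pixel of the training images. The helper name channel_stats and the input layout (each image as a flat sequence of 3-value pixels) are assumptions made for illustration and are not code from this repo. The channel order and value range (for example 0-255 vs 0-1) must match whatever the data pipeline actually feeds the model.

```python
import math


def channel_stats(images):
    """Per-channel mean and std over all pixels of all images.

    `images` is an iterable of images; each image is an iterable of
    3-value pixels (hypothetical layout chosen for this sketch).
    """
    count = 0
    total = [0.0, 0.0, 0.0]
    total_sq = [0.0, 0.0, 0.0]
    for image in images:
        for pixel in image:
            count += 1
            for c in range(3):
                total[c] += pixel[c]
                total_sq[c] += pixel[c] ** 2
    if count == 0:
        raise ValueError("no pixels found")
    mean = [t / count for t in total]
    # Var[x] = E[x^2] - E[x]^2; clamp at 0 to absorb rounding error.
    std = [math.sqrt(max(sq / count - m * m, 0.0))
           for sq, m in zip(total_sq, mean)]
    return mean, std


if __name__ == "__main__":
    # Two tiny "images" with three pixels in total.
    toy = [[(0, 10, 20), (2, 10, 40)], [(4, 10, 60)]]
    mean, std = channel_stats(toy)
    print("PIXEL_MEAN:", mean)
    print("PIXEL_STD:", std)
```

This pure-Python version is slow on full-size images; with NumPy arrays the same sums can be taken per channel after reshaping each image to (-1, 3).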