tianzhi0549 / FCOS

FCOS: Fully Convolutional One-Stage Object Detection (ICCV'19)
https://arxiv.org/abs/1904.01355

When I train with the ResNet-101 backbone, I encounter this error, while I can train ResNet-50 successfully with the same COCO file. What should I do? #28

Closed gittigxuy closed 5 years ago

gittigxuy commented 5 years ago

2019-05-04 00:17:28,682 maskrcnn_benchmark.trainer INFO: Start training
Traceback (most recent call last):
  File "tools/train_net.py", line 189, in <module>
    main()
  File "tools/train_net.py", line 182, in main
    model = train(cfg, args.local_rank, args.distributed)
  File "tools/train_net.py", line 87, in train
    arguments,
  File "/home/abc/code/FCOS/maskrcnn_benchmark/engine/trainer.py", line 56, in do_train
    for iteration, (images, targets, _) in enumerate(data_loader, start_iter):
  File "/home/abc/anaconda3/envs/FCOS/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 637, in __next__
    return self._process_next_batch(batch)
  File "/home/abc/anaconda3/envs/FCOS/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 658, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
TypeError: Traceback (most recent call last):
  File "/home/abc/anaconda3/envs/FCOS/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/abc/anaconda3/envs/FCOS/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/abc/code/FCOS/maskrcnn_benchmark/data/datasets/coco.py", line 94, in __getitem__
    img, target = self.transforms(img, target)
  File "/home/abc/code/FCOS/maskrcnn_benchmark/data/transforms/transforms.py", line 15, in __call__
    image, target = t(image, target)
  File "/home/abc/code/FCOS/maskrcnn_benchmark/data/transforms/transforms.py", line 58, in __call__
    size = self.get_size(image.size)
  File "/home/abc/code/FCOS/maskrcnn_benchmark/data/transforms/transforms.py", line 42, in get_size
    if max_original_size / min_original_size * size > max_size:
TypeError: unsupported operand type(s) for *: 'float' and 'range'


gittigxuy commented 5 years ago

Here is my log file:

OS: Ubuntu 16.04.5 LTS
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609
CMake version: Could not collect

Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: 10.0.130
GPU models and configuration:
GPU 0: GeForce RTX 2080 Ti
GPU 1: GeForce RTX 2080 Ti

Nvidia driver version: 410.48
cuDNN version: Probably one of the following:
/usr/local/cuda-10.0/targets/x86_64-linux/lib/libcudnn.so.7.4.1
/usr/local/cuda-10.0/targets/x86_64-linux/lib/libcudnn_static.a

Versions of relevant libraries:
[pip] Could not collect
[conda] torch 1.0.0
[conda] torchvision 0.2.2.post3
Pillow (6.0.0)

2019-05-04 00:17:09,221 maskrcnn_benchmark INFO: Loaded configuration file /home/abc/code/FCOS/configs/fcos/fcos_R_101_FPN_2x.yaml
2019-05-04 00:17:09,221 maskrcnn_benchmark INFO:
MODEL:
  META_ARCHITECTURE: "GeneralizedRCNN"
  WEIGHT: "catalog://ImageNetPretrained/MSRA/R-101"
  RPN_ONLY: True
  FCOS_ON: True
  BACKBONE:
    CONV_BODY: "R-101-FPN-RETINANET"
  RESNETS:
    BACKBONE_OUT_CHANNELS: 256
  RETINANET:
    USE_C5: False # FCOS uses P5 instead of C5
DATASETS:
  # TRAIN: ("coco_2014_train", "coco_2014_valminusminival")
  # TEST: ("coco_2014_minival",)
  TRAIN: ("coco_citypersons_train", )
  TEST: ("coco_citypersons_val",)
INPUT:
  MIN_SIZE_RANGE_TRAIN: (640, 800)
  MAX_SIZE_TRAIN: 1333
  MIN_SIZE_TEST: 800
  MAX_SIZE_TEST: 1333
DATALOADER:
  SIZE_DIVISIBILITY: 32
SOLVER:
  BASE_LR: 0.01
  WEIGHT_DECAY: 0.0001
  STEPS: (120000, 160000)
  MAX_ITER: 180000
  IMS_PER_BATCH: 16
  WARMUP_METHOD: "constant"

2019-05-04 00:17:09,223 maskrcnn_benchmark INFO: Running with config:
DATALOADER:
  ASPECT_RATIO_GROUPING: True
  NUM_WORKERS: 4
  SIZE_DIVISIBILITY: 32
DATASETS:
  TEST: ('coco_citypersons_val',)
  TRAIN: ('coco_citypersons_train',)
INPUT:
  MAX_SIZE_TEST: 1333
  MAX_SIZE_TRAIN: 1333
  MIN_SIZE_RANGE_TRAIN: (640, 800)
  MIN_SIZE_TEST: 800
  MIN_SIZE_TRAIN: (800,)
  PIXEL_MEAN: [102.9801, 115.9465, 122.7717]
  PIXEL_STD: [1.0, 1.0, 1.0]
  TO_BGR255: True
MODEL:
  BACKBONE:
    CONV_BODY: R-101-FPN-RETINANET
    FREEZE_CONV_BODY_AT: 2
    USE_GN: False
  CLS_AGNOSTIC_BBOX_REG: False
  DEVICE: cuda
  FBNET:
    ARCH: default
    ARCH_DEF:
    BN_TYPE: bn
    DET_HEAD_BLOCKS: []
    DET_HEAD_LAST_SCALE: 1.0
    DET_HEAD_STRIDE: 0
    DW_CONV_SKIP_BN: True
    DW_CONV_SKIP_RELU: True
    KPTS_HEAD_BLOCKS: []
    KPTS_HEAD_LAST_SCALE: 0.0
    KPTS_HEAD_STRIDE: 0
    MASK_HEAD_BLOCKS: []
    MASK_HEAD_LAST_SCALE: 0.0
    MASK_HEAD_STRIDE: 0
    RPN_BN_TYPE:
    RPN_HEAD_BLOCKS: 0
    SCALE_FACTOR: 1.0
    WIDTH_DIVISOR: 1
  FCOS:
    FPN_STRIDES: [8, 16, 32, 64, 128]
    INFERENCE_TH: 0.05
    LOSS_ALPHA: 0.25
    LOSS_GAMMA: 2.0
    NMS_TH: 0.4
    NUM_CLASSES: 2
    NUM_CONVS: 4
    PRE_NMS_TOP_N: 1000
    PRIOR_PROB: 0.01
  FCOS_ON: True
  FPN:
    USE_GN: False
    USE_RELU: False
  GROUP_NORM:
    DIM_PER_GP: -1
    EPSILON: 1e-05
    NUM_GROUPS: 32
  KEYPOINT_ON: False
  MASK_ON: False
  META_ARCHITECTURE: GeneralizedRCNN
  RESNETS:
    BACKBONE_OUT_CHANNELS: 256
    NUM_GROUPS: 1
    RES2_OUT_CHANNELS: 256
    RES5_DILATION: 1
    STEM_FUNC: StemWithFixedBatchNorm
    STEM_OUT_CHANNELS: 64
    STRIDE_IN_1X1: True
    TRANS_FUNC: BottleneckWithFixedBatchNorm
    WIDTH_PER_GROUP: 64
  RETINANET:
    ANCHOR_SIZES: (32, 64, 128, 256, 512)
    ANCHOR_STRIDES: (8, 16, 32, 64, 128)
    ASPECT_RATIOS: (0.5, 1.0, 2.0)
    BBOX_REG_BETA: 0.11
    BBOX_REG_WEIGHT: 4.0
    BG_IOU_THRESHOLD: 0.4
    FG_IOU_THRESHOLD: 0.5
    INFERENCE_TH: 0.05
    LOSS_ALPHA: 0.25
    LOSS_GAMMA: 2.0
    NMS_TH: 0.4
    NUM_CLASSES: 2
    NUM_CONVS: 4
    OCTAVE: 2.0
    PRE_NMS_TOP_N: 1000
    PRIOR_PROB: 0.01
    SCALES_PER_OCTAVE: 3
    STRADDLE_THRESH: 0
    USE_C5: False
  RETINANET_ON: False
  ROI_BOX_HEAD:
    CONV_HEAD_DIM: 256
    DILATION: 1
    FEATURE_EXTRACTOR: ResNet50Conv5ROIFeatureExtractor
    MLP_HEAD_DIM: 1024
    NUM_CLASSES: 2
    NUM_STACKED_CONVS: 4
    POOLER_RESOLUTION: 14
    POOLER_SAMPLING_RATIO: 0
    POOLER_SCALES: (0.0625,)
    PREDICTOR: FastRCNNPredictor
    USE_GN: False
  ROI_HEADS:
    BATCH_SIZE_PER_IMAGE: 512
    BBOX_REG_WEIGHTS: (10.0, 10.0, 5.0, 5.0)
    BG_IOU_THRESHOLD: 0.5
    DETECTIONS_PER_IMG: 100
    FG_IOU_THRESHOLD: 0.5
    NMS: 0.5
    POSITIVE_FRACTION: 0.25
    SCORE_THRESH: 0.05
    USE_FPN: False
  ROI_KEYPOINT_HEAD:
    CONV_LAYERS: (512, 512, 512, 512, 512, 512, 512, 512)
    FEATURE_EXTRACTOR: KeypointRCNNFeatureExtractor
    MLP_HEAD_DIM: 1024
    NUM_CLASSES: 17
    POOLER_RESOLUTION: 14
    POOLER_SAMPLING_RATIO: 0
    POOLER_SCALES: (0.0625,)
    PREDICTOR: KeypointRCNNPredictor
    RESOLUTION: 14
    SHARE_BOX_FEATURE_EXTRACTOR: True
  ROI_MASK_HEAD:
    CONV_LAYERS: (256, 256, 256, 256)
    DILATION: 1
    FEATURE_EXTRACTOR: ResNet50Conv5ROIFeatureExtractor
    MLP_HEAD_DIM: 1024
    POOLER_RESOLUTION: 14
    POOLER_SAMPLING_RATIO: 0
    POOLER_SCALES: (0.0625,)
    POSTPROCESS_MASKS: False
    POSTPROCESS_MASKS_THRESHOLD: 0.5
    PREDICTOR: MaskRCNNC4Predictor
    RESOLUTION: 14
    SHARE_BOX_FEATURE_EXTRACTOR: True
    USE_GN: False
  RPN:
    ANCHOR_SIZES: (32, 64, 128, 256, 512)
    ANCHOR_STRIDE: (16,)
    ASPECT_RATIOS: (0.5, 1.0, 2.0)
    BATCH_SIZE_PER_IMAGE: 256
    BG_IOU_THRESHOLD: 0.3
    FG_IOU_THRESHOLD: 0.7
    FPN_POST_NMS_TOP_N_TEST: 2000
    FPN_POST_NMS_TOP_N_TRAIN: 1000
    MIN_SIZE: 0
    NMS_THRESH: 0.7
    POSITIVE_FRACTION: 0.5
    POST_NMS_TOP_N_TEST: 1000
    POST_NMS_TOP_N_TRAIN: 2000
    PRE_NMS_TOP_N_TEST: 6000
    PRE_NMS_TOP_N_TRAIN: 12000
    RPN_HEAD: SingleConvRPNHead
    STRADDLE_THRESH: 0
    USE_FPN: False
  RPN_ONLY: True
  WEIGHT: catalog://ImageNetPretrained/MSRA/R-101
OUTPUT_DIR: training_dir/FCOS_0503
PATHS_CATALOG: /home/abc/code/FCOS/maskrcnn_benchmark/config/paths_catalog.py
SOLVER:
  BASE_LR: 0.01
  BIAS_LR_FACTOR: 2
  CHECKPOINT_PERIOD: 2500
  GAMMA: 0.1
  IMS_PER_BATCH: 16
  MAX_ITER: 180000
  MOMENTUM: 0.9
  STEPS: (120000, 160000)
  WARMUP_FACTOR: 0.3333333333333333
  WARMUP_ITERS: 500
  WARMUP_METHOD: constant
  WEIGHT_DECAY: 0.0001
  WEIGHT_DECAY_BIAS: 0
TEST:
  DETECTIONS_PER_IMG: 100
  EXPECTED_RESULTS: []
  EXPECTED_RESULTS_SIGMA_TOL: 4
  IMS_PER_BATCH: 8
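
The `TypeError` in the traceback can be reproduced outside the training loop. The sketch below assumes, which the thread does not confirm, that the training transform expanded `MIN_SIZE_RANGE_TRAIN: (640, 800)` into a Python `range` and handed it to `get_size` unchanged. The helper name `get_size_buggy` and the image dimensions are hypothetical and chosen only for illustration.

```python
def get_size_buggy(image_size, min_size, max_size):
    """Mimic the comparison that fails in the traceback (hypothetical helper)."""
    w, h = image_size
    size = min_size  # here a range object, not a single integer
    min_original_size = float(min(w, h))
    max_original_size = float(max(w, h))
    # float * range is not defined, so this comparison raises TypeError
    if max_original_size / min_original_size * size > max_size:
        size = int(round(max_size * min_original_size / max_original_size))
    return size


# Assumption: the (640, 800) bounds become an inclusive range of candidate sizes.
min_size_range = range(640, 800 + 1)

error_message = None
try:
    get_size_buggy((2048, 1024), min_size_range, 1333)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

If that assumption holds, it would also be consistent with a config that sets only a single `MIN_SIZE_TRAIN` value not hitting the error.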

tianzhi0549 commented 5 years ago

@gittigxuy It seems that you are using another dataset. Did R-101 work normally with COCO?

gittigxuy commented 5 years ago

No, it encounters the same problem when I run COCO 2014 with ResNet-101. Could you please tell me how to fix the code?

tianzhi0549 commented 5 years ago

@gittigxuy Are you using the latest version? I have tested it with R-101 and it works normally with coco.

gittigxuy commented 5 years ago

Yes, I cloned your April 12th version and could not resolve the problem. I also pulled the newest code, but the problem is the same. What should I do? Thanks.

tianzhi0549 commented 5 years ago

@gittigxuy We are sorry. It was a bug and has been fixed. Please use the latest code.
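
The thread does not say what the fix was. One plausible shape for it, sketched here under that caveat, is to draw a single integer from the candidate sizes before doing any arithmetic. The standalone function `get_size` below is a hypothetical stand-in for the method named in the traceback, not the repository's actual code.

```python
import random


def get_size(image_size, min_size, max_size):
    """Return (height, width) after resizing the shorter side to a sampled size."""
    w, h = image_size
    # Accept a single int or any sequence of candidates (tuple, list, range).
    if isinstance(min_size, int):
        min_size = (min_size,)
    size = random.choice(min_size)  # one concrete integer per call

    min_original_size = float(min(w, h))
    max_original_size = float(max(w, h))
    # Shrink the target if the longer side would exceed max_size.
    if max_original_size / min_original_size * size > max_size:
        size = int(round(max_size * min_original_size / max_original_size))

    # Scale the other side to keep the aspect ratio.
    if w <= h:
        out_w, out_h = size, int(size * h / w)
    else:
        out_h, out_w = size, int(size * w / h)
    return out_h, out_w


# Usage with the bounds from the config, expanded to an inclusive range.
height, width = get_size((2048, 1024), range(640, 800 + 1), 1333)
print(height, width)
```

Sampling inside the function means each image can get a different shorter-side length, which is the usual intent of a multi-scale training range.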

gittigxuy commented 5 years ago

Thanks, that fixed the bug.