Closed swiftshunfeng closed 4 years ago
It's hard to tell without more trace info. Where's the seg fault coming from?
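One way to collect that trace info is Python's standard `faulthandler` module, which dumps the Python traceback of every thread when the process receives a fatal signal such as SIGSEGV. The sketch below is only an illustration: the helper name `enable_crash_traces` is hypothetical and not part of this repo, and it assumes you can call it at the very top of the training entry point before any extension modules are loaded.

```python
import faulthandler
import sys


def enable_crash_traces(stream=None):
    """Dump Python tracebacks for all threads on a fatal signal (e.g. SIGSEGV).

    `stream` must be a real file object with a file descriptor and must stay
    open for as long as the handler is installed. Hypothetical helper, not
    part of maskrcnn_benchmark.
    """
    if stream is None:
        stream = sys.stderr
    if not faulthandler.is_enabled():
        faulthandler.enable(file=stream, all_threads=True)
    return faulthandler.is_enabled()


# Usage (assumption: called first thing in the training script):
#     enable_crash_traces()
#     main()
```

The same handler can be switched on without editing any code by running the script as `python -X faulthandler tools/train_net.py ...`. This only shows which Python line was executing when the crash happened; a native backtrace from a compiled extension would still need a debugger such as gdb.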
> It's hard to tell without more trace info. Where's the seg fault coming from?

2020-04-20 11:55:24,347 maskrcnn_benchmark INFO: Using 1 GPUs
2020-04-20 11:55:24,347 maskrcnn_benchmark INFO: Namespace(config_file='/data_2t/home/ligh/dangsf/res2net_mask/rotated_maskrcnn/configs/rotated/e2e_mask_rcnn_R_50_FPN_1x.yaml', distributed=False, local_rank=0, opts=[], skip_test=False)
2020-04-20 11:55:24,347 maskrcnn_benchmark INFO: Collecting env info (might take some time)
2020-04-20 11:55:25,349 maskrcnn_benchmark INFO:
PyTorch version: 1.1.0
Is debug build: No
CUDA used to build PyTorch: 10.0.130

OS: Ubuntu 16.04.6 LTS
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
CMake version: Could not collect

Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: TITAN X (Pascal)
GPU 1: TITAN X (Pascal)

Nvidia driver version: 410.48
cuDNN version: Could not collect

Versions of relevant libraries:
[pip] Could not collect
[conda] Could not collect
        Pillow (4.2.1)
2020-04-20 11:55:25,349 maskrcnn_benchmark INFO: Loaded configuration file /data_2t/home/ligh/dangsf/res2net_mask/rotated_maskrcnn/configs/rotated/e2e_mask_rcnn_R_50_FPN_1x.yaml
2020-04-20 11:55:25,349 maskrcnn_benchmark INFO:
MODEL:
  META_ARCHITECTURE: "GeneralizedRCNN"
  WEIGHT: "catalog://ImageNetPretrained/MSRA/R-50"
  ROTATED: True
  BACKBONE:
    CONV_BODY: "R-50-FPN"
  RESNETS:
    BACKBONE_OUT_CHANNELS: 256
  RPN:
    USE_FPN: True
    ANCHOR_STRIDE: (4, 8, 16, 32, 64)
    PRE_NMS_TOP_N_TRAIN: 2000
    PRE_NMS_TOP_N_TEST: 1000
    POST_NMS_TOP_N_TEST: 1000
    FPN_POST_NMS_TOP_N_TEST: 1000
    STRADDLE_THRESH: -1
    ANCHOR_ANGLES: (-90, -60, -30)
    BBOX_REG_WEIGHTS: (1.0, 1.0, 1.0, 1.0, 1.0)
  ROI_HEADS:
    USE_FPN: True
    # weights on (dx, dy, dw, dh, dtheta) for normalizing rotated rect regression targets
    BBOX_REG_WEIGHTS: (10.0, 10.0, 5.0, 5.0, 1.0)
  ROI_BOX_HEAD:
    POOLER_RESOLUTION: 7
    POOLER_SCALES: (0.25, 0.125, 0.0625, 0.03125)
    POOLER_SAMPLING_RATIO: 2
    FEATURE_EXTRACTOR: "FPN2MLPFeatureExtractor"
    PREDICTOR: "FPNPredictor"
    NUM_CLASSES: 5
  ROI_KEYPOINT_HEAD:
    POOLER_SCALES: (0.25, 0.125, 0.0625, 0.03125)
    FEATURE_EXTRACTOR: "KeypointRCNNFeatureExtractor"
    PREDICTOR: "KeypointRCNNPredictor"
    POOLER_RESOLUTION: 14
    POOLER_SAMPLING_RATIO: 2
    RESOLUTION: 56
    SHARE_BOX_FEATURE_EXTRACTOR: False
  KEYPOINT_ON: True
  ROI_MASK_HEAD:
    POOLER_SCALES: (0.25, 0.125, 0.0625, 0.03125)
    FEATURE_EXTRACTOR: "MaskRCNNFPNFeatureExtractor"
    PREDICTOR: "MaskRCNNC4Predictor"
    POOLER_RESOLUTION: 14
    POOLER_SAMPLING_RATIO: 2
    RESOLUTION: 28
    SHARE_BOX_FEATURE_EXTRACTOR: False
  MASK_ON: False
DATASETS:
  TRAIN: ("coco_2014_train", )
  TEST: ("coco_2014_val",)
DATALOADER:
  SIZE_DIVISIBILITY: 32
SOLVER:
  BASE_LR: 0.0025 # 0.02
  WEIGHT_DECAY: 0.0002 # 0.0001
  STEPS: (60000, 80000)
  MAX_ITER: 90000
2020-04-20 11:55:25,350 maskrcnn_benchmark INFO: Running with config:
AMP_VERBOSE: False
DATALOADER:
  ASPECT_RATIO_GROUPING: True
  NUM_WORKERS: 4
  SIZE_DIVISIBILITY: 32
DATASETS:
  TEST: ('coco_2014_val',)
  TRAIN: ('coco_2014_train',)
DTYPE: float32
INPUT:
  BRIGHTNESS: 0.0
  CONTRAST: 0.0
  HORIZONTAL_FLIP_PROB_TRAIN: 0.5
  HUE: 0.0
  MAX_SIZE_TEST: 640
  MAX_SIZE_TRAIN: 640
  MIN_SIZE_TEST: 256
  MIN_SIZE_TRAIN: (256,)
  PIXEL_MEAN: [102.9801, 115.9465, 122.7717]
  PIXEL_STD: [1.0, 1.0, 1.0]
  ROTATE_DEGREES_TRAIN: (-90.0, 90.0)
  ROTATE_PROB_TRAIN: 0.0
  SATURATION: 0.0
  TO_BGR255: True
  VERTICAL_FLIP_PROB_TRAIN: 0.0
MODEL:
  BACKBONE:
    CONV_BODY: R-50-FPN
    FREEZE_CONV_BODY_AT: 2
  CLS_AGNOSTIC_BBOX_REG: False
  DEVICE: cuda
  FBNET:
    ARCH: default
    ARCH_DEF:
    BN_TYPE: bn
    DET_HEAD_BLOCKS: []
    DET_HEAD_LAST_SCALE: 1.0
    DET_HEAD_STRIDE: 0
    DW_CONV_SKIP_BN: True
    DW_CONV_SKIP_RELU: True
    KPTS_HEAD_BLOCKS: []
    KPTS_HEAD_LAST_SCALE: 0.0
    KPTS_HEAD_STRIDE: 0
    MASK_HEAD_BLOCKS: []
    MASK_HEAD_LAST_SCALE: 0.0
    MASK_HEAD_STRIDE: 0
    RPN_BN_TYPE:
    RPN_HEAD_BLOCKS: 0
    SCALE_FACTOR: 1.0
    WIDTH_DIVISOR: 1
  FPN:
    USE_GN: False
    USE_RELU: False
  GROUP_NORM:
    DIM_PER_GP: -1
    EPSILON: 1e-05
    NUM_GROUPS: 32
  KEYPOINT_ON: True
  MASKIOU_ON: False
  MASK_ON: False
  META_ARCHITECTURE: GeneralizedRCNN
  RESNETS:
    BACKBONE_OUT_CHANNELS: 256
    DEFORMABLE_GROUPS: 1
    NUM_GROUPS: 1
    RES2_OUT_CHANNELS: 256
    RES5_DILATION: 1
    STAGE_WITH_DCN: (False, False, False, False)
    STEM_FUNC: StemWithFixedBatchNorm
    STEM_OUT_CHANNELS: 64
    STRIDE_IN_1X1: True
    TRANS_FUNC: BottleneckWithFixedBatchNorm
    WIDTH_PER_GROUP: 64
    WITH_MODULATED_DCN: False
  RETINANET:
    ANCHOR_SIZES: (32, 64, 128, 256, 512)
    ANCHOR_STRIDES: (8, 16, 32, 64, 128)
    ASPECT_RATIOS: (0.5, 1.0, 2.0)
    BBOX_REG_BETA: 0.11
    BBOX_REG_WEIGHT: 4.0
    BG_IOU_THRESHOLD: 0.4
    FG_IOU_THRESHOLD: 0.5
    INFERENCE_TH: 0.05
    LOSS_ALPHA: 0.25
    LOSS_GAMMA: 2.0
    NMS_TH: 0.4
    NUM_CLASSES: 5
    NUM_CONVS: 4
    OCTAVE: 2.0
    PRE_NMS_TOP_N: 1000
    PRIOR_PROB: 0.01
    SCALES_PER_OCTAVE: 3
    STRADDLE_THRESH: 0
    USE_C5: True
  RETINANET_ON: False
  ROI_BOX_HEAD:
    CONV_HEAD_DIM: 256
    DILATION: 1
    FEATURE_EXTRACTOR: FPN2MLPFeatureExtractor
    MLP_HEAD_DIM: 1024
    NUM_CLASSES: 5
    NUM_STACKED_CONVS: 4
    POOLER_RESOLUTION: 7
    POOLER_SAMPLING_RATIO: 2
    POOLER_SCALES: (0.25, 0.125, 0.0625, 0.03125)
    PREDICTOR: FPNPredictor
    USE_GN: False
  ROI_HEADS:
    BATCH_SIZE_PER_IMAGE: 512
    BBOX_REG_ANGLE_RELATIVE: True
    BBOX_REG_WEIGHTS: (10.0, 10.0, 5.0, 5.0, 1.0)
    BG_IOU_THRESHOLD: 0.5
    DETECTIONS_PER_IMG: 100
    FG_IOU_THRESHOLD: 0.5
    NMS: 0.5
    POSITIVE_FRACTION: 0.25
    SCORE_THRESH: 0.05
    SOFT_NMS:
      METHOD: 2
      SCORE_THRESH: 0.01
      SIGMA: 0.5
    USE_FPN: True
    USE_SOFT_NMS: False
  ROI_KEYPOINT_HEAD:
    CONV_LAYERS: (512, 512, 512, 512, 512, 512, 512, 512)
    FEATURE_EXTRACTOR: KeypointRCNNFeatureExtractor
    MLP_HEAD_DIM: 1024
    NUM_CLASSES: 8
    POOLER_RESOLUTION: 14
    POOLER_SAMPLING_RATIO: 2
    POOLER_SCALES: (0.25, 0.125, 0.0625, 0.03125)
    PREDICTOR: KeypointRCNNPredictor
    RESOLUTION: 56
    SHARE_BOX_FEATURE_EXTRACTOR: False
  ROI_MASKIOU_HEAD:
    CONV_LAYERS: (256, 256, 256, 256)
    LOSS_WEIGHT: 1.0
    MLP_HEAD_DIM: 1024
    USE_GN: False
    USE_NMS: False
  ROI_MASK_HEAD:
    CONV_LAYERS: (256, 256, 256, 256)
    DILATION: 1
    FEATURE_EXTRACTOR: MaskRCNNFPNFeatureExtractor
    MLP_HEAD_DIM: 1024
    POOLER_RESOLUTION: 14
    POOLER_SAMPLING_RATIO: 2
    POOLER_SCALES: (0.25, 0.125, 0.0625, 0.03125)
    POSTPROCESS_MASKS: False
    POSTPROCESS_MASKS_THRESHOLD: 0.5
    PREDICTOR: MaskRCNNC4Predictor
    RESOLUTION: 28
    SHARE_BOX_FEATURE_EXTRACTOR: False
    USE_GN: False
    WITH_CLASSIFIER: False
  ROTATED: True
  RPN:
    ANCHOR_ANGLES: (-90, -60, -30)
    ANCHOR_SIZES: (32, 64, 128, 256, 512)
    ANCHOR_STRIDE: (4, 8, 16, 32, 64)
    ASPECT_RATIOS: (0.5, 1.0, 2.0)
    BATCH_SIZE_PER_IMAGE: 256
    BBOX_REG_ANGLE_RELATIVE: True
    BBOX_REG_WEIGHTS: (1.0, 1.0, 1.0, 1.0, 1.0)
    BG_IOU_THRESHOLD: 0.3
    FG_IOU_THRESHOLD: 0.7
    FPN_POST_NMS_PER_BATCH: True
    FPN_POST_NMS_TOP_N_TEST: 1000
    FPN_POST_NMS_TOP_N_TRAIN: 2000
    MIN_SIZE: 0
    NMS_THRESH: 0.7
    POSITIVE_FRACTION: 0.5
    POST_NMS_TOP_N_TEST: 1000
    POST_NMS_TOP_N_TRAIN: 2000
    PRE_NMS_TOP_N_TEST: 1000
    PRE_NMS_TOP_N_TRAIN: 2000
    RPN_HEAD: SingleConvRPNHead
    STRADDLE_THRESH: -1
    USE_FPN: True
  RPN_ONLY: False
  WEIGHT: catalog://ImageNetPretrained/MSRA/R-50
  WEIGHT_LOAD_OPTIMIZER: True
  WEIGHT_LOAD_SCHEDULER: True
OUTPUT_DIR: output10
PATHS_CATALOG: /data_2t/home/ligh/dangsf/res2net_mask/rotated_maskrcnn/maskrcnn_benchmark/config/paths_catalog.py
SOLVER:
  BASE_LR: 0.0025
  BIAS_LR_FACTOR: 2
  CHECKPOINT_PERIOD: 5000
  GAMMA: 0.1
  IMS_PER_BATCH: 2
  MAX_ITER: 90000
  MOMENTUM: 0.9
  OPTIMIZER: SGD
  STEPS: (60000, 80000)
  WARMUP_FACTOR: 0.3333333333333333
  WARMUP_ITERS: 500
  WARMUP_METHOD: linear
  WEIGHT_DECAY: 0.0002
  WEIGHT_DECAY_BIAS: 0
TEST:
  BBOX_AUG:
    ENABLED: False
    H_FLIP: False
    MAX_SIZE: 4000
    SCALES: ()
    SCALE_H_FLIP: False
  DETECTIONS_PER_IMG: 100
  EXPECTED_RESULTS: []
  EXPECTED_RESULTS_SIGMA_TOL: 4
  IMS_PER_BATCH: 1
2020-04-20 11:55:25,350 maskrcnn_benchmark INFO: Saving config into: output10/config.yml
Selected optimization level O0: Pure FP32 training.
Defaults for this optimization level are:
enabled : True
opt_level : O0
cast_model_type : torch.float32
patch_torch_functions : False
keep_batchnorm_fp32 : None
master_weights : False
loss_scale : 1.0
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled : True
opt_level : O0
cast_model_type : torch.float32
patch_torch_functions : False
keep_batchnorm_fp32 : None
master_weights : False
loss_scale : 1.0
2020-04-20 11:55:33,781 maskrcnn_benchmark.trainer INFO: eta: 4:47:48 iter: 20 loss: 0.9685 (1.0243) loss_classifier: 0.5886 (0.6797) loss_box_reg: 0.0112 (0.0115) loss_objectness: 0.3001 (0.2683) loss_rpn_box_reg: 0.0463 (0.0648) time: 0.1578 (0.1919) data: 0.0033 (0.0302) lr: 0.000900 max mem: 895

I fixed the segmentation fault by reinstalling my conda environment, so I guess it was an environment problem. However, when I train on my data, only the classification and box regression losses are trained, not the keypoint loss, even though KEYPOINT_ON is True in my config. How do I train the keypoint loss?
Keypoint detection is not included in this repo
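The trainer log line quoted above is consistent with this: it reports only the classifier, box regression, and RPN losses. As a rough sketch of how to check which loss terms a run is actually optimizing, the snippet below parses that log line with a regular expression. The helper `parse_losses` is hypothetical, and the check for a keypoint term assumes such a loss would carry "kp" or "keypoint" in its name, which is an assumption rather than something this thread confirms.

```python
import re

# Trainer log line copied from the run quoted in this thread.
LOG_LINE = (
    "2020-04-20 11:55:33,781 maskrcnn_benchmark.trainer INFO: eta: 4:47:48 iter: 20 "
    "loss: 0.9685 (1.0243) loss_classifier: 0.5886 (0.6797) "
    "loss_box_reg: 0.0112 (0.0115) loss_objectness: 0.3001 (0.2683) "
    "loss_rpn_box_reg: 0.0463 (0.0648) time: 0.1578 (0.1919) "
    "data: 0.0033 (0.0302) lr: 0.000900 max mem: 895"
)

# Matches "name: value (value)" for every meter whose name starts with "loss".
LOSS_PATTERN = re.compile(r"\b(loss\w*): ([\d.]+) \(([\d.]+)\)")


def parse_losses(line):
    """Return {loss_name: (first_value, value_in_parentheses)} for one log line."""
    return {
        name: (float(first), float(second))
        for name, first, second in LOSS_PATTERN.findall(line)
    }


losses = parse_losses(LOG_LINE)
print(sorted(losses))

# Assumption: a keypoint loss, if present, would have "kp" or "keypoint" in its name.
has_keypoint_loss = any("kp" in name or "keypoint" in name for name in losses)
print(has_keypoint_loss)
```

For the line above, no keypoint term shows up among the parsed names, so setting KEYPOINT_ON in the config is not by itself adding a keypoint loss to the total.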
❓ Questions and Help
2020-04-20 01:33:14,945 maskrcnn_benchmark.utils.checkpoint INFO: Loading checkpoint from catalog://ImageNetPretrained/MSRA/R-50
2020-04-20 01:33:14,945 maskrcnn_benchmark.utils.checkpoint INFO: catalog://ImageNetPretrained/MSRA/R-50 points to https://dl.fbaipublicfiles.com/detectron/ImageNetPretrained/MSRA/R-50.pkl
2020-04-20 01:33:14,945 maskrcnn_benchmark.utils.checkpoint INFO: url https://dl.fbaipublicfiles.com/detectron/ImageNetPretrained/MSRA/R-50.pkl cached in /home/ligh/.torch/models/R-50.pkl
2020-04-20 01:33:15,006 maskrcnn_benchmark.utils.c2_model_loading INFO: Remapping C2 weights
2020-04-20 01:33:15,006 maskrcnn_benchmark.utils.c2_model_loading INFO: Remapping conv weights for deformable conv weights
loading annotations into memory...
Done (t=1.84s)
creating index...
index created!
2020-04-20 01:33:17,581 maskrcnn_benchmark.trainer INFO: Start training
bash: line 1: 15406 Segmentation fault (core dumped) env "PYTHONUNBUFFERED"="1" "PYTHONPATH"="/home/ligh/dangsf/res2net_mask/rotated_maskrcnn" "PYCHARM_HOSTED"="1" "JETBRAINS_REMOTE_RUN"="1" "PYTHONIOENCODING"="UTF-8" /data_2t/home/ligh/anaconda3/envs/maskrcnn_benchmark/bin/python3.6 -u /home/ligh/dangsf/res2net_mask/rotated_maskrcnn/tools/train_net.py
When I train on my own data, I hit the segmentation fault shown above right after training starts. Can someone help me? Thanks in advance!