open-mmlab / mmdetection

OpenMMLab Detection Toolbox and Benchmark
https://mmdetection.readthedocs.io
Apache License 2.0

Error while running python tools/train.py configs/faster_rcnn_r50_fpn_1x.py #1798

IISCAditayTripathi closed this issue 4 years ago

IISCAditayTripathi commented 4 years ago

I am training Faster R-CNN on MS COCO 2017, using the command shown in the title, but I get the following error:


2019-12-12 13:53:50,181 - INFO - Distributed training: False
2019-12-12 13:53:50,181 - INFO - MMDetection Version: 1.0.rc0+b7894cb
2019-12-12 13:53:50,181 - INFO - Config: # model settings
model = dict(
    type='FasterRCNN',
    pretrained='torchvision://resnet50',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        style='pytorch'),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        num_outs=5),
    rpn_head=dict(
        type='RPNHead',
        in_channels=256,
        feat_channels=256,
        anchor_scales=[8],
        anchor_ratios=[0.5, 1.0, 2.0],
        anchor_strides=[4, 8, 16, 32, 64],
        target_means=[.0, .0, .0, .0],
        target_stds=[1.0, 1.0, 1.0, 1.0],
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0)),
    bbox_roi_extractor=dict(
        type='SingleRoIExtractor',
        roi_layer=dict(type='RoIAlign', out_size=7, sample_num=2),
        out_channels=256,
        featmap_strides=[4, 8, 16, 32]),
    bbox_head=dict(
        type='SharedFCBBoxHead',
        num_fcs=2,
        in_channels=256,
        fc_out_channels=1024,
        roi_feat_size=7,
        num_classes=81,
        target_means=[0., 0., 0., 0.],
        target_stds=[0.1, 0.1, 0.2, 0.2],
        reg_class_agnostic=False,
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
        loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0)))
# model training and testing settings
train_cfg = dict(
    rpn=dict(
        assigner=dict(
            type='MaxIoUAssigner',
            pos_iou_thr=0.7,
            neg_iou_thr=0.3,
            min_pos_iou=0.3,
            ignore_iof_thr=-1),
        sampler=dict(
            type='RandomSampler',
            num=256,
            pos_fraction=0.5,
            neg_pos_ub=-1,
            add_gt_as_proposals=False),
        allowed_border=0,
        pos_weight=-1,
        debug=False),
    rpn_proposal=dict(
        nms_across_levels=False,
        nms_pre=2000,
        nms_post=2000,
        max_num=2000,
        nms_thr=0.7,
        min_bbox_size=0),
    rcnn=dict(
        assigner=dict(
            type='MaxIoUAssigner',
            pos_iou_thr=0.5,
            neg_iou_thr=0.5,
            min_pos_iou=0.5,
            ignore_iof_thr=-1),
        sampler=dict(
            type='RandomSampler',
            num=512,
            pos_fraction=0.25,
            neg_pos_ub=-1,
            add_gt_as_proposals=True),
        pos_weight=-1,
        debug=False))
test_cfg = dict(
    rpn=dict(
        nms_across_levels=False,
        nms_pre=1000,
        nms_post=1000,
        max_num=1000,
        nms_thr=0.7,
        min_bbox_size=0),
    rcnn=dict(
        score_thr=0.05, nms=dict(type='nms', iou_thr=0.5), max_per_img=100)
    # soft-nms is also supported for rcnn testing
    # e.g., nms=dict(type='soft_nms', iou_thr=0.5, min_score=0.05)
)
# dataset settings
dataset_type = 'CocoDataset'
data_root = 'data/coco/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
data = dict(
    imgs_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_train2017.json',
        img_prefix=data_root + 'train2017/',
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_val2017.json',
img_prefix=data_root + 'val2017/',                                                                                                                                                                                                           
pipeline=test_pipeline),                                                                                                                                                                                                                 
test=dict(                                                                                                                                                                                                                                       
type=dataset_type,                                                                                                                                                                                                                           
ann_file=data_root + 'annotations/instances_val2017.json',                                                                                                                                                                                   
img_prefix=data_root + 'val2017/',                                                                                                                                                                                                           
pipeline=test_pipeline))                                                                                                                                                                                                             
# optimizer                                                                                                                                                                                                                                  
optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)                                                                                                                                                                     
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))                                                                                                                                                                            
# learning policy                                                                                                                                                                                                                            
lr_config = dict(                                                                                                                                                                                                                                
policy='step',                                                                                                                                                                                                                               
warmup='linear',                                                                                                                                                                                                                             
warmup_iters=500,                                                                                                                                                                                                                            
warmup_ratio=1.0 / 3,                                                                                                                                                                                                                        
step=[8, 11])                                                                                                                                                                                                                            
checkpoint_config = dict(interval=1)                                                                                                                                                                                                         
# yapf:disable                                                                                                                                                                                                                               
log_config = dict(                                                                                                                                                                                                                               
interval=50,                                                                                                                                                                                                                                 
hooks=[                                                                                                                                                                                                                                          
dict(type='TextLoggerHook'),                                                                                                                                                                                                                 
# dict(type='TensorboardLoggerHook')                                                                                                                                                                                                     
])                                                                                                                                                                                                                                       
# yapf:enable                                                                                                                                                                                                                                
# runtime settings                                                                                                                                                                                                                           
total_epochs = 12                                                                                                                                                                                                                            
dist_params = dict(backend='nccl')                                                                                                                                                                                                           
log_level = 'INFO'                                                                                                                                                                                                                           
work_dir = './work_dirs/faster_rcnn_r50_fpn_1x'                                                                                                                                                                                              
load_from = None                                                                                                                                                                                                                             
resume_from = None                                                                                                                                                                                                                           
workflow = [('train', 1)]                                                                                                                                                                                                                                                                                                                                                                                                                                                                 
2019-12-12 13:53:50,543 - INFO - load model from: torchvision://resnet50                                                                                                                                                                     
2019-12-12 13:53:50,732 - WARNING - The model and loaded state dict do not match exactly                                                                                                                                                                                                                                                                                                                                                                                                  
unexpected key in source state_dict: fc.weight, fc.bias                                                                                                                                                                                                                                                                                                                                                                                                                                   
loading annotations into memory...                                                                                                                                                                                                           
Done (t=17.69s)                                                                                                                                                                                                                              
creating index...                                                                                                                                                                                                                            
index created!                                                                                                                                                                                                                               
('person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic_light', 
'fire_hydrant', 'stop_sign', 'parking_meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 
'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 
'snowboard', 'sports_ball', 'kite', 'baseball_bat', 'baseball_glove', 'skateboard', 'surfboard', 
'tennis_racket', 'bottle', 'wine_glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 
'orange', 'broccoli', 'carrot', 'hot_dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted_plant', 'bed', 
'dining_table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell_phone', 'microwave', 'oven', 
'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy_bear', 'hair_drier', 'toothbrush')                                                                                       
2019-12-12 13:54:14,240 - INFO - Start running, host: aditay@puri, work_dir: /scratche/home/aditay/mmdetection/work_dirs/faster_rcnn_r50_fpn_1x
2019-12-12 13:54:14,240 - INFO - workflow: [('train', 1)], max: 12 epochs                                                                                                                                                                    
THCudaCheck FAIL file=mmdet/ops/roi_align/src/roi_align_kernel.cu line=139 error=98 : unrecognized error code
Traceback (most recent call last):                                                                                                                                                                                                             
File "tools/train.py", line 111, in <module>                                                                                                                                                                                                   
main()                                                                                                                                                                                                                                     
File "tools/train.py", line 107, in main                                                                                                                                                                                                       
logger=logger)                                                                                                                                                                                                                             
File "/scratche/home/aditay/mmdetection/mmdet/apis/train.py", line 60, in train_detector                                                                                                                                                       
_non_dist_train(model, dataset, cfg, validate=validate)                                                                                                                                                                                    
File "/scratche/home/aditay/mmdetection/mmdet/apis/train.py", line 232, in _non_dist_train                                                                                                                                                     
runner.run(data_loaders, cfg.workflow, cfg.total_epochs)                                                                                                                                                                                   
File "/home/aditay/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/runner.py", line 358, in run
epoch_runner(data_loaders[i], **kwargs)                                                                                                                                                                                                    
File "/home/aditay/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/runner.py", line 264, in train
self.model, data_batch, train_mode=True, **kwargs)                                                                                                                                                                                         
File "/scratche/home/aditay/mmdetection/mmdet/apis/train.py", line 38, in batch_processor                                                                                                                                                      
losses = model(**data)                                                                                                                                                                                                                     
File "/home/aditay/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)                                                                                                                                                                                                    
File "/home/aditay/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
return self.module(*inputs[0], **kwargs[0])                                                                                                                                                                                                
File "/home/aditay/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)                                                                                                                                                                                                    
File "/scratche/home/aditay/mmdetection/mmdet/core/fp16/decorators.py", line 49, in new_func                                                                                                                                                   
return old_func(*args, **kwargs)                                                                                                                                                                                                           
File "/scratche/home/aditay/mmdetection/mmdet/models/detectors/base.py", line 117, in forward                                                                                                                                                  
return self.forward_train(img, img_meta, **kwargs)                                                                                                                                                                                         
File "/scratche/home/aditay/mmdetection/mmdet/models/detectors/two_stage.py", line 213, in forward_train
x[:self.bbox_roi_extractor.num_inputs], rois)                                                                                                                                                                                              
File "/home/aditay/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)                                                                                                                                                                                                    
File "/scratche/home/aditay/mmdetection/mmdet/core/fp16/decorators.py", line 127, in new_func                                                                                                                                                  
return old_func(*args, **kwargs)                                                                                                                                                                                                           
File "/scratche/home/aditay/mmdetection/mmdet/models/roi_extractors/single_level.py", line 105, in forward
roi_feats_t = self.roi_layers[i](feats[i], rois_)                                                                                                                                                                                          
File "/home/aditay/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)                                                                                                                                                                                                    
File "/scratche/home/aditay/mmdetection/mmdet/ops/roi_align/roi_align.py", line 80, in forward                                                                                                                                                 
self.sample_num)                                                                                                                                                                                                                           
File "/scratche/home/aditay/mmdetection/mmdet/ops/roi_align/roi_align.py", line 26, in forward                                                                                                                                                 
sample_num, output)                                                                                                                                                                                                                      
RuntimeError: cuda runtime error (98) : unrecognized error code at mmdet/ops/roi_align/src/roi_align_kernel.cu:139
Segmentation fault (core dumped)
ZwwWayne commented 4 years ago

Hi @IISCAditayTripathi, please use the Error Template when reporting a problem. Also check these earlier issues: https://github.com/open-mmlab/mmdetection/issues/24 and https://github.com/open-mmlab/mmdetection/issues/229
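
An error report is easier to act on when it includes the environment the log was produced in. The following is a minimal sketch for gathering that information, assuming PyTorch is installed in the training environment. The helper name `collect_env_info` is hypothetical and not part of mmdetection, and the fields it reports are a guess at what is useful here, not the contents of the actual Error Template. Nothing in the log above confirms which of these values, if any, is related to the failure in `roi_align_kernel.cu`.

```python
import platform
import sys


def collect_env_info():
    """Gather basic environment facts for a bug report (hypothetical helper)."""
    info = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    try:
        import torch  # assumption: present in the training environment
    except ImportError:
        info["torch"] = "not installed"
        return info

    info["torch"] = torch.__version__
    # CUDA version PyTorch was built against (None for CPU-only builds).
    info["torch_cuda"] = torch.version.cuda
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        for i in range(torch.cuda.device_count()):
            # Compute capability, e.g. (7, 5) is reported as sm_75.
            major, minor = torch.cuda.get_device_capability(i)
            name = torch.cuda.get_device_name(i)
            info[f"gpu{i}"] = f"{name} (sm_{major}{minor})"
    return info


if __name__ == "__main__":
    for key, value in collect_env_info().items():
        print(f"{key}: {value}")
```

Running the script in the same conda environment used for `tools/train.py` and pasting its output next to the traceback gives readers the PyTorch build, its CUDA version, and the GPU model in one place.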