open-mmlab / mmdetection

OpenMMLab Detection Toolbox and Benchmark
https://mmdetection.readthedocs.io
Apache License 2.0

Why does mmdetection run so slowly? #4090

Closed tianxianhao closed 3 years ago

tianxianhao commented 3 years ago

I am confident that I have installed mmdetection 2.6.0 correctly. I downloaded the COCO dataset and put it in the correct location. Then I tried to train a Faster R-CNN model on it. The config file I used is ./configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py. There are 2 unoccupied Tesla P100 GPUs on my Linux machine, and I launched the run as described in the tutorial. I then found that training Faster R-CNN on COCO for 12 epochs needs 2 days. That is very slow. When I tested Faster R-CNN on COCO with Detectron2, 100,000 epochs took only 1 day.

Is something wrong? I have pasted the running log here. Please help me find the problem.
[log]
(mmdet) [tianxianhao@localhost mmdetection]$ CUDA_VISIBLE_DEVICES=4,7 PORT=29502 ./tools/dist_train.sh ./configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py 2


Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.


2020-11-11 18:12:27,681 - mmdet - INFO - Environment info:

sys.platform: linux
Python: 3.7.9 (default, Aug 31 2020, 12:42:55) [GCC 7.3.0]
CUDA available: True
GPU 0,1: Tesla P100-PCIE-16GB
CUDA_HOME: /home/tianxianhao/cuda/cuda-10.1
NVCC: Cuda compilation tools, release 10.1, V10.1.105
GCC: gcc (GCC) 6.1.0
PyTorch: 1.6.0
PyTorch compiling details: PyTorch built with:

TorchVision: 0.7.0
OpenCV: 4.4.0
MMCV: 1.1.6
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 10.1
MMDetection: 2.6.0+d3cf38d

2020-11-11 18:12:28,850 - mmdet - INFO - Distributed training: True 2020-11-11 18:12:29,918 - mmdet - INFO - Config: model = dict( type='FasterRCNN', pretrained='torchvision://resnet50', backbone=dict( type='ResNet', depth=50, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, norm_cfg=dict(type='BN', requires_grad=True), norm_eval=True, style='pytorch'), neck=dict( type='FPN', in_channels=[256, 512, 1024, 2048], out_channels=256, num_outs=5), rpn_head=dict( type='RPNHead', in_channels=256, feat_channels=256, anchor_generator=dict( type='AnchorGenerator', scales=[8], ratios=[0.5, 1.0, 2.0], strides=[4, 8, 16, 32, 64]), bbox_coder=dict( type='DeltaXYWHBBoxCoder', target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[1.0, 1.0, 1.0, 1.0]), loss_cls=dict( type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), loss_bbox=dict(type='L1Loss', loss_weight=1.0)), roi_head=dict( type='StandardRoIHead', bbox_roi_extractor=dict( type='SingleRoIExtractor', roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0), out_channels=256, featmap_strides=[4, 8, 16, 32]), bbox_head=dict( type='Shared2FCBBoxHead', in_channels=256, fc_out_channels=1024, roi_feat_size=7, num_classes=80, bbox_coder=dict( type='DeltaXYWHBBoxCoder', target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.1, 0.1, 0.2, 0.2]), reg_class_agnostic=False, loss_cls=dict( type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), loss_bbox=dict(type='L1Loss', loss_weight=1.0)))) train_cfg = dict( rpn=dict( assigner=dict( type='MaxIoUAssigner', pos_iou_thr=0.7, neg_iou_thr=0.3, min_pos_iou=0.3, match_low_quality=True, ignore_iof_thr=-1), sampler=dict( type='RandomSampler', num=256, pos_fraction=0.5, neg_pos_ub=-1, add_gt_as_proposals=False), allowed_border=-1, pos_weight=-1, debug=False), rpn_proposal=dict( nms_across_levels=False, nms_pre=2000, nms_post=1000, max_num=1000, nms_thr=0.7, min_bbox_size=0), rcnn=dict( assigner=dict( type='MaxIoUAssigner', pos_iou_thr=0.5, neg_iou_thr=0.5, min_pos_iou=0.5, 
match_low_quality=False, ignore_iof_thr=-1), sampler=dict( type='RandomSampler', num=512, pos_fraction=0.25, neg_pos_ub=-1, add_gt_as_proposals=True), pos_weight=-1, debug=False)) test_cfg = dict( rpn=dict( nms_across_levels=False, nms_pre=1000, nms_post=1000, max_num=1000, nms_thr=0.7, min_bbox_size=0), rcnn=dict( score_thr=0.05, nms=dict(type='nms', iou_threshold=0.5), max_per_img=100)) dataset_type = 'CocoDataset' data_root = '/home/tianxianhao/Project/dataset/coco/' img_norm_cfg = dict( mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) train_pipeline = [ dict(type='LoadImageFromFile'), dict(type='LoadAnnotations', with_bbox=True), dict(type='Resize', img_scale=(1333, 800), keep_ratio=True), dict(type='RandomFlip', flip_ratio=0.5), dict( type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='Pad', size_divisor=32), dict(type='DefaultFormatBundle'), dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']) ] test_pipeline = [ dict(type='LoadImageFromFile'), dict( type='MultiScaleFlipAug', img_scale=(1333, 800), flip=False, transforms=[ dict(type='Resize', keep_ratio=True), dict(type='RandomFlip'), dict( type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='Pad', size_divisor=32), dict(type='ImageToTensor', keys=['img']), dict(type='Collect', keys=['img']) ]) ] data = dict( samples_per_gpu=2, workers_per_gpu=2, train=dict( type='CocoDataset', ann_file= '/home/tianxianhao/Project/dataset/coco/annotations/instances_train2017.json', img_prefix='/home/tianxianhao/Project/dataset/coco/train2017/', pipeline=[ dict(type='LoadImageFromFile'), dict(type='LoadAnnotations', with_bbox=True), dict(type='Resize', img_scale=(1333, 800), keep_ratio=True), dict(type='RandomFlip', flip_ratio=0.5), dict( type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='Pad', size_divisor=32), 
dict(type='DefaultFormatBundle'), dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']) ]), val=dict( type='CocoDataset', ann_file= '/home/tianxianhao/Project/dataset/coco/annotations/instances_val2017.json', img_prefix='/home/tianxianhao/Project/dataset/coco/val2017/', pipeline=[ dict(type='LoadImageFromFile'), dict( type='MultiScaleFlipAug', img_scale=(1333, 800), flip=False, transforms=[ dict(type='Resize', keep_ratio=True), dict(type='RandomFlip'), dict( type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='Pad', size_divisor=32), dict(type='ImageToTensor', keys=['img']), dict(type='Collect', keys=['img']) ]) ]), test=dict( type='CocoDataset', ann_file= '/home/tianxianhao/Project/dataset/coco/annotations/instances_val2017.json', img_prefix='/home/tianxianhao/Project/dataset/coco/val2017/', pipeline=[ dict(type='LoadImageFromFile'), dict( type='MultiScaleFlipAug', img_scale=(1333, 800), flip=False, transforms=[ dict(type='Resize', keep_ratio=True), dict(type='RandomFlip'), dict( type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='Pad', size_divisor=32), dict(type='ImageToTensor', keys=['img']), dict(type='Collect', keys=['img']) ]) ])) evaluation = dict(interval=1, metric='bbox') optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001) optimizer_config = dict(grad_clip=None) lr_config = dict( policy='step', warmup='linear', warmup_iters=500, warmup_ratio=0.001, step=[8, 11]) total_epochs = 12 checkpoint_config = dict(interval=1) log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')]) dist_params = dict(backend='nccl') log_level = 'INFO' load_from = None resume_from = None workflow = [('train', 1)] work_dir = './work_dirs/faster_rcnn_r50_fpn_1x_coco' gpu_ids = range(0, 1)

2020-11-11 18:12:30,419 - mmdet - INFO - load model from: torchvision://resnet50 2020-11-11 18:12:34,730 - mmdet - WARNING - The model and loaded state dict do not match exactly

unexpected key in source state_dict: fc.weight, fc.bias

loading annotations into memory... loading annotations into memory... Done (t=19.03s) creating index... Done (t=19.47s) creating index... index created! index created! loading annotations into memory... loading annotations into memory... Done (t=0.48s) creating index... Done (t=0.50s) creating index... index created! index created! 2020-11-11 18:12:57,785 - mmdet - INFO - Start running, host: tianxianhao@localhost.localdomain, work_dir: /home/tianxianhao/Project/mmdet/mmdetection/work_dirs/faster_rcnn_r50_fpn_1x_coco 2020-11-11 18:12:57,785 - mmdet - INFO - workflow: [('train', 1)], max: 12 epochs 2020-11-11 18:14:05,472 - mmdet - INFO - Epoch [1][50/29317] lr: 1.978e-03, eta: 5 days, 12:07:55, time: 1.352, data_time: 0.962, memory: 3946, loss_rpn_cls: 0.4892, loss_rpn_bbox: 0.1027, loss_cls: 1.2770, acc: 86.8262, loss_bbox: 0.0748, loss: 1.9436 2020-11-11 18:14:25,511 - mmdet - INFO - Epoch [1][100/29317] lr: 3.976e-03, eta: 3 days, 13:38:34, time: 0.401, data_time: 0.009, memory: 3946, loss_rpn_cls: 0.2052, loss_rpn_bbox: 0.0922, loss_cls: 0.5193, acc: 94.3135, loss_bbox: 0.1882, loss: 1.0050 2020-11-11 18:14:45,086 - mmdet - INFO - Epoch [1][150/29317] lr: 5.974e-03, eta: 2 days, 21:50:25, time: 0.392, data_time: 0.009, memory: 3946, loss_rpn_cls: 0.1865, loss_rpn_bbox: 0.0958, loss_cls: 0.5175, acc: 93.7412, loss_bbox: 0.2166, loss: 1.0163 2020-11-11 18:15:04,847 - mmdet - INFO - Epoch [1][200/29317] lr: 7.972e-03, eta: 2 days, 14:01:04, time: 0.395, data_time: 0.008, memory: 3946, loss_rpn_cls: 0.1733, loss_rpn_bbox: 0.0990, loss_cls: 0.5281, acc: 93.6982, loss_bbox: 0.2182, loss: 1.0187 2020-11-11 18:15:24,613 - mmdet - INFO - Epoch [1][250/29317] lr: 9.970e-03, eta: 2 days, 9:19:39, time: 0.395, data_time: 0.009, memory: 3946, loss_rpn_cls: 0.1406, loss_rpn_bbox: 0.0987, loss_cls: 0.4903, acc: 92.5166, loss_bbox: 0.2653, loss: 0.9949 2020-11-11 18:15:44,679 - mmdet - INFO - Epoch [1][300/29317] lr: 1.197e-02, eta: 2 days, 6:17:29, time: 0.401, data_time: 
0.010, memory: 3946, loss_rpn_cls: 0.1453, loss_rpn_bbox: 0.0983, loss_cls: 0.4905, acc: 91.9990, loss_bbox: 0.2836, loss: 1.0178 2020-11-11 18:16:04,421 - mmdet - INFO - Epoch [1][350/29317] lr: 1.397e-02, eta: 2 days, 4:02:21, time: 0.395, data_time: 0.009, memory: 3946, loss_rpn_cls: 0.1417, loss_rpn_bbox: 0.0866, loss_cls: 0.4987, acc: 92.8105, loss_bbox: 0.2503, loss: 0.9773 2020-11-11 18:16:24,075 - mmdet - INFO - Epoch [1][400/29317] lr: 1.596e-02, eta: 2 days, 2:19:32, time: 0.393, data_time: 0.009, memory: 3946, loss_rpn_cls: 0.1553, loss_rpn_bbox: 0.0995, loss_cls: 0.5538, acc: 91.3877, loss_bbox: 0.3112, loss: 1.1197 2020-11-11 18:16:43,777 - mmdet - INFO - Epoch [1][450/29317] lr: 1.796e-02, eta: 2 days, 1:00:00, time: 0.394, data_time: 0.009, memory: 3946, loss_rpn_cls: 0.1534, loss_rpn_bbox: 0.1105, loss_cls: 0.5359, acc: 91.7832, loss_bbox: 0.2857, loss: 1.0856 2020-11-11 18:17:03,508 - mmdet - INFO - Epoch [1][500/29317] lr: 1.996e-02, eta: 1 day, 23:56:31, time: 0.394, data_time: 0.009, memory: 3946, loss_rpn_cls: 0.1374, loss_rpn_bbox: 0.0776, loss_cls: 0.5404, acc: 91.8213, loss_bbox: 0.2898, loss: 1.0452 2020-11-11 18:17:23,201 - mmdet - INFO - Epoch [1][550/29317] lr: 2.000e-02, eta: 1 day, 23:04:25, time: 0.394, data_time: 0.009, memory: 3946, loss_rpn_cls: 0.1524, loss_rpn_bbox: 0.0958, loss_cls: 0.5079, acc: 91.9209, loss_bbox: 0.2797, loss: 1.0359 2020-11-11 18:17:42,829 - mmdet - INFO - Epoch [1][600/29317] lr: 2.000e-02, eta: 1 day, 22:20:10, time: 0.393, data_time: 0.009, memory: 3946, loss_rpn_cls: 0.1245, loss_rpn_bbox: 0.0970, loss_cls: 0.5349, acc: 91.5957, loss_bbox: 0.2916, loss: 1.0480 2020-11-11 18:18:02,768 - mmdet - INFO - Epoch [1][650/29317] lr: 2.000e-02, eta: 1 day, 21:45:17, time: 0.398, data_time: 0.009, memory: 3946, loss_rpn_cls: 0.1168, loss_rpn_bbox: 0.0777, loss_cls: 0.5336, acc: 91.9971, loss_bbox: 0.2772, loss: 1.0053 2020-11-11 18:18:22,624 - mmdet - INFO - Epoch [1][700/29317] lr: 2.000e-02, eta: 1 day, 21:14:55, 
time: 0.397, data_time: 0.009, memory: 3946, loss_rpn_cls: 0.1257, loss_rpn_bbox: 0.0887, loss_cls: 0.4711, acc: 92.2227, loss_bbox: 0.2671, loss: 0.9525 2020-11-11 18:18:42,451 - mmdet - INFO - Epoch [1][750/29317] lr: 2.000e-02, eta: 1 day, 20:48:17, time: 0.397, data_time: 0.010, memory: 3946, loss_rpn_cls: 0.1387, loss_rpn_bbox: 0.0880, loss_cls: 0.4921, acc: 91.8867, loss_bbox: 0.2841, loss: 1.0029 2020-11-11 18:19:02,397 - mmdet - INFO - Epoch [1][800/29317] lr: 2.000e-02, eta: 1 day, 20:25:46, time: 0.399, data_time: 0.009, memory: 3946, loss_rpn_cls: 0.1405, loss_rpn_bbox: 0.0829, loss_cls: 0.5112, acc: 92.1338, loss_bbox: 0.2732, loss: 1.0078 2020-11-11 18:19:22,200 - mmdet - INFO - Epoch [1][850/29317] lr: 2.000e-02, eta: 1 day, 20:04:53, time: 0.396, data_time: 0.009, memory: 3946, loss_rpn_cls: 0.1225, loss_rpn_bbox: 0.0941, loss_cls: 0.5410, acc: 90.8604, loss_bbox: 0.3200, loss: 1.0776 2020-11-11 18:19:41,969 - mmdet - INFO - Epoch [1][900/29317] lr: 2.000e-02, eta: 1 day, 19:45:56, time: 0.395, data_time: 0.009, memory: 3946, loss_rpn_cls: 0.1022, loss_rpn_bbox: 0.0884, loss_cls: 0.5482, acc: 90.6328, loss_bbox: 0.3303, loss: 1.0691 2020-11-11 18:20:01,713 - mmdet - INFO - Epoch [1][950/29317] lr: 2.000e-02, eta: 1 day, 19:28:59, time: 0.395, data_time: 0.009, memory: 3946, loss_rpn_cls: 0.1103, loss_rpn_bbox: 0.0904, loss_cls: 0.5044, acc: 91.3711, loss_bbox: 0.2995, loss: 1.0045 2020-11-11 18:20:21,456 - mmdet - INFO - Exp name: faster_rcnn_r50_fpn_1x_coco.py 2020-11-11 18:20:21,457 - mmdet - INFO - Epoch [1][1000/29317] lr: 2.000e-02, eta: 1 day, 19:13:38, time: 0.395, data_time: 0.009, memory: 3946, loss_rpn_cls: 0.1116, loss_rpn_bbox: 0.0942, loss_cls: 0.5498, acc: 89.8662, loss_bbox: 0.3523, loss: 1.1078 2020-11-11 18:20:40,841 - mmdet - INFO - Epoch [1][1050/29317] lr: 2.000e-02, eta: 1 day, 18:57:37, time: 0.387, data_time: 0.008, memory: 3946, loss_rpn_cls: 0.1105, loss_rpn_bbox: 0.0975, loss_cls: 0.4957, acc: 91.4482, loss_bbox: 0.2994, 
loss: 1.0031 2020-11-11 18:21:00,468 - mmdet - INFO - Epoch [1][1100/29317] lr: 2.000e-02, eta: 1 day, 18:44:28, time: 0.393, data_time: 0.009, memory: 3946, loss_rpn_cls: 0.1194, loss_rpn_bbox: 0.0822, loss_cls: 0.5477, acc: 90.7676, loss_bbox: 0.3243, loss: 1.0736 2020-11-11 18:21:20,083 - mmdet - INFO - Epoch [1][1150/29317] lr: 2.000e-02, eta: 1 day, 18:32:17, time: 0.392, data_time: 0.009, memory: 3946, loss_rpn_cls: 0.0969, loss_rpn_bbox: 0.0887, loss_cls: 0.5287, acc: 90.5693, loss_bbox: 0.3296, loss: 1.0439 2020-11-11 18:21:39,969 - mmdet - INFO - Epoch [1][1200/29317] lr: 2.000e-02, eta: 1 day, 18:22:26, time: 0.398, data_time: 0.009, memory: 3946, loss_rpn_cls: 0.1171, loss_rpn_bbox: 0.0973, loss_cls: 0.5065, acc: 90.6416, loss_bbox: 0.3247, loss: 1.0456 2020-11-11 18:21:59,852 - mmdet - INFO - Epoch [1][1250/29317] lr: 2.000e-02, eta: 1 day, 18:13:10, time: 0.397, data_time: 0.009, memory: 3946, loss_rpn_cls: 0.0949, loss_rpn_bbox: 0.0808, loss_cls: 0.4891, acc: 90.6943, loss_bbox: 0.3254, loss: 0.9902 2020-11-11 18:22:19,328 - mmdet - INFO - Epoch [1][1300/29317] lr: 2.000e-02, eta: 1 day, 18:03:03, time: 0.390, data_time: 0.009, memory: 3946, loss_rpn_cls: 0.1063, loss_rpn_bbox: 0.0828, loss_cls: 0.4949, acc: 91.2100, loss_bbox: 0.3034, loss: 0.9874 2020-11-11 18:22:39,186 - mmdet - INFO - Epoch [1][1350/29317] lr: 2.000e-02, eta: 1 day, 17:55:10, time: 0.397, data_time: 0.009, memory: 3946, loss_rpn_cls: 0.1190, loss_rpn_bbox: 0.0929, loss_cls: 0.4949, acc: 90.5898, loss_bbox: 0.3172, loss: 1.0240 2020-11-11 18:22:58,622 - mmdet - INFO - Epoch [1][1400/29317] lr: 2.000e-02, eta: 1 day, 17:46:01, time: 0.388, data_time: 0.009, memory: 3946, loss_rpn_cls: 0.1109, loss_rpn_bbox: 0.0813, loss_cls: 0.4610, acc: 91.4648, loss_bbox: 0.2915, loss: 0.9447 2020-11-11 18:23:18,137 - mmdet - INFO - Epoch [1][1450/29317] lr: 2.000e-02, eta: 1 day, 17:37:52, time: 0.390, data_time: 0.009, memory: 3946, loss_rpn_cls: 0.1108, loss_rpn_bbox: 0.0880, loss_cls: 0.5491, 
acc: 89.6650, loss_bbox: 0.3378, loss: 1.0857 2020-11-11 18:23:37,600 - mmdet - INFO - Epoch [1][1500/29317] lr: 2.000e-02, eta: 1 day, 17:30:02, time: 0.389, data_time: 0.009, memory: 3946, loss_rpn_cls: 0.1012, loss_rpn_bbox: 0.0891, loss_cls: 0.5193, acc: 90.0107, loss_bbox: 0.3369, loss: 1.0466 2020-11-11 18:23:57,143 - mmdet - INFO - Epoch [1][1550/29317] lr: 2.000e-02, eta: 1 day, 17:22:52, time: 0.390, data_time: 0.009, memory: 3946, loss_rpn_cls: 0.1033, loss_rpn_bbox: 0.0832, loss_cls: 0.4847, acc: 90.9316, loss_bbox: 0.3058, loss: 0.9770 2020-11-11 18:24:16,917 - mmdet - INFO - Epoch [1][1600/29317] lr: 2.000e-02, eta: 1 day, 17:17:05, time: 0.396, data_time: 0.010, memory: 3946, loss_rpn_cls: 0.0979, loss_rpn_bbox: 0.0855, loss_cls: 0.5189, acc: 90.5537, loss_bbox: 0.3225, loss: 1.0249 2020-11-11 18:24:36,446 - mmdet - INFO - Epoch [1][1650/29317] lr: 2.000e-02, eta: 1 day, 17:10:44, time: 0.391, data_time: 0.009, memory: 3946, loss_rpn_cls: 0.1062, loss_rpn_bbox: 0.0891, loss_cls: 0.4788, acc: 90.3750, loss_bbox: 0.3311, loss: 1.0052 2020-11-11 18:24:56,239 - mmdet - INFO - Epoch [1][1700/29317] lr: 2.000e-02, eta: 1 day, 17:05:45, time: 0.396, data_time: 0.010, memory: 3946, loss_rpn_cls: 0.1180, loss_rpn_bbox: 0.0929, loss_cls: 0.5231, acc: 89.9219, loss_bbox: 0.3336, loss: 1.0675 2020-11-11 18:25:15,731 - mmdet - INFO - Epoch [1][1750/29317] lr: 2.000e-02, eta: 1 day, 16:59:57, time: 0.390, data_time: 0.009, memory: 3946, loss_rpn_cls: 0.1085, loss_rpn_bbox: 0.0934, loss_cls: 0.4766, acc: 90.9072, loss_bbox: 0.2986, loss: 0.9771 2020-11-11 18:25:35,096 - mmdet - INFO - Epoch [1][1800/29317] lr: 2.000e-02, eta: 1 day, 16:54:00, time: 0.387, data_time: 0.008, memory: 3946, loss_rpn_cls: 0.0951, loss_rpn_bbox: 0.0832, loss_cls: 0.4664, acc: 90.6221, loss_bbox: 0.3104, loss: 0.9550 2020-11-11 18:25:54,678 - mmdet - INFO - Epoch [1][1850/29317] lr: 2.000e-02, eta: 1 day, 16:49:06, time: 0.392, data_time: 0.009, memory: 3946, loss_rpn_cls: 0.0886, 
loss_rpn_bbox: 0.0784, loss_cls: 0.4678, acc: 90.9453, loss_bbox: 0.2990, loss: 0.9338 2020-11-11 18:26:14,218 - mmdet - INFO - Epoch [1][1900/29317] lr: 2.000e-02, eta: 1 day, 16:44:17, time: 0.391, data_time: 0.009, memory: 3946, loss_rpn_cls: 0.0812, loss_rpn_bbox: 0.0802, loss_cls: 0.4572, acc: 90.3184, loss_bbox: 0.3203, loss: 0.9389 2020-11-11 18:26:33,887 - mmdet - INFO - Epoch [1][1950/29317] lr: 2.000e-02, eta: 1 day, 16:40:04, time: 0.393, data_time: 0.009, memory: 3946, loss_rpn_cls: 0.0970, loss_rpn_bbox: 0.0760, loss_cls: 0.4719, acc: 90.4404, loss_bbox: 0.3174, loss: 0.9622 2020-11-11 18:26:53,579 - mmdet - INFO - Exp name: faster_rcnn_r50_fpn_1x_coco.py 2020-11-11 18:26:53,579 - mmdet - INFO - Epoch [1][2000/29317] lr: 2.000e-02, eta: 1 day, 16:36:07, time: 0.394, data_time: 0.009, memory: 3946, loss_rpn_cls: 0.0885, loss_rpn_bbox: 0.0796, loss_cls: 0.4688, acc: 90.6924, loss_bbox: 0.3094, loss: 0.9463 2020-11-11 18:27:13,445 - mmdet - INFO - Epoch [1][2050/29317] lr: 2.000e-02, eta: 1 day, 16:32:52, time: 0.397, data_time: 0.009, memory: 3946, loss_rpn_cls: 0.1139, loss_rpn_bbox: 0.0861, loss_cls: 0.4787, acc: 89.9824, loss_bbox: 0.3393, loss: 1.0181 2020-11-11 18:27:32,886 - mmdet - INFO - Epoch [1][2100/29317] lr: 2.000e-02, eta: 1 day, 16:28:31, time: 0.389, data_time: 0.008, memory: 3946, loss_rpn_cls: 0.0930, loss_rpn_bbox: 0.0820, loss_cls: 0.4790, acc: 90.0488, loss_bbox: 0.3219, loss: 0.9760 2020-11-11 18:27:52,420 - mmdet - INFO - Epoch [1][2150/29317] lr: 2.000e-02, eta: 1 day, 16:24:41, time: 0.391, data_time: 0.009, memory: 3946, loss_rpn_cls: 0.1059, loss_rpn_bbox: 0.0825, loss_cls: 0.4929, acc: 90.0684, loss_bbox: 0.3225, loss: 1.0037 2020-11-11 18:28:12,003 - mmdet - INFO - Epoch [1][2200/29317] lr: 2.000e-02, eta: 1 day, 16:21:04, time: 0.391, data_time: 0.009, memory: 3946, loss_rpn_cls: 0.1057, loss_rpn_bbox: 0.0896, loss_cls: 0.4423, acc: 90.4268, loss_bbox: 0.3018, loss: 0.9392 2020-11-11 18:28:31,705 - mmdet - INFO - Epoch 
[1][2250/29317] lr: 2.000e-02, eta: 1 day, 16:17:58, time: 0.394, data_time: 0.009, memory: 3946, loss_rpn_cls: 0.0953, loss_rpn_bbox: 0.0819, loss_cls: 0.4910, acc: 90.2373, loss_bbox: 0.3118, loss: 0.9800 2020-11-11 18:28:51,400 - mmdet - INFO - Epoch [1][2300/29317] lr: 2.000e-02, eta: 1 day, 16:14:54, time: 0.394, data_time: 0.009, memory: 3946, loss_rpn_cls: 0.0973, loss_rpn_bbox: 0.0872, loss_cls: 0.4807, acc: 89.8252, loss_bbox: 0.3237, loss: 0.9889 2020-11-11 18:29:10,815 - mmdet - INFO - Epoch [1][2350/29317] lr: 2.000e-02, eta: 1 day, 16:11:19, time: 0.388, data_time: 0.009, memory: 3946, loss_rpn_cls: 0.0918, loss_rpn_bbox: 0.0765, loss_cls: 0.4659, acc: 90.2236, loss_bbox: 0.3173, loss: 0.9516 2020-11-11 18:29:30,554 - mmdet - INFO - Epoch [1][2400/29317] lr: 2.000e-02, eta: 1 day, 16:08:36, time: 0.394, data_time: 0.009, memory: 3946, loss_rpn_cls: 0.0970, loss_rpn_bbox: 0.0928, loss_cls: 0.4710, acc: 89.9209, loss_bbox: 0.3128, loss: 0.9736 2020-11-11 18:29:49,943 - mmdet - INFO - Epoch [1][2450/29317] lr: 2.000e-02, eta: 1 day, 16:05:13, time: 0.388, data_time: 0.010, memory: 3946, loss_rpn_cls: 0.0952, loss_rpn_bbox: 0.0886, loss_cls: 0.4633, acc: 90.0342, loss_bbox: 0.3123, loss: 0.9594 2020-11-11 18:30:09,437 - mmdet - INFO - Epoch [1][2500/29317] lr: 2.000e-02, eta: 1 day, 16:02:08, time: 0.389, data_time: 0.009, memory: 3946, loss_rpn_cls: 0.0885, loss_rpn_bbox: 0.0866, loss_cls: 0.4695, acc: 89.8945, loss_bbox: 0.3300, loss: 0.9746 2020-11-11 18:30:28,918 - mmdet - INFO - Epoch [1][2550/29317] lr: 2.000e-02, eta: 1 day, 15:59:14, time: 0.390, data_time: 0.010, memory: 3946, loss_rpn_cls: 0.1072, loss_rpn_bbox: 0.0984, loss_cls: 0.4598, acc: 90.3008, loss_bbox: 0.3192, loss: 0.9846 2020-11-11 18:30:48,685 - mmdet - INFO - Epoch [1][2600/29317] lr: 2.000e-02, eta: 1 day, 15:57:00, time: 0.395, data_time: 0.009, memory: 3946, loss_rpn_cls: 0.0871, loss_rpn_bbox: 0.0787, loss_cls: 0.4451, acc: 90.3242, loss_bbox: 0.3153, loss: 0.9261 2020-11-11 
18:31:08,402 - mmdet - INFO - Epoch [1][2650/29317] lr: 2.000e-02, eta: 1 day, 15:54:44, time: 0.394, data_time: 0.009, memory: 3946, loss_rpn_cls: 0.0919, loss_rpn_bbox: 0.0840, loss_cls: 0.4618, acc: 89.9150, loss_bbox: 0.3238, loss: 0.9614 2020-11-11 18:31:27,639 - mmdet - INFO - Epoch [1][2700/29317] lr: 2.000e-02, eta: 1 day, 15:51:27, time: 0.384, data_time: 0.009, memory: 3946, loss_rpn_cls: 0.0838, loss_rpn_bbox: 0.0846, loss_cls: 0.4451, acc: 90.0732, loss_bbox: 0.3223, loss: 0.9358 2020-11-11 18:31:47,005 - mmdet - INFO - Epoch [1][2750/29317] lr: 2.000e-02, eta: 1 day, 15:48:36, time: 0.387, data_time: 0.009, memory: 3946, loss_rpn_cls: 0.0877, loss_rpn_bbox: 0.0789, loss_cls: 0.4438, acc: 90.7510, loss_bbox: 0.2939, loss: 0.9043 2020-11-11 18:32:06,614 - mmdet - INFO - Epoch [1][2800/29317] lr: 2.000e-02, eta: 1 day, 15:46:24, time: 0.393, data_time: 0.010, memory: 3946, loss_rpn_cls: 0.1146, loss_rpn_bbox: 0.0867, loss_cls: 0.4405, acc: 90.4053, loss_bbox: 0.3096, loss: 0.9514 2020-11-11 18:32:26,195 - mmdet - INFO - Epoch [1][2850/29317] lr: 2.000e-02, eta: 1 day, 15:44:07, time: 0.391, data_time: 0.009, memory: 3946, loss_rpn_cls: 0.0730, loss_rpn_bbox: 0.0790, loss_cls: 0.4132, acc: 90.9248, loss_bbox: 0.2942, loss: 0.8594 2020-11-11 18:32:46,132 - mmdet - INFO - Epoch [1][2900/29317] lr: 2.000e-02, eta: 1 day, 15:42:41, time: 0.399, data_time: 0.009, memory: 3946, loss_rpn_cls: 0.0912, loss_rpn_bbox: 0.0878, loss_cls: 0.4451, acc: 89.7891, loss_bbox: 0.3227, loss: 0.9469 2020-11-11 18:33:05,799 - mmdet - INFO - Epoch [1][2950/29317] lr: 2.000e-02, eta: 1 day, 15:40:44, time: 0.393, data_time: 0.009, memory: 3946, loss_rpn_cls: 0.0863, loss_rpn_bbox: 0.0843, loss_cls: 0.4392, acc: 90.3037, loss_bbox: 0.3089, loss: 0.9187 2020-11-11 18:33:25,271 - mmdet - INFO - Exp name: faster_rcnn_r50_fpn_1x_coco.py 2020-11-11 18:33:25,271 - mmdet - INFO - Epoch [1][3000/29317] lr: 2.000e-02, eta: 1 day, 15:38:26, time: 0.389, data_time: 0.009, memory: 3946, 
loss_rpn_cls: 0.0826, loss_rpn_bbox: 0.0761, loss_cls: 0.4186, acc: 90.6494, loss_bbox: 0.3015, loss: 0.8789 2020-11-11 18:33:44,932 - mmdet - INFO - Epoch [1][3050/29317] lr: 2.000e-02, eta: 1 day, 15:36:35, time: 0.393, data_time: 0.009, memory: 3946, loss_rpn_cls: 0.0980, loss_rpn_bbox: 0.0881, loss_cls: 0.4566, acc: 89.8672, loss_bbox: 0.3262, loss: 0.9689 2020-11-11 18:34:04,652 - mmdet - INFO - Epoch [1][3100/29317] lr: 2.000e-02, eta: 1 day, 15:34:52, time: 0.394, data_time: 0.009, memory: 3946, loss_rpn_cls: 0.1036, loss_rpn_bbox: 0.0968, loss_cls: 0.4555, acc: 89.6064, loss_bbox: 0.3425, loss: 0.9983
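For reference, the per-iteration timing can be read straight out of log entries like the ones above. The following is a minimal sketch, assuming the log line format shown in this thread; the regular expression and the `parse_entries` helper are illustrative names and are not part of mmdetection.

```python
import re

# Two entries copied (and shortened) from the training log in this thread.
LOG = """
2020-11-11 18:14:05,472 - mmdet - INFO - Epoch [1][50/29317] lr: 1.978e-03, eta: 5 days, 12:07:55, time: 1.352, data_time: 0.962, memory: 3946
2020-11-11 18:17:03,508 - mmdet - INFO - Epoch [1][500/29317] lr: 1.996e-02, eta: 1 day, 23:56:31, time: 0.394, data_time: 0.009, memory: 3946
"""

# Hypothetical pattern: epoch, iteration, iterations per epoch, time, data_time.
# "\btime:" does not match inside "data_time:" because "_" is a word character.
PATTERN = re.compile(
    r"Epoch \[(\d+)\]\[(\d+)/(\d+)\].*?\btime: ([\d.]+), data_time: ([\d.]+)"
)


def parse_entries(text):
    """Return one dict per log entry found in `text`."""
    entries = []
    for match in PATTERN.finditer(text):
        epoch, iteration, per_epoch, time_s, data_s = match.groups()
        entries.append({
            "epoch": int(epoch),
            "iter": int(iteration),
            "iters_per_epoch": int(per_epoch),
            "time": float(time_s),        # seconds per iteration
            "data_time": float(data_s),   # seconds spent waiting for data
        })
    return entries


entries = parse_entries(LOG)
for entry in entries:
    fraction = entry["data_time"] / entry["time"]
    print(f"iter {entry['iter']}: {entry['time']:.3f} s/it, "
          f"{fraction:.1%} of it waiting for data")
```

After the first logging interval, data_time is roughly 2% of the iteration time, which suggests that data loading is not the bottleneck in this run.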

tianxianhao commented 3 years ago

(mmdet) [tianxianhao@localhost mmdet]$ nvidia-smi
Wed Nov 11 18:57:36 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.43       Driver Version: 418.43       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:04:00.0 Off |                    0 |
| N/A   52C    P0   163W / 250W |   9727MiB / 16280MiB |     76%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P100-PCIE...  Off  | 00000000:05:00.0 Off |                    0 |
| N/A   54C    P0   201W / 250W |   9361MiB / 16280MiB |     70%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla P100-PCIE...  Off  | 00000000:08:00.0 Off |                    0 |
| N/A   56C    P0    57W / 250W |   9725MiB / 16280MiB |     94%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla P100-PCIE...  Off  | 00000000:09:00.0 Off |                    0 |
| N/A   54C    P0    47W / 250W |   9357MiB / 16280MiB |     95%      Default |
+-------------------------------+----------------------+----------------------+
|   4  Tesla P100-PCIE...  Off  | 00000000:85:00.0 Off |                    0 |
| N/A   55C    P0   175W / 250W |   5923MiB / 16280MiB |     83%      Default |
+-------------------------------+----------------------+----------------------+
|   5  Tesla P100-PCIE...  Off  | 00000000:86:00.0 Off |                    0 |
| N/A   53C    P0    71W / 250W |   9717MiB / 16280MiB |     97%      Default |
+-------------------------------+----------------------+----------------------+
|   6  Tesla P100-PCIE...  Off  | 00000000:89:00.0 Off |                    0 |
| N/A   57C    P0    59W / 250W |   9367MiB / 16280MiB |     97%      Default |
+-------------------------------+----------------------+----------------------+
|   7  Tesla P100-PCIE...  Off  | 00000000:8A:00.0 Off |                    0 |
| N/A   56C    P0   188W / 250W |   5747MiB / 16280MiB |     88%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     69975      C   ...ator/my_ParisStreetView/venv/bin/python  9717MiB |
|    1     69975      C   ...ator/my_ParisStreetView/venv/bin/python  9351MiB |
|    2     28308      C   ...ator/my_ParisStreetView/venv/bin/python  9715MiB |
|    3     28308      C   ...ator/my_ParisStreetView/venv/bin/python  9347MiB |
|    4     51580      C   ...xianhao/anaconda3/envs/mmdet/bin/python  5913MiB |
|    5     54555      C   ...ator/my_ParisStreetView/venv/bin/python  9707MiB |
|    6     54555      C   ...ator/my_ParisStreetView/venv/bin/python  9357MiB |
|    7     51581      C   ...xianhao/anaconda3/envs/mmdet/bin/python  5737MiB |
+-----------------------------------------------------------------------------+

ZwwWayne commented 3 years ago

The iteration time looks fine. The total training time is long because you are using only 2 GPUs to train the model. If you use 8 GPUs, training should take only about 12 hours.
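The total training time can be estimated from numbers in the posted log. This is a rough sketch: the iteration count, epoch count, and per-iteration time come from the log and config above, while the 8-GPU figure assumes the per-iteration time stays at 0.394 s, which nothing in this thread confirms, so the real figure may well be higher (the comment above quotes about 12 hours).

```python
import math

ITERS_PER_EPOCH = 29317  # from "Epoch [1][50/29317]" in the log
EPOCHS = 12              # total_epochs in the dumped config
SEC_PER_ITER = 0.394     # typical steady-state "time" value in the log
GPUS = 2                 # number of processes launched by dist_train.sh
SAMPLES_PER_GPU = 2      # samples_per_gpu in the dumped config


def training_hours(iters_per_epoch, epochs, sec_per_iter):
    """Wall-clock training time in hours, ignoring evaluation and checkpoints."""
    return iters_per_epoch * epochs * sec_per_iter / 3600


hours_2gpu = training_hours(ITERS_PER_EPOCH, EPOCHS, SEC_PER_ITER)

# Approximate number of training images seen per epoch.
images_per_epoch = ITERS_PER_EPOCH * GPUS * SAMPLES_PER_GPU

# Assumption: 8 GPUs, same samples per GPU, same time per iteration.
iters_8gpu = math.ceil(images_per_epoch / (8 * SAMPLES_PER_GPU))
hours_8gpu = training_hours(iters_8gpu, EPOCHS, SEC_PER_ITER)

print(f"2 GPUs: about {hours_2gpu:.1f} h")
print(f"8 GPUs: about {hours_8gpu:.1f} h (idealized)")
```

The 2-GPU estimate comes out at about 38.5 hours, which is in line with the ETA of roughly 1 day, 16 hours that the log reports once warmup is over.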

tianxianhao commented 3 years ago

The iteration time looks fine. The total training time is long because you only use 2 GPU to train the model, if you use 8 GPU, the training should only take about 12h.

The problem is that the gap in training speed between Detectron2 and MMDetection is too large. Does the evaluation cost extra time? I tried setting evaluation.interval=1000, but it does not seem to help.

tianxianhao commented 3 years ago

The iteration time looks fine. The total training time is long because you only use 2 GPU to train the model, if you use 8 GPU, the training should only take about 12h.

API tool | Detectron2 | MMDetection
GPUs used | 2 | 2
epochs | 100000 | 12
time cost | 24 hours | 2 days

Are you sure the training time is OK?

tianxianhao commented 3 years ago

Sorry, I checked the Detectron2 log file. It took 24 hours for 1 epoch.

tianxianhao commented 3 years ago

[11/09 13:52:15] d2.engine.hooks INFO: Overall training speed: 89998 iterations in 1 day, 0:04:36 (0.9631 s / it)
[11/09 13:52:15] d2.engine.hooks INFO: Total training time: 1 day, 0:05:51 (0:01:14 on hooks)
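The Detectron2 log counts iterations, not epochs, so the two runs are easier to compare in seconds per iteration or images per second. The sketch below checks that the reported Detectron2 numbers are self-consistent and computes the MMDetection throughput from the log earlier in the thread. The Detectron2 batch size is not shown anywhere in this thread, so its images-per-second figure is left out rather than guessed; `images_per_second` is an illustrative helper.

```python
from datetime import timedelta

# Values copied from the Detectron2 log line above.
d2_iterations = 89998
d2_sec_per_iter = 0.9631
d2_reported = timedelta(days=1, minutes=4, seconds=36)  # "1 day, 0:04:36"

# Iterations times seconds per iteration should reproduce the reported total.
d2_estimated = timedelta(seconds=d2_iterations * d2_sec_per_iter)
d2_diff_seconds = abs(d2_estimated - d2_reported).total_seconds()
print(f"Detectron2: estimated {d2_estimated}, reported {d2_reported}")


def images_per_second(batch_size, sec_per_iter):
    """Throughput for a run that processes `batch_size` images per iteration."""
    return batch_size / sec_per_iter


# MMDetection run from this thread: 2 GPUs x 2 samples per GPU, 0.394 s/it.
mmdet_ips = images_per_second(batch_size=2 * 2, sec_per_iter=0.394)
print(f"MMDetection: about {mmdet_ips:.1f} images/s")
```

Per iteration, the MMDetection run (about 0.39 s) is faster than the Detectron2 run (about 0.96 s), although a fair comparison also depends on how many images each iteration processes.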