open-mmlab / mmdetection

OpenMMLab Detection Toolbox and Benchmark
https://mmdetection.readthedocs.io
Apache License 2.0

After model testing completes, no AP table is shown; the process is killed directly #5909

Closed sanmulab closed 3 years ago

sanmulab commented 3 years ago

```
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 19809/19809, 8.4 task/s, elapsed: 2364s, ETA: 0s
2021-08-18 08:48:11,286 - mmdet - INFO - Evaluating bbox...
[08/18 08:48:11] mmdet INFO: Evaluating bbox...
Killed
```

Env: CUDA 11.1, PyTorch 1.8.1, mmdet 2.12, mmcv-full 1.3.8

I trained SCNet on the LVIS 1.0 dataset. Training succeeded, but when I run testing with `--eval bbox segm`, the progress bar completes and then no results are printed.

```python
data_root = 'data/lvis_v1'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
    dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='SegRescale', scale_factor=1 / 8),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip', flip_ratio=0.5),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
```

```python
data_root = 'data/lvis_v1'  # change this to your own path
data = dict(
    samples_per_gpu=1,
    workers_per_gpu=1,
    train=dict(
        type='ClassBalancedDataset',
        oversample_thr=0.001,
        seg_prefix=data_root + 'stuffthingmaps/train2017/',
        dataset=dict(
            type='LVISV1Dataset',
            ann_file='annotations/lvis_v1_train.json',
            img_prefix='',
            pipeline=[
                dict(type='LoadImageFromFile'),
                dict(
                    type='LoadAnnotations',
                    with_bbox=True,
                    with_mask=True,
                    poly2mask=False),
                dict(
                    type='Resize',
                    img_scale=[(1333, 640), (1333, 672), (1333, 704),
                               (1333, 736), (1333, 768), (1333, 800)],
                    multiscale_mode='value',
                    keep_ratio=True),
                dict(type='RandomFlip', flip_ratio=0.5),
                dict(
                    type='Normalize',
                    mean=[123.675, 116.28, 103.53],
                    std=[58.395, 57.12, 57.375],
                    to_rgb=True),
                dict(type='Pad', size_divisor=32),
                dict(type='DefaultFormatBundle'),
                dict(
                    type='Collect',
                    keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks'])
            ],
            data_root=data_root)),
    val=dict(
        type='LVISV1Dataset',
        ann_file='annotations/lvis_v1_val.json',
        img_prefix='',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1333, 800),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ],
        data_root=data_root),
    test=dict(
        type='LVISV1Dataset',
        ann_file='annotations/lvis_v1_val.json',
        img_prefix='',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1333, 800),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ],
        data_root=data_root))
```

```python
evaluation = dict(metric=['bbox', 'segm'], classwise=True, interval=12)
```
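For reference, a minimal sanity check that the LVIS annotation paths in the config resolve correctly before starting a long test run (it assumes the full config above is saved as `scnet_lvis_v1.py`, a hypothetical file name):

```python
# A minimal sanity check, assuming the full config above is saved as
# `scnet_lvis_v1.py` (hypothetical name): build the test dataset and make sure
# the LVIS annotations load before launching a long test run.
from mmcv import Config
from mmdet.datasets import build_dataset

cfg = Config.fromfile('scnet_lvis_v1.py')                      # hypothetical config path
dataset = build_dataset(cfg.data.test, dict(test_mode=True))   # needs the `lvis` package
print(len(dataset), 'images in the test set')                  # 19809 for the lvis_v1 val split
print(dataset.CLASSES[:5])                                     # first few LVIS category names
```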

AronLin commented 3 years ago

It might be killed by the system if memory runs out or too many threads are used. Can you check that?
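For example, a small watcher like the one below (a minimal sketch; it assumes the third-party `psutil` package and is not part of mmdetection) shows whether free memory collapses exactly when `Evaluating bbox...` starts:

```python
# A minimal sketch (assumes the third-party `psutil` package; not part of
# mmdetection): log the resident memory of the test process and the remaining
# system memory every 10 seconds while evaluation runs.
import time

import psutil

# Replace 12345 with the PID of the running `tools/test.py` process.
proc = psutil.Process(12345)

while proc.is_running():
    rss_gb = proc.memory_info().rss / 1024 ** 3
    avail_gb = psutil.virtual_memory().available / 1024 ** 3
    print(f'test process RSS: {rss_gb:.1f} GiB, system available: {avail_gb:.1f} GiB')
    time.sleep(10)
```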

There is no such problem when using the COCO dataset with the codebase.

sanmulab commented 3 years ago

My machine has 24 GB of memory and no other programs are running. I am testing on the LVIS 1.0 dataset. Why was the process killed after the progress bar completed?

AronLin commented 3 years ago

Can you try the latest version of our codebase?

sanmulab commented 3 years ago

Hello, I have a question about environment setup. I am reproducing a paper whose code uses mmdet 2.3.0, but my GPU is an RTX 3090 with CUDA 11.1. Can I build an mmdet 2.3.0 environment on this setup so that I don't have to rebuild the code?

sanmulab commented 3 years ago

```
/var/log/auth.log:Aug 18 14:54:43 ubuntu sudo: ubuntu : TTY=pts/0 ; PWD=/home/ubuntu/chensen/test-RefineMask ; USER=root ; COMMAND=/usr/bin/egrep -i -r 604639 /var/log
/var/log/kern.log:Aug 18 14:30:16 ubuntu kernel: [356972.474582] CPU: 12 PID: 604639 Comm: python Tainted: P OE 5.11.0-25-generic #27~20.04.1-Ubuntu
/var/log/kern.log:Aug 18 14:30:16 ubuntu kernel: [356972.475053] [ 604639] 1000 604639 11024416 6917133 62283776 0 0 python
/var/log/kern.log:Aug 18 14:30:16 ubuntu kernel: [356972.475056] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/session-59.scope,task=python,pid=604639,uid=1000
/var/log/kern.log:Aug 18 14:30:16 ubuntu kernel: [356972.475087] Out of memory: Killed process 604639 (python) total-vm:44097664kB, anon-rss:27583492kB, file-rss:68812kB, shmem-rss:16228kB, UID:1000 pgtables:60824kB oom_score_adj:0
/var/log/kern.log:Aug 18 14:30:16 ubuntu kernel: [356972.914589] oom_reaper: reaped process 604639 (python), now anon-rss:0kB, file-rss:68876kB, shmem-rss:16228kB
/var/log/syslog:Aug 18 14:30:16 ubuntu kernel: [356972.474582] CPU: 12 PID: 604639 Comm: python Tainted: P OE 5.11.0-25-generic #27~20.04.1-Ubuntu
/var/log/syslog:Aug 18 14:30:16 ubuntu kernel: [356972.475053] [ 604639] 1000 604639 11024416 6917133 62283776 0 0 python
/var/log/syslog:Aug 18 14:30:16 ubuntu kernel: [356972.475056] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/session-59.scope,task=python,pid=604639,uid=1000
/var/log/syslog:Aug 18 14:30:16 ubuntu kernel: [356972.475087] Out of memory: Killed process 604639 (python) total-vm:44097664kB, anon-rss:27583492kB, file-rss:68812kB, shmem-rss:16228kB, UID:1000 pgtables:60824kB oom_score_adj:0
/var/log/syslog:Aug 18 14:30:16 ubuntu kernel: [356972.914589] oom_reaper: reaped process 604639 (python), now anon-rss:0kB, file-rss:68876kB, shmem-rss:16228kB
```

So how can I solve this problem?

sanmulab commented 3 years ago

```
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 19809/19809, 8.4 task/s, elapsed: 2364s, ETA: 0s
Evaluating bbox...
Killed
```

I retested and found that every time the run reaches the "Evaluating bbox" step, memory consumption spikes, and even 30 GB of memory is not enough for the test. Is the model too large, or is the LVIS 1.0 dataset too large?

AronLin commented 3 years ago

> Hello, I have a question about environment setup. I am reproducing a paper whose code uses mmdet 2.3.0, but my GPU is an RTX 3090 with CUDA 11.1. Can I build an mmdet 2.3.0 environment on this setup so that I don't have to rebuild the code?

Just use a PyTorch version that meets the requirements of mmdet 2.3.

AronLin commented 3 years ago

We have released configs for the LVIS dataset in the latest version of mmdet. Can you run testing successfully with our configs?

sanmulab commented 3 years ago

> Hello, I have a question about environment setup. I am reproducing a paper whose code uses mmdet 2.3.0, but my GPU is an RTX 3090 with CUDA 11.1. Can I build an mmdet 2.3.0 environment on this setup so that I don't have to rebuild the code?

> Just use a PyTorch version that meets the requirements of mmdet 2.3.

My current PyTorch is 1.8.1. Do I need to downgrade it to install mmdet 2.3.0, or is it enough that mmcv-full meets the requirements? I'm worried that my CUDA version is too high and incompatible.

AronLin commented 3 years ago

> Hello, I have a question about environment setup. I am reproducing a paper whose code uses mmdet 2.3.0, but my GPU is an RTX 3090 with CUDA 11.1. Can I build an mmdet 2.3.0 environment on this setup so that I don't have to rebuild the code?

> Just use a PyTorch version that meets the requirements of mmdet 2.3.

> My current PyTorch is 1.8.1. Do I need to downgrade it to install mmdet 2.3.0, or is it enough that mmcv-full meets the requirements? I'm worried that my CUDA version is too high and incompatible.

As shown in get_started.md, mmdet 2.3 needs mmcv-full==1.0.5. As far as I remember, that version of mmcv supports PyTorch 1.8.1, so you do not need to downgrade PyTorch.
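If it helps, a quick way to confirm what is actually installed before downgrading anything (plain Python; nothing mmdetection-specific is assumed):

```python
# Print the versions that matter when checking against the
# mmdet/mmcv/PyTorch/CUDA compatibility table in get_started.md.
import torch
import mmcv
import mmdet

print('torch    :', torch.__version__)
print('CUDA     :', torch.version.cuda)
print('mmcv-full:', mmcv.__version__)
print('mmdet    :', mmdet.__version__)
```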

sanmulab commented 3 years ago

> Hello, I have a question about environment setup. I am reproducing a paper whose code uses mmdet 2.3.0, but my GPU is an RTX 3090 with CUDA 11.1. Can I build an mmdet 2.3.0 environment on this setup so that I don't have to rebuild the code?

> Just use a PyTorch version that meets the requirements of mmdet 2.3.

> My current PyTorch is 1.8.1. Do I need to downgrade it to install mmdet 2.3.0, or is it enough that mmcv-full meets the requirements? I'm worried that my CUDA version is too high and incompatible.

> As shown in get_started.md, mmdet 2.3 needs mmcv-full==1.0.5. As far as I remember, that version of mmcv supports PyTorch 1.8.1, so you do not need to downgrade PyTorch.

Thank you!

sanmulab commented 3 years ago

> We have released configs for the LVIS dataset in the latest version of mmdet. Can you run testing successfully with our configs?

Hi! I used the latest version, mmdet 2.15, to test the LVIS v1.0 dataset, with the officially released LVIS 1.0 checkpoint (mask_rcnn_r50_fpn_sample1e-3_mstrain_1x_lvis_v1.pth) and the default configuration for evaluation, but the process is still killed because of insufficient memory. So I want to know how much memory is needed to test an LVIS 1.0 model, and why 30 GB is still not enough.

AronLin commented 3 years ago

> We have released configs for the LVIS dataset in the latest version of mmdet. Can you run testing successfully with our configs?

> Hi! I used the latest version, mmdet 2.15, to test the LVIS v1.0 dataset, with the officially released LVIS 1.0 checkpoint (mask_rcnn_r50_fpn_sample1e-3_mstrain_1x_lvis_v1.pth) and the default configuration for evaluation, but the process is still killed because of insufficient memory. So I want to know how much memory is needed to test an LVIS 1.0 model, and why 30 GB is still not enough.

I tried to run evaluation on lvis_v1 and found that the RES (resident memory usage) is sometimes more than 0.036 TB (about 36 GB), so 30 GB of memory is obviously not enough.
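As a possible workaround (a sketch under assumptions, not an official recommendation): dump the raw detections once with `tools/test.py ... --out results.pkl`, then run the LVIS evaluation offline, so a killed evaluation does not force re-running inference; evaluating the metrics one at a time may also lower the peak memory.

```python
# Sketch of offline evaluation from dumped results (not an official recipe):
# assumes inference was run once with `tools/test.py ... --out results.pkl`.
import mmcv
from mmcv import Config
from mmdet.datasets import build_dataset

# Config shipped with mmdetection for this checkpoint; adjust to your own config.
cfg = Config.fromfile('configs/lvis/mask_rcnn_r50_fpn_sample1e-3_mstrain_1x_lvis_v1.py')
dataset = build_dataset(cfg.data.test, dict(test_mode=True))
results = mmcv.load('results.pkl')   # per-image results saved by tools/test.py

# Evaluate bbox and segm in separate calls instead of both at once.
for metric in ['bbox', 'segm']:
    print(dataset.evaluate(results, metric=metric))
```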

sanmulab commented 3 years ago

Thanks!!!

Henning742 commented 3 years ago

> We have released configs for the LVIS dataset in the latest version of mmdet. Can you run testing successfully with our configs?

> Hi! I used the latest version, mmdet 2.15, to test the LVIS v1.0 dataset, with the officially released LVIS 1.0 checkpoint (mask_rcnn_r50_fpn_sample1e-3_mstrain_1x_lvis_v1.pth) and the default configuration for evaluation, but the process is still killed because of insufficient memory. So I want to know how much memory is needed to test an LVIS 1.0 model, and why 30 GB is still not enough.

> I tried to run evaluation on lvis_v1 and found that the RES (resident memory usage) is sometimes more than 0.036 TB (about 36 GB), so 30 GB of memory is obviously not enough.

Hi, can you shed some light on why evaluation needs so much resident memory? Is it keeping all the detection results in memory before they get compared to the GTs? Is there a way to reduce the RES usage?