Closed: sanmulab closed this issue 3 years ago.
It might have been killed by the system because memory ran out or too many threads were used. Can you check?
There is no such problem when using the COCO dataset with this codebase.
I have 24 GB of memory and no other programs running. I am testing on the LVIS v1.0 dataset. Why was the process killed right after the progress bar completed?
Can you try the latest version of our codebase?
Hello, I want to ask a question about environment setup. Recently I have been reproducing a paper. The code in the paper uses mmdet == 2.3.0, but my GPU is an RTX 3090 and my CUDA version is 11.1. Can I still build an environment with mmdet 2.3.0, so that I don't have to modify the code?
```
/var/log/auth.log:Aug 18 14:54:43 ubuntu sudo: ubuntu : TTY=pts/0 ; PWD=/home/ubuntu/chensen/test-RefineMask ; USER=root ; COMMAND=/usr/bin/egrep -i -r 604639 /var/log
/var/log/kern.log:Aug 18 14:30:16 ubuntu kernel: [356972.474582] CPU: 12 PID: 604639 Comm: python Tainted: P OE 5.11.0-25-generic #27~20.04.1-Ubuntu
/var/log/kern.log:Aug 18 14:30:16 ubuntu kernel: [356972.475053] [ 604639] 1000 604639 11024416 6917133 62283776 0 0 python
/var/log/kern.log:Aug 18 14:30:16 ubuntu kernel: [356972.475056] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/session-59.scope,task=python,pid=604639,uid=1000
/var/log/kern.log:Aug 18 14:30:16 ubuntu kernel: [356972.475087] Out of memory: Killed process 604639 (python) total-vm:44097664kB, anon-rss:27583492kB, file-rss:68812kB, shmem-rss:16228kB, UID:1000 pgtables:60824kB oom_score_adj:0
/var/log/kern.log:Aug 18 14:30:16 ubuntu kernel: [356972.914589] oom_reaper: reaped process 604639 (python), now anon-rss:0kB, file-rss:68876kB, shmem-rss:16228kB
(the same oom-kill entries are repeated in /var/log/syslog)
```
So how can I solve this problem?
```
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 19809/19809, 8.4 task/s, elapsed: 2364s, ETA: 0s
Evaluating bbox...
Killed
```
I retested and found that every time it reaches the "Evaluating bbox" step, memory is consumed extremely fast, and even 30 GB is not enough for the test. Is the model too large, or is the LVIS v1.0 dataset too large?
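To pin down how much resident memory the evaluation step actually reaches before the OOM killer fires, one option is to poll the test process from a second terminal. A minimal sketch, assuming `psutil` is installed (`pip install psutil`); the PID is the illustrative one from the kernel log above, not something you should reuse:

```python
# Minimal sketch: poll a process's resident set size (RSS) until it exits.
import time

import psutil


def watch_rss(pid, interval=5.0):
    """Print the RSS of `pid` every `interval` seconds until it exits."""
    proc = psutil.Process(pid)
    peak = 0
    try:
        while proc.is_running():
            rss = proc.memory_info().rss  # bytes of resident memory
            peak = max(peak, rss)
            print(f'RSS: {rss / 1e9:.1f} GB (peak {peak / 1e9:.1f} GB)')
            time.sleep(interval)
    except psutil.NoSuchProcess:
        pass
    print(f'process exited; peak RSS was {peak / 1e9:.1f} GB')


if __name__ == '__main__':
    watch_rss(604639)  # <- replace with the PID of your own test process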
Just use a PyTorch version that meets mmdet 2.3's requirements.
We have released configs for the LVIS dataset in the latest version of mmdet; can you run testing successfully with our configs?
My current PyTorch is 1.8.1. Do I need to downgrade it to install mmdet 2.3.0, or is it enough that mmcv-full meets the requirements? I'm worried that my CUDA version is too high and incompatible.
As shown in get_started.md, mmdet 2.3 needs mmcv-full==1.0.5. As far as I remember, this version of MMCV supports PyTorch 1.8.1, so you do not need to downgrade PyTorch.
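A quick way to verify what the environment actually provides before installing anything: a short check script using only the standard `__version__` attributes, so it should work across these versions:

```python
# Environment check: prints the installed PyTorch, CUDA, cuDNN, mmcv and
# mmdet versions so they can be matched against the table in get_started.md.
import torch

print('torch:', torch.__version__)
print('cuda :', torch.version.cuda)  # CUDA version torch was built against
print('cudnn:', torch.backends.cudnn.version())

try:
    import mmcv
    print('mmcv :', mmcv.__version__)
except ImportError:
    print('mmcv not installed')

try:
    import mmdet
    print('mmdet:', mmdet.__version__)
except ImportError:
    print('mmdet not installed')
```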
Thank you!
Hi! I used the latest version of mmdet (2.15) to test the LVIS v1.0 dataset, with the officially released LVIS v1.0 model (mask_rcnn_r50_fpn_sample1e-3_mstrain_1x_lvis_v1.pth) and the default configuration for evaluation and testing, but the process still runs out of memory and gets killed. So I want to know how much memory is needed to test an LVIS v1.0 model, and why 30 GB is still not enough.
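For a rough sense of scale (these are assumptions, not measured numbers): the val set has 19,809 images and, to my knowledge, the default LVIS test_cfg keeps up to 300 detections per image, each carrying a box, a score, and an RLE-encoded mask, so the raw results alone can plausibly reach tens of gigabytes before the LVIS API even starts matching them to GTs:

```python
# Back-of-envelope estimate; bytes_per_det is a guess, not a measurement.
num_images = 19809        # LVIS v1 val set size, from the progress bar above
dets_per_image = 300      # assumed default max_per_img in the LVIS configs
bytes_per_det = 4 * 1024  # rough guess for box + score + RLE mask + overhead

total = num_images * dets_per_image * bytes_per_det
print(f'~{total / 1e9:.0f} GB just for the raw results')  # ~24 GB
```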
I tried to run evaluation on lvis_v1 and found that the RES (resident memory usage) sometimes exceeds 0.036 TB (about 36 GB), so 30 GB of memory is obviously not enough.
Thanks!!!
Hi, can you shed some light on why it needs so much RES during the evaluation phase? Is it keeping all the detection results in memory before they get compared to the GTs? Is there a way to reduce RES usage?
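One knob that should shrink the accumulated results, assuming the defaults I remember from the LVIS configs (score_thr=0.0001 and max_per_img=300): lower `max_per_img` in the model's `test_cfg`, at the cost of some AP, since fewer boxes and masks are kept per image. A hedged sketch of the config override:

```python
# Hedged sketch: override the test-time detection budget in a child config.
# Fewer detections per image means a proportionally smaller in-memory
# result list during the evaluation phase.
model = dict(
    test_cfg=dict(
        rcnn=dict(
            score_thr=0.0001,
            nms=dict(type='nms', iou_threshold=0.5),
            max_per_img=100)))  # down from the assumed default of 300
```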
```
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 19809/19809, 8.4 task/s, elapsed: 2364s, ETA: 0s
2021-08-18 08:48:11,286 - mmdet - INFO - Evaluating bbox...
[08/18 08:48:11] mmdet INFO: Evaluating bbox...
Killed
```
env: CUDA=11.1, pytorch=1.8.1, mmdet=2.12, mmcv-full=1.3.8
I trained SCNet on the LVIS v1.0 dataset. Training succeeded, then I ran testing with --eval bbox segm, but after the progress bar completed no results were shown.

```python
data_root = 'data/lvis_v1'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
    dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),  # was `img_norm_cfg`; the dict must be unpacked
    dict(type='Pad', size_divisor=32),
    dict(type='SegRescale', scale_factor=1 / 8),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip', flip_ratio=0.5),
            dict(type='Normalize', **img_norm_cfg),  # same fix as above
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]

data_root = 'data/lvis_v1'  # change this to your own path
data = dict(
    samples_per_gpu=1,
    workers_per_gpu=1,
    train=dict(
        type='ClassBalancedDataset',
        oversample_thr=0.001,
        seg_prefix=data_root + 'stuffthingmaps/train2017/',
        dataset=dict(
            type='LVISV1Dataset',
            ann_file='annotations/lvis_v1_train.json',
            img_prefix='',
            pipeline=[
                dict(type='LoadImageFromFile'),
                dict(
                    type='LoadAnnotations',
                    with_bbox=True,
                    with_mask=True,
                    poly2mask=False),
                dict(
                    type='Resize',
                    img_scale=[(1333, 640), (1333, 672), (1333, 704),
                               (1333, 736), (1333, 768), (1333, 800)],
                    multiscale_mode='value',
                    keep_ratio=True),
                dict(type='RandomFlip', flip_ratio=0.5),
                dict(
                    type='Normalize',
                    mean=[123.675, 116.28, 103.53],
                    std=[58.395, 57.12, 57.375],
                    to_rgb=True),
                dict(type='Pad', size_divisor=32),
                dict(type='DefaultFormatBundle'),
                dict(
                    type='Collect',
                    keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks'])
            ],
            data_root=data_root)),
    val=dict(
        type='LVISV1Dataset',
        ann_file='annotations/lvis_v1_val.json',
        img_prefix='',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1333, 800),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ],
        data_root=data_root),
    test=dict(
        type='LVISV1Dataset',
        ann_file='annotations/lvis_v1_val.json',
        img_prefix='',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1333, 800),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ],
        data_root=data_root))
evaluation = dict(metric=['bbox', 'segm'], classwise=True, interval=12)
```
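If the goal is only to confirm that the trained SCNet checkpoint produces sensible output, rather than to survive a full 19,809-image evaluation, a single-image smoke test sidesteps the memory-heavy accumulation step entirely. A sketch using mmdet's high-level API; the config and checkpoint paths are placeholders for your own files:

```python
# Smoke test: run the trained model on one image instead of the whole
# LVIS val set, so no large result list accumulates in memory.
from mmdet.apis import inference_detector, init_detector

config_file = 'configs/scnet/my_scnet_lvis_config.py'  # placeholder path
checkpoint_file = 'work_dirs/scnet_lvis/latest.pth'    # placeholder path

model = init_detector(config_file, checkpoint_file, device='cuda:0')
result = inference_detector(model, 'demo/demo.jpg')

# For mask models, the result is typically (bbox_results, segm_results),
# each a per-class list; detector-only models return just bbox_results.
if isinstance(result, tuple):
    bbox_results, _segm_results = result
else:
    bbox_results = result
print('classes with detections:',
      sum(1 for b in bbox_results if len(b) > 0))
```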