lfz / DSB2017

The solution of team 'grt123' in DSB2017
MIT License

Run main.py for testing, but got out of memory #8

Open stanislashzc opened 7 years ago

stanislashzc commented 7 years ago

Hi,

I got this error message when I run python main.py for testing:

THCudaCheck FAIL file=/home/xxx/pytorch-0.1.12/torch/lib/THC/generic/THCStorage.cu line=66 error=2 : out of memory
Traceback (most recent call last):
  File "main.py", line 58, in test_detect(test_loader, nod_net, get_pbb, bbox_result_path, config1, n_gpu=config_submit['n_gpu'])

My computer has two GTX 1080 GPUs with 8 GB of memory each, which is less than the 12 GB of a Titan X. Is that the problem?
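
Not part of the original report, but a quick way to confirm how much memory PyTorch can see on each card is the sketch below. Note that torch.cuda.get_device_properties comes from PyTorch releases newer than the 0.1.12 shown in the traceback, so treat it as an assumption about your environment:

    import torch

    # Print the total memory of each visible GPU, to compare the 8 GB GTX 1080
    # cards against the 12 GB Titan X the model was developed on.
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        print(i, props.name, round(props.total_memory / 1024 ** 3, 1), 'GB')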

Any help would be appreciated.

stanislashzc commented 7 years ago

Here is my config_submit.py:

config = {'datapath': './test111/',
          'preprocess_result_path': './prep_result/',
          'outputfile': 'prediction.csv',
          'detector_model': 'net_detector',
          'detector_param': './model/detector.ckpt',
          'classifier_model': 'net_classifier',
          'classifier_param': './model/classifier.ckpt',
          'n_gpu': 0,
          'n_worker_preprocessing': 0,
          'use_exsiting_preprocessing': False,
          'skip_preprocessing': True,
          'skip_detect': False}
lfz commented 7 years ago

Set n_gpu = 2 in your case, and set n_worker_preprocessing to the number of your CPU threads.
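
For illustration only (not posted in the thread): the config above with lfz's suggestion applied might look like the sketch below, where the CPU thread count of 8 is an assumed example and should be replaced by your machine's actual value.

    # config_submit.py, with n_gpu and n_worker_preprocessing adjusted as lfz suggests
    config = {'datapath': './test111/',
              'preprocess_result_path': './prep_result/',
              'outputfile': 'prediction.csv',
              'detector_model': 'net_detector',
              'detector_param': './model/detector.ckpt',
              'classifier_model': 'net_classifier',
              'classifier_param': './model/classifier.ckpt',
              'n_gpu': 2,                     # two GTX 1080 cards are installed
              'n_worker_preprocessing': 8,    # assumed value: set to your CPU thread count
              'use_exsiting_preprocessing': False,
              'skip_preprocessing': True,
              'skip_detect': False}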

NHZlX commented 6 years ago

Hi, I am running into the same problem. My computer has a Tesla GPU with 12 GB of memory, but it still runs out of memory.

eagle-star commented 6 years ago

May I ask why the testing phase runs so slowly? I am using 4 GPUs, and it has been running for 2 days without any progress.

eagle-star commented 6 years ago

Epoch 100 (lr 0.00010) Train: tpr 98.48, tnr 98.64, total pos 1317, total neg 3762, time 225.03 loss 0.0833, classify loss 0.0605, regress loss 0.0046, 0.0038, 0.0041, 0.0103

Validation: tpr 97.69, tnr 99.99650389, total pos 216, total neg 19850634, time 16.43 loss 0.0655, classify loss 0.0460, regress loss 0.0030, 0.0022, 0.0025, 0.0118

using gpu 0,1,2,3
results/res18/bbox
(18L, 1L, 208L, 208L, 208L) [0, '059d8c14b2256a2ba4e38ac511700203']
(8L, 1L, 208L, 208L, 208L) [1, '66b7666912e1d469cd9817a1cade694c']
(18L, 1L, 208L, 208L, 208L) [2, '185bc9d9fa3a58fea90778215c69d35b']
(8L, 1L, 208L, 208L, 208L) [3, 'fd4c2d4738bc25a5c331dbc101f3323a']

eagle-star commented 6 years ago

I want to ask why it makes no progress when I run python main.py and it reaches test_detect.

lfz commented 6 years ago

Press Ctrl+C and check where it is stuck.
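
Not advice from the thread, but if you would rather inspect a hung run without interrupting it, Python 3's standard faulthandler module (Unix only) can dump a traceback on a signal; a minimal sketch to put near the top of main.py:

    import faulthandler
    import signal

    # Dump the traceback of every thread when the process receives SIGUSR1,
    # e.g. by running `kill -USR1 <pid>` from another terminal.
    faulthandler.register(signal.SIGUSR1)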


eagle-star commented 6 years ago

I have found the problem: it was due to a lack of memory. Thanks.

huangmozhilv commented 6 years ago

Hi @liueagle, how did you fix the out-of-memory problem?

My Linux platform has 4 GPUs and 12 CPU threads.

In "DSB2017/config_submit.py", the "n_gpu" has been reduced to 4. In "DSB2017/main.py" line 55, test_loader = DataLoader(dataset,batch_size = 1, shuffle = False,num_workers = 32,pin_memory=False,collate_fn =collate). I set batch_size = 8, num_workers=8, but it still prompts "out of memory".

zp678 commented 5 years ago

I run main.py for testing, but it goes out of memory. Setting batch_size to 1 also failed. Error log: (screenshot attached) @lfz

KenWang94 commented 5 years ago

You said you found the problem was a lack of memory. Could you tell me how you solved it?