paninski-lab / deepgraphpose

DeepGraphPose
GNU Lesser General Public License v3.0

Insufficient memory problem #14

Open ZiyiZhang0912 opened 3 years ago

ZiyiZhang0912 commented 3 years ago

Hi,

I ran into the same problem as #7 when I tried to run the demo:

Begin Training for 5 iterations
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\envs\dgp\lib\site-packages\tensorflow\python\client\session.py", line 1334, in _do_call
    return fn(*args)
  File "C:\ProgramData\Anaconda3\envs\dgp\lib\site-packages\tensorflow\python\client\session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "C:\ProgramData\Anaconda3\envs\dgp\lib\site-packages\tensorflow\python\client\session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[10,512,94,104] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
  [[{{node resnet_v1_50/block2/unit_3/bottleneck_v1/conv3/Conv2D}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

DLC runs normally on my GPU.

I also tried the method mentioned in #7, adding a few lines of code after line 158 of run_dgp_demo.py:

[image: screenshot of the added lines]

But the error still did not go away. I understand that the error probably comes from insufficient GPU (CUDA) memory, but I am not sure why this method did not work for me. As I understand it, this code allows the GPU memory allocation to grow automatically. Did I add it in the correct position?
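
For reference, a "let GPU memory grow" setting in TensorFlow 1.x usually looks like the sketch below. This is an assumption about what the lines from #7 do, not a copy of them: `make_growth_config` is a hypothetical helper name, and `tf.compat.v1` assumes TensorFlow 1.13 or newer. Note that `allow_growth` only changes when memory is claimed (on demand rather than all at once). It does not raise the total memory available on the card, so by itself it cannot make a batch fit that is too large for the GPU.

```python
def make_growth_config(memory_fraction=None):
    """Build a TF1-style session config that allocates GPU memory on demand.

    Hypothetical helper for illustration; not part of DGP or TensorFlow.
    """
    # Imported lazily so this snippet can be loaded without TensorFlow installed.
    import tensorflow as tf

    config = tf.compat.v1.ConfigProto()
    # Claim GPU memory as needed instead of reserving almost all of it up front.
    config.gpu_options.allow_growth = True
    if memory_fraction is not None:
        # Optional cap on the fraction of GPU memory this process may use.
        config.gpu_options.per_process_gpu_memory_fraction = memory_fraction
    return config


# Usage (requires TensorFlow; where DGP creates its session is not shown here):
#   import tensorflow as tf
#   sess = tf.compat.v1.Session(config=make_growth_config())
```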

I hope someone can help me solve this problem; it is very important to me. Thanks for any comments!

obarnstedt commented 3 years ago

Hi, have you also tried decreasing the batch_size? If not, try adding --batch_size 4 in the console (default is 10).
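
As a rough illustration of why a smaller batch helps, the failed allocation in the traceback can be sized by hand. The sketch below assumes the first dimension of `shape[10,512,94,104]` is the batch size (it matches the default of 10) and that `type float` means 4-byte float32; `tensor_bytes` is a hypothetical helper, not part of DGP.

```python
BYTES_PER_FLOAT32 = 4  # assumption: "type float" in the OOM message is float32


def tensor_bytes(shape, bytes_per_element=BYTES_PER_FLOAT32):
    """Return the memory needed for one dense tensor of the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


# Shape reported in the OOM message; first dimension assumed to be the batch.
failed_shape = (10, 512, 94, 104)

for batch_size in (10, 4):
    shape = (batch_size,) + failed_shape[1:]
    mib = tensor_bytes(shape) / 2**20
    print("batch_size=%d: %s needs %.1f MiB" % (batch_size, shape, mib))
```

Under these assumptions, this one tensor shrinks from about 191 MiB at batch size 10 to about 76 MiB at batch size 4. It is only a single ResNet activation; training typically keeps many such tensors alive at once, so the total saving is larger than this one number.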

ZiyiZhang0912 commented 3 years ago

> Hi, have you also tried decreasing the batch_size? If not, try adding --batch_size 4 in the console (default is 10).

Thank you very much. Training runs after I changed the batch size to 4. However, even with those lines of code added, any larger batch size still fails with the same error. You mentioned in #7 that after adding these lines you could use batch size = 10, so I don't know why it behaves differently for me.

waq1129 commented 3 years ago

Also, have you tried running DLC? Do you have this memory issue with DLC?

ZiyiZhang0912 commented 3 years ago

> Also, have you tried running DLC? Do you have this memory issue with DLC?

Yes, I used DLC and everything ran smoothly without any memory problems.

waq1129 commented 3 years ago

Do you mean you ran DLC with the same data size and a larger batch size, such as 10?