jsk-ros-pkg / jsk_recognition

JSK perception ROS packages
https://github.com/jsk-ros-pkg/jsk_recognition

OutOfMemoryError on my P51s #2613

Open k-okada opened 3 years ago

k-okada commented 3 years ago

I tried to run the FCN example on my P51s

roslaunch jsk_perception sample_fcn_object_segmentation.launch gpu:=0

and got the following error. The error can be avoided by adding the code below, and I'd like to add it to the source tree instead of patching it in by hand every time.

--- a/jsk_perception/node_scripts/fcn_object_segmentation.py
+++ b/jsk_perception/node_scripts/fcn_object_segmentation.py
@@ -27,6 +27,9 @@ import chainer
 from chainer import cuda
 import chainer.serializers as S
 import fcn
+import cupy as cp

 import cv_bridge

Is there any way to know whether the model fits in the currently available memory before we send it to the hardware? Or is

try:
    self.model(x)
except cp.cuda.memory.OutOfMemoryError:
    # fall back to CUDA unified (managed) memory and retry once
    pool = cp.cuda.MemoryPool(cp.cuda.malloc_managed)
    cp.cuda.set_allocator(pool.malloc)
    self.model(x)

enough?
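Sketching the retry above as a small helper makes the intent clearer: only the allocator's OOM error is caught, and the managed pool is enabled exactly once before the single retry. This is a sketch of the pattern, not code from the repository; `run_with_managed_fallback` is a hypothetical name, and the cupy-specific calls are only shown in the docstring:

```python
def run_with_managed_fallback(forward, enable_managed, oom_error=MemoryError):
    """Try forward() once; on OOM, enable managed memory and retry.

    forward        -- zero-argument callable, e.g. lambda: self.model(x)
    enable_managed -- callable that switches cupy to the managed pool, e.g.
                      lambda: cp.cuda.set_allocator(
                          cp.cuda.MemoryPool(cp.cuda.malloc_managed).malloc)
    oom_error      -- exception type to catch; with cupy this would be
                      cp.cuda.memory.OutOfMemoryError
    """
    try:
        return forward()
    except oom_error:
        enable_managed()
        return forward()
```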

[ERROR] [1625980682.759359]: bad callback: <bound method FCNObjectSegmentation._cb of <__main__.FCNObjectSegmentation object at 0x7fe4283c0d90>>
Traceback (most recent call last):
  File "/opt/ros/melodic/lib/python2.7/dist-packages/rospy/topics.py", line 750, in _invoke_callback
    cb(msg)
  File "/home/k-okada/catkin_ws/ws_recognition/src/jsk_recognition/jsk_perception/node_scripts/fcn_object_segmentation.py", line 177, in _cb
    label, proba_img = self.segment(img)
  File "/home/k-okada/catkin_ws/ws_recognition/src/jsk_recognition/jsk_perception/node_scripts/fcn_object_segmentation.py", line 187, in segment
    return self._segment_chainer_backend(bgr)
  File "/home/k-okada/catkin_ws/ws_recognition/src/jsk_recognition/jsk_perception/node_scripts/fcn_object_segmentation.py", line 204, in _segment_chainer_backend
    self.model(x)
  File "/usr/local/lib/python2.7/dist-packages/fcn/models/fcn8s.py", line 71, in __call__
    h = F.relu(self.conv2_1(pool1))
  File "/usr/local/lib/python2.7/dist-packages/chainer/functions/activation/relu.py", line 168, in relu
    y, = ReLU().apply((x,))
  File "/usr/local/lib/python2.7/dist-packages/chainer/function_node.py", line 321, in apply
    outputs = self.forward(in_data)
  File "/usr/local/lib/python2.7/dist-packages/chainer/function_node.py", line 512, in forward
    return self.forward_gpu(inputs)
  File "/usr/local/lib/python2.7/dist-packages/chainer/functions/activation/relu.py", line 60, in forward_gpu
    y = cuda.cupy.maximum(x, 0, dtype=x.dtype)
  File "cupy/core/_kernel.pyx", line 849, in cupy.core._kernel.ufunc.__call__
  File "cupy/core/_kernel.pyx", line 339, in cupy.core._kernel._get_out_args
  File "cupy/core/core.pyx", line 134, in cupy.core.core.ndarray.__init__
  File "cupy/cuda/memory.pyx", line 528, in cupy.cuda.memory.alloc
  File "cupy/cuda/memory.pyx", line 1095, in cupy.cuda.memory.MemoryPool.malloc
  File "cupy/cuda/memory.pyx", line 1116, in cupy.cuda.memory.MemoryPool.malloc
  File "cupy/cuda/memory.pyx", line 944, in cupy.cuda.memory.SingleDeviceMemoryPool.malloc
  File "cupy/cuda/memory.pyx", line 959, in cupy.cuda.memory.SingleDeviceMemoryPool._malloc
  File "cupy/cuda/memory.pyx", line 707, in cupy.cuda.memory._try_malloc
OutOfMemoryError: out of memory to allocate 66932736 bytes (total 1547073536 bytes)
knorth55 commented 3 years ago

@k-okada Yes, but your P51s's NVIDIA Quadro M520 is a Maxwell-architecture GPU, which does not support unified memory. We need at least a Pascal-architecture NVIDIA GPU. https://developer.nvidia.com/blog/beyond-gpu-memory-limits-unified-memory-pascal/

knorth55 commented 3 years ago

@k-okada It was my misunderstanding: unified memory was actually introduced with the Kepler architecture. However, cupy's unified memory is only enabled from the Pascal architecture onward. https://developer.nvidia.com/blog/unified-memory-cuda-beginners/ https://github.com/cupy/cupy/pull/447#issue-137859218

k-okada commented 3 years ago

if we add

    pool = cp.cuda.MemoryPool(cp.cuda.malloc_managed)
    cp.cuda.set_allocator(pool.malloc)

then it works on my P51s, so the question is: should we put this code in fcn_object_segmentation.py?
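If this does land in the source tree, it may be worth gating on the GPU's compute capability, since (per the discussion above) cupy only supports the managed allocator on Pascal or newer. A minimal sketch, assuming cupy's `Device.compute_capability` string format (e.g. '52' for Maxwell, '61' for Pascal); the helper name is hypothetical:

```python
def pascal_or_newer(cc):
    """Return True for Pascal (compute capability 6.0) or newer GPUs.

    `cc` is a compute capability string as returned by
    cupy.cuda.Device().compute_capability, e.g. '52' (Maxwell) or '61' (Pascal).
    """
    return int(cc) >= 60

# Hypothetical use inside fcn_object_segmentation.py:
#
#   import cupy as cp
#   if pascal_or_newer(cp.cuda.Device().compute_capability):
#       pool = cp.cuda.MemoryPool(cp.cuda.malloc_managed)
#       cp.cuda.set_allocator(pool.malloc)
```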


knorth55 commented 3 years ago

On my P50, the code does not work, so it does not change my situation. But yes, it has no bad side effect.