ros-drivers / video_stream_opencv

A package to open video streams and publish them in ROS using the opencv videocapture mechanism
221 stars 159 forks source link

Memory leak when subscribing to /videofile/image_raw topic #15

Open aviogit opened 6 years ago

aviogit commented 6 years ago

Houston, we have another problem.

There is a memory leak when subscribing to /videofile/image_raw topic. Valgrind sees the memory as always reachable (so technically speaking it's not a true leak), but the process ends up using several gigs of memory. When one stops image_view, memory is released correctly.

If you want to take a look at it, you can clone this branch:

https://github.com/aviogit/video_stream_opencv/tree/memory_leak_when_subscribing_to_slash_videofile_slash_image_raw

I found an excellent piece of code by @BloodAxe here: https://computer-vision-talks.com/hacking-opencv-for-fun-and-profit/

and now I can even see precise numbers. I'll try to understand why it happens, cv_bridge::CvImage::toImageMsg() does a deep copy, so I actually see no reason for this to happen.

http://docs.ros.org/kinetic/api/cv_bridge/html/c++/cv__bridge_8cpp_source.html

I'll try to look into this further...

aviogit commented 6 years ago

I think that the problem is somewhat simpler than what I believed originally. In the launchfile I left the buffer_queue_size parameter at it's default value of 1000 and I'm using a full HD video that produces 1920x1080 RGB images on a Odroid XU4 with 2Gb of RAM. This means 6 megs per image @ 25fps = 140 megs per second and an OOM kill after maybe 15 seconds on average vs. 40 seconds capacity of framesQueue.

There is just one point which leaves me somewhat perplexed: why all this happens only if I subscribe to the _/videofile/imageraw topic? I'm pretty sure this behavior is triggered by ROS that skips useless stuff if nobody is listening on the topic. But I'm still not understanding it fully, so I don't know how to solve this.

awesomebytes commented 6 years ago

As an initial guess, lower the buffer_queue_size of course.

Dunno why it happens only when there is a subscriber... if it's a problem of the thread that reads from the videofile, it should happen all the time until the buffer is full. And stop there. With or without subscribers.

The queue_size of the publisher is 1 https://github.com/ros-drivers/video_stream_opencv/blob/master/src/video_stream.cpp#L118 . So is not related to that.

My mind is somewhere else right now, but, may this line have anything to do?

https://github.com/ros-drivers/video_stream_opencv/blob/master/src/video_stream.cpp#L104

It still makes no sense that it happens only when there is a subscriber tho.

aviogit commented 6 years ago

It's so weird... If I run it without any subscriber, I see this:

framesQueue.push(frame.clone()) 12442567/18663367 framesQueue.push(frame.clone()) 12442567/18663367 framesQueue.push(frame.clone()) 12442567/18663367 framesQueue.push(frame.clone()) 18663367/18663367 framesQueue.push(frame.clone()) 18663367/18663367 framesQueue.push(frame.clone()) 18663367/18663367 framesQueue.push(frame.clone()) 18663367/18663367 framesQueue.push(frame.clone()) 18663367/18663367 framesQueue.push(frame.clone()) 18663367/18663367 framesQueue.push(frame.clone()) 12442567/18663367 framesQueue.push(frame.clone()) 12442567/18663367 framesQueue.push(frame.clone()) 12442567/18663367

Oscillating between 12442567/18663367 and 18663367/18663367 for, say, ten seconds, then it stays at 12442567/18663367.

If I run it with an image_view as subscriber, I see this:

framesQueue.push(frame.clone()) 99533767/105754567 framesQueue.push(frame.clone()) 99533767/105754567 framesQueue.push(frame.clone()) 99533767/105754567 framesQueue.push(frame.clone()) 105754567/111975367 framesQueue.push(frame.clone()) 105754567/111975367 framesQueue.push(frame.clone()) 105754567/111975367 framesQueue.push(frame.clone()) 111975367/111975367 framesQueue.push(frame.clone()) 111975367/111975367

and at the end this:

framesQueue.push(frame.clone()) 1735604167/1741824967 framesQueue.push(frame.clone()) 1735604167/1741824967 framesQueue.push(frame.clone()) 1741824967/1741824967 framesQueue.push(frame.clone()) 1741824967/1748045767 framesQueue.push(frame.clone()) 1741824967/1748045767 framesQueue.push(frame.clone()) 1748045767/1754266567 framesQueue.push(frame.clone()) 1748045767/1754266567 framesQueue.push(frame.clone()) 1716941767/1716941767 framesQueue.push(frame.clone()) 1710720967/1710720967 framesQueue.push(frame.clone()) 1710720967/1710720967 framesQueue.push(frame.clone()) 1710720967/1710720967 framesQueue.push(frame.clone()) 1710720967/1716941767

(This time the OOM killer has been lenient with it.)

So no frame drops at all because of queue full (yes, I forgot to instrument the other framesQueue.pull(), the one that normally pops out frames to convert them into ROS messages).

I think we can say that producer and consumer here works perfectly well together, but the queue is filled anyway up to 1000.

Don't worry about this too much, for the moment I can solve it setting queue size to 10 and maybe adding a ROS_WARN if someone sets the queue so that the memory usage can reach, say, 512 megabytes? Just a curiosity of mine :)

aviogit commented 6 years ago

This is what happens adding a few more couts and one MEASURE_MEMORY() around framesQueue.pull(frame):

Queue before framesQueue.pull(frame): 278 framesQueue.pull(frame) 1748045767/1741824967 Queue after framesQueue.pull(frame): 277 framesQueue.push(frame.clone()) 1741824967/1741824967 Queue before framesQueue.pull(frame): 278 framesQueue.pull(frame) 1748045767/1741824967 Queue after framesQueue.pull(frame): 277 framesQueue.push(frame.clone()) 1741824967/1741824967 Queue before framesQueue.pull(frame): 278 framesQueue.pull(frame) 1748045767/1741824967 Queue after framesQueue.pull(frame): 277 framesQueue.push(frame.clone()) 1741824967/1741824967 framesQueue.push(frame.clone()) 1741824967/1748045767 Queue before framesQueue.pull(frame): 279 framesQueue.pull(frame) 1748045767/1741824967 Queue after framesQueue.pull(frame): 278 framesQueue.push(frame.clone()) 1741824967/1748045767 framesQueue.push(frame.clone()) 1748045767/1754266567 [videofile/videofile_stream-1] process has died [pid 13121, exit code -9, cmd /mnt/data/workspace/ros/devel/lib/video_stream_opencv/video_stream camera:=image_raw __name:=videofile_stream __log:=/home/pelan/.ros/log/25c39d52-3cc9-11e8-845e-001e0631df58/videofile-videofile_stream-1.log]. log file: /home/pelan/.ros/log/25c39d52-3cc9-11e8-845e-001e0631df58/videofile-videofile_stream-1*.log all processes on machine have died, roslaunch will exit shutting down processing monitor... ... shutting down processing monitor complete done

I'm starting to believe that boost::sync_queue::pull() is not behaving as expected...

awesomebytes commented 6 years ago

Can you commit the version with more couts?

aviogit commented 6 years ago

Done. I think that the producer here is just a little bit faster than the consumer. If I put a double camera_fps_rate.sleep() in do_capture(), the queue size reaches 1 at maximum.

awesomebytes commented 6 years ago

I get:

framesQueue.push(frame.clone()) 68/68
framesQueue.push(frame.clone()) 68/68
framesQueue.pull(frame) 68/68
framesQueue.pull(frame) 68/68
framesQueue.push(frame.clone()) 68/68
framesQueue.push(frame.clone()) 68/68

With a 3200 x 1800 30fps video.

I get the same with my webcam at 30 or 60fps.

On a laptop, 4 cores Intel(R) Core(TM) i7-6500U CPU @ 2.50GHz

aviogit commented 6 years ago

Eheh because you're using a monster-PC. Try it on a Raspberry Pi :)

aviogit commented 6 years ago

Mmm no, there's something wrong. 68 bytes always makes absolutely no sense...

aviogit commented 6 years ago

Are you compiling with -O2 or -O3?

awesomebytes commented 6 years ago

I've pulled your last version, tried with the video again (my laptop is dying btw) and now I get:

Queue before framesQueue.pull(frame): 1
framesQueue.pull(frame) 68/68
Queue after framesQueue.pull(frame): 0
framesQueue.push(frame.clone()) 68/68
Queue before framesQueue.pull(frame): 1

framesQueue.pull(frame) 68/68
Queue after framesQueue.pull(frame): 0
framesQueue.push(frame.clone()) 68/68

Queue before framesQueue.pull(frame): 1
framesQueue.pull(frame) 68/68
Queue after framesQueue.pull(frame): 0

framesQueue.push(frame.clone()) 68/68

Queue before framesQueue.pull(frame): 1
framesQueue.pull(frame) 68/68
Queue after framesQueue.pull(frame): 0
framesQueue.push(frame.clone()) 68/68
Queue before framesQueue.pull(frame): 1
framesQueue.pull(frame) 68/68
Queue after framesQueue.pull(frame): 0
framesQueue.push(frame.clone()) 68/68
Queue before framesQueue.pull(frame): 1
framesQueue.pull(frame) 68/68
Queue after framesQueue.pull(frame): 0
framesQueue.push(frame.clone()) 68/68
Queue before framesQueue.pull(frame): 1
framesQueue.pull(frame) 68/68
Queue after framesQueue.pull(frame): 0
framesQueue.push(frame.clone()) 68/68
framesQueue.push(frame.clone()) 68/68
framesQueue.push(frame.clone()) 68/68
Queue before framesQueue.pull(frame): 3
framesQueue.pull(frame) 68/68
Queue after framesQueue.pull(frame): 2
framesQueue.push(frame.clone()) 68/68
Queue before framesQueue.pull(frame): 3
framesQueue.pull(frame) 68/68
Queue after framesQueue.pull(frame): 2
framesQueue.push(frame.clone()) 68/68
Queue before framesQueue.pull(frame): 3
framesQueue.pull(frame) 68/68
Queue after framesQueue.pull(frame): 2
framesQueue.push(frame.clone()) 68/68
Queue before framesQueue.pull(frame): 3
awesomebytes commented 6 years ago

I pulled your branch, I'm using your CMakeLists.txt: set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -g3 -gdwarf-2 -O0 -std=c++11")

awesomebytes commented 6 years ago

Also, I get that same 68 with our without subscribers.

aviogit commented 6 years ago

Mmm very strange, let me try it on my desktop.

aviogit commented 6 years ago

Queue after framesQueue.pull(frame): 0 framesQueue.push(frame.clone()) 18663623/18663623 Queue before framesQueue.pull(frame): 1 framesQueue.pull(frame) 24884423/18663623 Queue after framesQueue.pull(frame): 0 framesQueue.push(frame.clone()) 18663623/18663623 Queue before framesQueue.pull(frame): 1 framesQueue.pull(frame) 24884423/18663623 Queue after framesQueue.pull(frame): 0 framesQueue.push(frame.clone()) 18663623/18663623 Queue before framesQueue.pull(frame): 1 framesQueue.pull(frame) 24884423/18663623 Queue after framesQueue.pull(frame): 0 framesQueue.push(frame.clone()) 18663623/18663623

This is what I get on my desktop. Values are stable but != 68 :)

awesomebytes commented 6 years ago

I'm on a Ubuntu 14.04, 64bits, ROS indigo. Opencv 2.4.8

awesomebytes commented 6 years ago

Just tried on a Ubuntu 16.04, 64b, ROS Kinetic, on a VM. Opencv3 from kinetic. And the numbers are more similar to yours... 11029703/8272583 And so (cant copy paste).

aviogit commented 6 years ago

Mmm, OpenCV 2.4.8 seems to have fastMalloc() as well: https://github.com/opencv/opencv/blob/2.4/modules/core/src/alloc.cpp

So I don't know where is the problem. I'm on Ubuntu 16.04, 64bits, Kinetic with OpenCV 3.3.1 (I checked with cv::getBuildInformation() because Ubuntu still installs 2.4.9.1 while ROS pulls in its own package).

aviogit commented 6 years ago

Ok, maybe I'm starting to understand:

[ INFO] [1523450336.860524732]: Setting camera FPS to: 30 [ INFO] [1523450336.860832980]: Camera reports FPS: 25 [ INFO] [1523450336.864597742]: Setting buffer size for capturing frames to: 1000 [ INFO] [1523450336.868807250]: Throttling to fps: 30

this is the problem IMHO. My video is 25 fps but in the launchfile I ask for 30 fps, so maybe frames are produced at 30 fps and consumed at 25?

awesomebytes commented 6 years ago

You are trying to consume at 30fps (Throttling fps). You are producing at 30fps (setting camera fps sets set_camera_fps which is the only used for the sleep in the producer thread).

The reporting value is just a print, is not used.

Axel13fr commented 6 years ago

Hi, I could reproduce the growing memory problem as well (Kinetic, Ubu 16.04 64 Bits). When I set fps parameter < Camera Reports FPS, I get the issue. In my example, I had fps = 10 and my camera reported 25fps. Setting fps to 25 solved the issue. I have a proper core i7 with 8GB RAM and even a buffer queue size of 5 didn't stop the leak.

This issue still is strange to me as it appears to be a recent problem: I have used this node for several months with fps = 10 and a 25 fps camera without any issues until recently.

Axel13fr commented 6 years ago

So using the same parameters (fps=10, set_camera_fps=25) on an older version (before implementing a thread using a synch queue), I did not get this memory growing problem.

(Older version used: https://github.com/ros-drivers/video_stream_opencv/blob/89b0224b18b6e67860d4ad5c36c993fe7534e199/src/video_stream.cpp).

Axel13fr commented 6 years ago

So here is my conclusion: the problem comes from Boost. Proof: check my merge request where I basically reimplemented a thread safe queue with mutex using plain std classes.

The code does exacly the same thing, except that locks are done once per consum/produce cycle. It fixes as well a possible edge case where the user sets a queue size of 1 and 2 calls of pull can happen on after the other from the 2 different threads, leading the second pull to return either the previous picture (consumer pulls last) or an empty image (producer pulls last).

I'm a bit surprised that this was implemented with experimental boost features...

awesomebytes commented 6 years ago

Thank you very much for your work!!

I'm currently busy (in the Robocup contest) but I'll take a look, try it and act accordingly as soon as I can.

Thank you again!

On Thu, Jun 21, 2018, 12:07 Axel13fr notifications@github.com wrote:

So here is my conclusion: the problem comes from Boost. Proof: check my merge request where I basically reimplemented a thread safe queue with mutex using plain std classes.

The code does exacly the same thing, except that locks are done once per consum/produce cycle. It fixes as well a possible edge case where the user sets a queue size of 1 and 2 calls of pull can happen on after the other from the 2 different threads, leading the second pull to return either the previous picture (consumer pulls last) or an empty image (producer pulls last).

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/ros-drivers/video_stream_opencv/issues/15#issuecomment-399157748, or mute the thread https://github.com/notifications/unsubscribe-auth/ABpFdKhrxvovYG8-BA2KJe0OT591SfZ5ks5t-8S1gaJpZM4TPu_B .

Axel13fr commented 6 years ago

@awesomebytes looking forward to a melodic release :-)

awesomebytes commented 6 years ago

@Axel13fr re-release on all distros coming soon (it's PR-ed).

Axel13fr commented 6 years ago

Thanks a lot !

lchop commented 4 years ago

Hello ! Thanks for your work ! I have a memory leak using your package streaming an rtsp stream. I have played the stream all night and in the morning the node had crash because of lack of memory. I think it is related to this issue: https://github.com/ros-drivers/video_stream_opencv/issues/15 This issue is not solved ? I am running on ROS kinetic. I have a camera streaming at 10 fps and I have set the fps param of your node at 30. I think that is the reason leading to the memory leak. Any updates ? Thank you !