Open aviogit opened 6 years ago
I think that the problem is somewhat simpler than what I believed originally. In the launchfile I left the buffer_queue_size parameter at it's default value of 1000 and I'm using a full HD video that produces 1920x1080 RGB images on a Odroid XU4 with 2Gb of RAM. This means 6 megs per image @ 25fps = 140 megs per second and an OOM kill after maybe 15 seconds on average vs. 40 seconds capacity of framesQueue.
There is just one point which leaves me somewhat perplexed: why all this happens only if I subscribe to the _/videofile/imageraw topic? I'm pretty sure this behavior is triggered by ROS that skips useless stuff if nobody is listening on the topic. But I'm still not understanding it fully, so I don't know how to solve this.
As an initial guess, lower the buffer_queue_size of course.
Dunno why it happens only when there is a subscriber... if it's a problem of the thread that reads from the videofile, it should happen all the time until the buffer is full. And stop there. With or without subscribers.
The queue_size of the publisher is 1 https://github.com/ros-drivers/video_stream_opencv/blob/master/src/video_stream.cpp#L118 . So is not related to that.
My mind is somewhere else right now, but, may this line have anything to do?
https://github.com/ros-drivers/video_stream_opencv/blob/master/src/video_stream.cpp#L104
It still makes no sense that it happens only when there is a subscriber tho.
It's so weird... If I run it without any subscriber, I see this:
framesQueue.push(frame.clone()) 12442567/18663367 framesQueue.push(frame.clone()) 12442567/18663367 framesQueue.push(frame.clone()) 12442567/18663367 framesQueue.push(frame.clone()) 18663367/18663367 framesQueue.push(frame.clone()) 18663367/18663367 framesQueue.push(frame.clone()) 18663367/18663367 framesQueue.push(frame.clone()) 18663367/18663367 framesQueue.push(frame.clone()) 18663367/18663367 framesQueue.push(frame.clone()) 18663367/18663367 framesQueue.push(frame.clone()) 12442567/18663367 framesQueue.push(frame.clone()) 12442567/18663367 framesQueue.push(frame.clone()) 12442567/18663367
Oscillating between 12442567/18663367 and 18663367/18663367 for, say, ten seconds, then it stays at 12442567/18663367.
If I run it with an image_view as subscriber, I see this:
framesQueue.push(frame.clone()) 99533767/105754567 framesQueue.push(frame.clone()) 99533767/105754567 framesQueue.push(frame.clone()) 99533767/105754567 framesQueue.push(frame.clone()) 105754567/111975367 framesQueue.push(frame.clone()) 105754567/111975367 framesQueue.push(frame.clone()) 105754567/111975367 framesQueue.push(frame.clone()) 111975367/111975367 framesQueue.push(frame.clone()) 111975367/111975367
and at the end this:
framesQueue.push(frame.clone()) 1735604167/1741824967 framesQueue.push(frame.clone()) 1735604167/1741824967 framesQueue.push(frame.clone()) 1741824967/1741824967 framesQueue.push(frame.clone()) 1741824967/1748045767 framesQueue.push(frame.clone()) 1741824967/1748045767 framesQueue.push(frame.clone()) 1748045767/1754266567 framesQueue.push(frame.clone()) 1748045767/1754266567 framesQueue.push(frame.clone()) 1716941767/1716941767 framesQueue.push(frame.clone()) 1710720967/1710720967 framesQueue.push(frame.clone()) 1710720967/1710720967 framesQueue.push(frame.clone()) 1710720967/1710720967 framesQueue.push(frame.clone()) 1710720967/1716941767
(This time the OOM killer has been lenient with it.)
So no frame drops at all because of queue full (yes, I forgot to instrument the other framesQueue.pull(), the one that normally pops out frames to convert them into ROS messages).
I think we can say that producer and consumer here works perfectly well together, but the queue is filled anyway up to 1000.
Don't worry about this too much, for the moment I can solve it setting queue size to 10 and maybe adding a ROS_WARN if someone sets the queue so that the memory usage can reach, say, 512 megabytes? Just a curiosity of mine :)
This is what happens adding a few more couts and one MEASURE_MEMORY() around framesQueue.pull(frame):
Queue before framesQueue.pull(frame): 278 framesQueue.pull(frame) 1748045767/1741824967 Queue after framesQueue.pull(frame): 277 framesQueue.push(frame.clone()) 1741824967/1741824967 Queue before framesQueue.pull(frame): 278 framesQueue.pull(frame) 1748045767/1741824967 Queue after framesQueue.pull(frame): 277 framesQueue.push(frame.clone()) 1741824967/1741824967 Queue before framesQueue.pull(frame): 278 framesQueue.pull(frame) 1748045767/1741824967 Queue after framesQueue.pull(frame): 277 framesQueue.push(frame.clone()) 1741824967/1741824967 framesQueue.push(frame.clone()) 1741824967/1748045767 Queue before framesQueue.pull(frame): 279 framesQueue.pull(frame) 1748045767/1741824967 Queue after framesQueue.pull(frame): 278 framesQueue.push(frame.clone()) 1741824967/1748045767 framesQueue.push(frame.clone()) 1748045767/1754266567 [videofile/videofile_stream-1] process has died [pid 13121, exit code -9, cmd /mnt/data/workspace/ros/devel/lib/video_stream_opencv/video_stream camera:=image_raw __name:=videofile_stream __log:=/home/pelan/.ros/log/25c39d52-3cc9-11e8-845e-001e0631df58/videofile-videofile_stream-1.log]. log file: /home/pelan/.ros/log/25c39d52-3cc9-11e8-845e-001e0631df58/videofile-videofile_stream-1*.log all processes on machine have died, roslaunch will exit shutting down processing monitor... ... shutting down processing monitor complete done
I'm starting to believe that boost::sync_queue::pull() is not behaving as expected...
Can you commit the version with more couts?
Done. I think that the producer here is just a little bit faster than the consumer. If I put a double camera_fps_rate.sleep() in do_capture(), the queue size reaches 1 at maximum.
I get:
framesQueue.push(frame.clone()) 68/68
framesQueue.push(frame.clone()) 68/68
framesQueue.pull(frame) 68/68
framesQueue.pull(frame) 68/68
framesQueue.push(frame.clone()) 68/68
framesQueue.push(frame.clone()) 68/68
With a 3200 x 1800 30fps video.
I get the same with my webcam at 30 or 60fps.
On a laptop, 4 cores Intel(R) Core(TM) i7-6500U CPU @ 2.50GHz
Eheh because you're using a monster-PC. Try it on a Raspberry Pi :)
Mmm no, there's something wrong. 68 bytes always makes absolutely no sense...
Are you compiling with -O2 or -O3?
I've pulled your last version, tried with the video again (my laptop is dying btw) and now I get:
Queue before framesQueue.pull(frame): 1
framesQueue.pull(frame) 68/68
Queue after framesQueue.pull(frame): 0
framesQueue.push(frame.clone()) 68/68
Queue before framesQueue.pull(frame): 1
framesQueue.pull(frame) 68/68
Queue after framesQueue.pull(frame): 0
framesQueue.push(frame.clone()) 68/68
Queue before framesQueue.pull(frame): 1
framesQueue.pull(frame) 68/68
Queue after framesQueue.pull(frame): 0
framesQueue.push(frame.clone()) 68/68
Queue before framesQueue.pull(frame): 1
framesQueue.pull(frame) 68/68
Queue after framesQueue.pull(frame): 0
framesQueue.push(frame.clone()) 68/68
Queue before framesQueue.pull(frame): 1
framesQueue.pull(frame) 68/68
Queue after framesQueue.pull(frame): 0
framesQueue.push(frame.clone()) 68/68
Queue before framesQueue.pull(frame): 1
framesQueue.pull(frame) 68/68
Queue after framesQueue.pull(frame): 0
framesQueue.push(frame.clone()) 68/68
Queue before framesQueue.pull(frame): 1
framesQueue.pull(frame) 68/68
Queue after framesQueue.pull(frame): 0
framesQueue.push(frame.clone()) 68/68
framesQueue.push(frame.clone()) 68/68
framesQueue.push(frame.clone()) 68/68
Queue before framesQueue.pull(frame): 3
framesQueue.pull(frame) 68/68
Queue after framesQueue.pull(frame): 2
framesQueue.push(frame.clone()) 68/68
Queue before framesQueue.pull(frame): 3
framesQueue.pull(frame) 68/68
Queue after framesQueue.pull(frame): 2
framesQueue.push(frame.clone()) 68/68
Queue before framesQueue.pull(frame): 3
framesQueue.pull(frame) 68/68
Queue after framesQueue.pull(frame): 2
framesQueue.push(frame.clone()) 68/68
Queue before framesQueue.pull(frame): 3
I pulled your branch, I'm using your CMakeLists.txt: set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -g3 -gdwarf-2 -O0 -std=c++11")
Also, I get that same 68 with our without subscribers.
Mmm very strange, let me try it on my desktop.
Queue after framesQueue.pull(frame): 0 framesQueue.push(frame.clone()) 18663623/18663623 Queue before framesQueue.pull(frame): 1 framesQueue.pull(frame) 24884423/18663623 Queue after framesQueue.pull(frame): 0 framesQueue.push(frame.clone()) 18663623/18663623 Queue before framesQueue.pull(frame): 1 framesQueue.pull(frame) 24884423/18663623 Queue after framesQueue.pull(frame): 0 framesQueue.push(frame.clone()) 18663623/18663623 Queue before framesQueue.pull(frame): 1 framesQueue.pull(frame) 24884423/18663623 Queue after framesQueue.pull(frame): 0 framesQueue.push(frame.clone()) 18663623/18663623
This is what I get on my desktop. Values are stable but != 68 :)
I'm on a Ubuntu 14.04, 64bits, ROS indigo. Opencv 2.4.8
Just tried on a Ubuntu 16.04, 64b, ROS Kinetic, on a VM. Opencv3 from kinetic. And the numbers are more similar to yours... 11029703/8272583 And so (cant copy paste).
Mmm, OpenCV 2.4.8 seems to have fastMalloc() as well: https://github.com/opencv/opencv/blob/2.4/modules/core/src/alloc.cpp
So I don't know where is the problem. I'm on Ubuntu 16.04, 64bits, Kinetic with OpenCV 3.3.1 (I checked with cv::getBuildInformation() because Ubuntu still installs 2.4.9.1 while ROS pulls in its own package).
Ok, maybe I'm starting to understand:
[ INFO] [1523450336.860524732]: Setting camera FPS to: 30 [ INFO] [1523450336.860832980]: Camera reports FPS: 25 [ INFO] [1523450336.864597742]: Setting buffer size for capturing frames to: 1000 [ INFO] [1523450336.868807250]: Throttling to fps: 30
this is the problem IMHO. My video is 25 fps but in the launchfile I ask for 30 fps, so maybe frames are produced at 30 fps and consumed at 25?
You are trying to consume at 30fps (Throttling fps). You are producing at 30fps (setting camera fps sets set_camera_fps which is the only used for the sleep in the producer thread).
The reporting value is just a print, is not used.
Hi, I could reproduce the growing memory problem as well (Kinetic, Ubu 16.04 64 Bits). When I set fps parameter < Camera Reports FPS, I get the issue. In my example, I had fps = 10 and my camera reported 25fps. Setting fps to 25 solved the issue. I have a proper core i7 with 8GB RAM and even a buffer queue size of 5 didn't stop the leak.
This issue still is strange to me as it appears to be a recent problem: I have used this node for several months with fps = 10 and a 25 fps camera without any issues until recently.
So using the same parameters (fps=10, set_camera_fps=25) on an older version (before implementing a thread using a synch queue), I did not get this memory growing problem.
(Older version used: https://github.com/ros-drivers/video_stream_opencv/blob/89b0224b18b6e67860d4ad5c36c993fe7534e199/src/video_stream.cpp).
So here is my conclusion: the problem comes from Boost. Proof: check my merge request where I basically reimplemented a thread safe queue with mutex using plain std classes.
The code does exacly the same thing, except that locks are done once per consum/produce cycle. It fixes as well a possible edge case where the user sets a queue size of 1 and 2 calls of pull can happen on after the other from the 2 different threads, leading the second pull to return either the previous picture (consumer pulls last) or an empty image (producer pulls last).
I'm a bit surprised that this was implemented with experimental boost features...
Thank you very much for your work!!
I'm currently busy (in the Robocup contest) but I'll take a look, try it and act accordingly as soon as I can.
Thank you again!
On Thu, Jun 21, 2018, 12:07 Axel13fr notifications@github.com wrote:
So here is my conclusion: the problem comes from Boost. Proof: check my merge request where I basically reimplemented a thread safe queue with mutex using plain std classes.
The code does exacly the same thing, except that locks are done once per consum/produce cycle. It fixes as well a possible edge case where the user sets a queue size of 1 and 2 calls of pull can happen on after the other from the 2 different threads, leading the second pull to return either the previous picture (consumer pulls last) or an empty image (producer pulls last).
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/ros-drivers/video_stream_opencv/issues/15#issuecomment-399157748, or mute the thread https://github.com/notifications/unsubscribe-auth/ABpFdKhrxvovYG8-BA2KJe0OT591SfZ5ks5t-8S1gaJpZM4TPu_B .
@awesomebytes looking forward to a melodic release :-)
@Axel13fr re-release on all distros coming soon (it's PR-ed).
Thanks a lot !
Hello ! Thanks for your work ! I have a memory leak using your package streaming an rtsp stream. I have played the stream all night and in the morning the node had crash because of lack of memory. I think it is related to this issue: https://github.com/ros-drivers/video_stream_opencv/issues/15 This issue is not solved ? I am running on ROS kinetic. I have a camera streaming at 10 fps and I have set the fps param of your node at 30. I think that is the reason leading to the memory leak. Any updates ? Thank you !
Houston, we have another problem.
There is a memory leak when subscribing to /videofile/image_raw topic. Valgrind sees the memory as always reachable (so technically speaking it's not a true leak), but the process ends up using several gigs of memory. When one stops image_view, memory is released correctly.
If you want to take a look at it, you can clone this branch:
https://github.com/aviogit/video_stream_opencv/tree/memory_leak_when_subscribing_to_slash_videofile_slash_image_raw
I found an excellent piece of code by @BloodAxe here: https://computer-vision-talks.com/hacking-opencv-for-fun-and-profit/
and now I can even see precise numbers. I'll try to understand why it happens, cv_bridge::CvImage::toImageMsg() does a deep copy, so I actually see no reason for this to happen.
http://docs.ros.org/kinetic/api/cv_bridge/html/c++/cv__bridge_8cpp_source.html
I'll try to look into this further...