pothosware / PothosCore

The Pothos data-flow framework
https://github.com/pothosware/PothosCore/wiki
Boost Software License 1.0

network block exhausts available memory #40

Closed xloem closed 8 years ago

xloem commented 9 years ago

I rebooted PothosGui today and my computer froze on the next bootup. When it eventually responded, PothosUtil had over 18 GB of address space. The process cmdline was /usr/local/bin/PothosUtil --require-active --proxy-server tcp://[::1]. I'm not aware of having enabled any network behavior in PothosGui.

In /proc/<pid>/smaps I found a 13 GB chunk of memory (7f128b3ef000-7f15cb3f4000). This backtrace of thread 5 shows objects in that range:

#0  Pothos::ManagedBuffer::reset (this=0x7f150e2ec038) at /home/gentoo/src/radio/pothos/library/lib/Framework/ManagedBuffer.cpp:32
#1  0x00007f176dfdc05a in ~BufferChunk (this=0x7f150e2ebfe8, __in_chrg=<optimized out>) at /home/gentoo/src/radio/pothos/library/include/Pothos/Framework/BufferChunk.hpp:26
#2  ~pair (this=0x7f150e2ebfd8, __in_chrg=<optimized out>) at /usr/lib/gcc/x86_64-pc-linux-gnu/4.8.5/include/g++-v4/bits/stl_pair.h:96
#3  _Destroy<std::pair<Pothos::Object, Pothos::BufferChunk> > (__pointer=0x7f150e2ebfd8) at /usr/lib/gcc/x86_64-pc-linux-gnu/4.8.5/include/g++-v4/bits/stl_construct.h:93
#4  __destroy<std::pair<Pothos::Object, Pothos::BufferChunk>*> (__last=0x7f15633f2010, __first=0x7f150e2ebfd8) at /usr/lib/gcc/x86_64-pc-linux-gnu/4.8.5/include/g++-v4/bits/stl_construct.h:103
#5  _Destroy<std::pair<Pothos::Object, Pothos::BufferChunk>*> (__last=0x7f15633f2010, __first=0x7f14fb3f2010) at /usr/lib/gcc/x86_64-pc-linux-gnu/4.8.5/include/g++-v4/bits/stl_construct.h:126
#6  _Destroy<std::pair<Pothos::Object, Pothos::BufferChunk>*, std::pair<Pothos::Object, Pothos::BufferChunk> > (__last=0x7f15633f2010, __first=0x7f14fb3f2010)
    at /usr/lib/gcc/x86_64-pc-linux-gnu/4.8.5/include/g++-v4/bits/stl_construct.h:151
#7  std::vector<std::pair<Pothos::Object, Pothos::BufferChunk>, std::allocator<std::pair<Pothos::Object, Pothos::BufferChunk> > >::operator= (this=this@entry=0x7f16d4017160,
    __x=std::vector of length 33554432, capacity 33554432 = {...}) at /usr/lib/gcc/x86_64-pc-linux-gnu/4.8.5/include/g++-v4/bits/vector.tcc:189
#8  0x00007f176dfdc8f4 in Pothos::Util::RingDeque<std::pair<Pothos::Object, Pothos::BufferChunk> >::set_capacity (this=this@entry=0x7f16d4017148, capacity=33554432)
    at /home/gentoo/src/radio/pothos/library/include/Pothos/Util/RingDeque.hpp:251
#9  0x00007f176dfd0f98 in Pothos::InputPort::asyncMessagesPush (this=0x7f16d4017060, message=..., token=...) at /home/gentoo/src/radio/pothos/library/lib/Framework/InputPort.cpp:61
#10 0x00007f176dfdd1f6 in Pothos::OutputPort::_postMessage (this=this@entry=0x7f16d401c570, async=...) at /home/gentoo/src/radio/pothos/library/lib/Framework/OutputPort.cpp:41
#11 0x00007f176dff6bd7 in postMessage<Pothos::Object> (message=<unknown type in /usr/local/lib/libPothos.so.0.3-0, CU 0xac2599, DIE 0xb9f0a7>, this=0x7f16d401c570)
    at /home/gentoo/src/radio/pothos/library/include/Pothos/Framework/OutputPortImpl.hpp:83
#12 Pothos::Block::opaqueCallMethod (this=this@entry=0x7f16d401b280, name="valueTriggered", inputArgs=inputArgs@entry=0x7f1732059920, numArgs=numArgs@entry=1)
    at /home/gentoo/src/radio/pothos/library/lib/Framework/Block.cpp:212
#13 0x00007f176dfe7e09 in Pothos::Connectable::opaqueCall (this=0x7f16d401b280, inputArgs=<optimized out>, numArgs=<optimized out>) at /home/gentoo/src/radio/pothos/library/lib/Framework/Connectable.cpp:48
#14 0x00007f176e014601 in Pothos::CallInterface::callObject<std::string const&, Pothos::Object&> (this=this@entry=0x7f16d401b288, a0="valueTriggered", a1=...)
    at /home/gentoo/src/radio/pothos/library/include/Pothos/Callable/CallInterfaceImpl.hpp:92
#15 0x00007f176dffa96d in callVoid<std::basic_string<char> const&, Pothos::Object&> (a1=..., a0="valueTriggered", this=0x7f16d401b288) at /home/gentoo/src/radio/pothos/library/include/Pothos/Callable/CallInterfaceImpl.hpp:98
#16 Pothos::Block::opaqueCallHandler (this=0x7f16d401b280, name="probeValue", inputArgs=0x0, numArgs=0) at /home/gentoo/src/radio/pothos/library/lib/Framework/Block.cpp:173
#17 0x00007f176e08e68b in Pothos::WorkerActor::handleSlotCalls (this=this@entry=0x7f16d401b570, port=...) at /home/gentoo/src/radio/pothos/library/lib/Framework/WorkerActor.cpp:256
#18 0x00007f176e0931d5 in Pothos::WorkerActor::preWorkTasks (this=this@entry=0x7f16d401b570) at /home/gentoo/src/radio/pothos/library/lib/Framework/WorkerActor.cpp:332
#19 0x00007f176e0935f8 in Pothos::WorkerActor::workTask (this=this@entry=0x7f16d401b570) at /home/gentoo/src/radio/pothos/library/lib/Framework/WorkerActor.cpp:222
#20 0x00007f176e00aca4 in Pothos::WorkerActor::processTask (this=0x7f16d401b570, waitEnabled=<optimized out>) at /home/gentoo/src/radio/pothos/library/lib/Framework/WorkerActor.hpp:40
#21 0x00007f176e0c173c in ThreadEnvironment::singleProcessLoop (this=0x7f16d401b050, handle=0x7f16d401b280) at /home/gentoo/src/radio/pothos/library/lib/Framework/ThreadEnvironment.cpp:218
#22 0x00007f176c1a8893 in execute_native_thread_routine () from /usr/lib/gcc/x86_64-pc-linux-gnu/4.8.5/libstdc++.so.6
#23 0x00007f176c8154cc in start_thread (arg=0x7f173205a700) at pthread_create.c:310
#24 0x00007f176b9130ad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

It seems the OutputPort of a SignalProbe is producing items which are not being consumed? I do have a SignalProbe in my topology which is being triggered very frequently.

Maybe it would be good to have RingDeque<>::set_capacity() throw an error if the capacity is above a threshold. It would have made my computer much more responsive.
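
To make the suggestion concrete, here is a minimal sketch of the kind of guard I mean. It is not the real Pothos::Util::RingDeque: the class name CappedQueue, the maxCapacity parameter, and the doubling growth policy are all assumptions made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a growable message queue (NOT Pothos::Util::RingDeque).
template <typename T>
class CappedQueue
{
public:
    explicit CappedQueue(std::size_t maxCapacity) : _maxCapacity(maxCapacity) {}

    // Refuse to grow past the configured ceiling instead of allocating.
    void set_capacity(std::size_t capacity)
    {
        if (capacity > _maxCapacity)
        {
            throw std::length_error(
                "CappedQueue::set_capacity(" + std::to_string(capacity) +
                ") exceeds limit of " + std::to_string(_maxCapacity));
        }
        if (capacity < _items.size())
        {
            throw std::length_error("CappedQueue::set_capacity below current size");
        }
        _items.reserve(capacity);
        _capacity = capacity;
    }

    // Assumed growth policy: double the capacity whenever the queue is full.
    void push_back(const T &item)
    {
        if (_items.size() == _capacity)
        {
            set_capacity(_capacity == 0 ? 1 : _capacity * 2);
        }
        assert(_items.size() < _capacity);
        _items.push_back(item);
    }

    std::size_t size() const { return _items.size(); }
    std::size_t capacity() const { return _capacity; }

private:
    std::size_t _maxCapacity;
    std::size_t _capacity = 0;
    std::vector<T> _items;
};
```

With something like this, a producer that outruns its consumer would hit a std::length_error at the ceiling, and the exception message would point at the queue that grew, rather than the process quietly taking gigabytes.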

I tried to check the input side of the guilty connection and think it is a NetworkSink:

(gdb) p this->_subscribers._M_impl._M_start[0]->_actor->block[0]
$19 = {<Pothos::Connectable> = {<Pothos::CallRegistry> = {_vptr.CallRegistry = 0x7f1768354590 <vtable for NetworkSink+16>}, <Pothos::CallInterface> = {
      _vptr.CallInterface = 0x7f1768354610 <vtable for NetworkSink+144>}, <Pothos::Util::UID> = {_uid = "pothos://livecd/10:fe:ed:12:b7:22/18470/2"}, <Pothos::Util::RefHolder> = {
      _vptr.RefHolder = 0x7f1768354638 <vtable for NetworkSink+184>, _refs = std::vector of length 1, capacity 1 = {{_vptr.Object = 0x7f176e6e0e70 <vtable for Pothos::Object+16>, _impl = 0x7f16d4014770}}}, 
    _name = "NetTo: SignalProbe0[valueTriggered]"}, _workInfo = {inputPointers = std::vector of length 1, capacity 1 = {0x7f16dc004e40}, outputPointers = std::vector of length 0, capacity 0, minElements = 0, 
    minInElements = 0, minOutElements = 1073741824, minAllElements = 0, minAllInElements = 0, minAllOutElements = 1073741824, maxTimeoutNs = 1000000}, _inputPortNames = std::vector of length 1, capacity 1 = {"0"}, 
  _outputPortNames = std::vector of length 0, capacity 0, _indexedInputs = std::vector of length 1, capacity 1 = {0x7f16d4017060}, _indexedOutputs = std::vector of length 0, capacity 0, 
  _namedInputs = std::map with 1 elements = {["0"] = 0x7f16d4017060}, _namedOutputs = std::map with 0 elements, _calls = std::multimap with 1 elements = {["getActualPort"] = {<Pothos::CallInterface> = {
        _vptr.CallInterface = 0x7f176e7008f0 <vtable for Pothos::Callable+16>}, _boundArgs = std::vector of length 1, capacity 1 = {{_vptr.Object = 0x7f176e6e0e70 <vtable for Pothos::Object+16>, _impl = 0x7f16d4010c30}}, 
      _impl = warning: RTTI symbol not found for class 'std::_Sp_counted_ptr<Pothos::Detail::CallableFunctionContainer1<std::string, NetworkSink const&>*, (__gnu_cxx::_Lock_policy)2>'
warning: RTTI symbol not found for class 'std::_Sp_counted_ptr<Pothos::Detail::CallableFunctionContainer1<std::string, NetworkSink const&>*, (__gnu_cxx::_Lock_policy)2>'
std::shared_ptr (count 1, weak 0) 0x7f16d4016700}}, _probes = std::map with 0 elements, _threadPool = {_impl = std::shared_ptr (count 1, weak 0) 0x7f16d40169f0}, 
  _actor = std::shared_ptr (count 1, weak 0) 0x7f16d40164d0}

I've made a core dump, so I can investigate further or share it.

guruofquality commented 9 years ago

Thanks for the dump, this will be interesting. There's supposed to be a backpressure mechanism for buffers and messages, so something like a signal probe would no longer get its work function called if one of the downstream consumers was not consuming.

If you are using Pothos GUI, the execute -> show topology stats dump may be interesting. It's going to show total counts for all of the ports, including enqueued elements like messages and buffers. If you are using the API, the topology stats can be dumped to a JSON string as well with queryJSONStats().

> Maybe it would be good to have RingDeque<>::set_capacity()

The RingDeque can't go beyond its capacity (it asserts in debug mode). There's actually code in the port handler that checks and resizes this queue. It should probably log when the queue has been resized absurdly out of bounds.

https://github.com/pothosware/pothos/blob/master/library/lib/Framework/InputPort.cpp#L57

xloem commented 9 years ago

Yes, exactly. The function you linked is in the backtrace. It's continually doubling the size of the RingDeque without bounds.

guruofquality commented 9 years ago

I committed a change to help track down and avoid issues like this: https://github.com/pothosware/pothos/commit/6f2dd5d8d113404be1d05021d80ffce6f20078f9 So now we at least know which block isn't consuming, and we don't take gigabytes of memory. ;-)

guruofquality commented 9 years ago

@xloem Do you still think there is a bug here? Was the signal probe connected to a consumer that wasn't interested in messages? If that was the case, at least the runtime now logs an error and tosses the data. On the other hand, if the producing-but-not-consuming behavior was unexpected, I would like to figure that one out.
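
For anyone following along, the "log an error and toss the data" behavior could look roughly like the sketch below. This is not the code from the commit: DroppingInbox, its limit parameter, and the log wording are hypothetical, and only serve to illustrate a bounded queue that names the consumer that is not keeping up.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <iostream>
#include <string>
#include <utility>

// Hypothetical model of an input port's message queue (names are illustrative,
// not the Pothos API).
class DroppingInbox
{
public:
    DroppingInbox(std::string name, std::size_t limit)
        : _name(std::move(name)), _limit(limit)
    {
        assert(limit > 0);
    }

    // Returns true if the message was queued, false if it was discarded.
    bool push(std::string message)
    {
        if (_queue.size() >= _limit)
        {
            // Log only the first drop so a fast producer does not flood the log.
            if (_dropped == 0)
            {
                std::cerr << "Inbox '" << _name << "' is full (" << _limit
                          << " messages); consumer is not keeping up, dropping input\n";
            }
            ++_dropped;
            return false;
        }
        _queue.push_back(std::move(message));
        return true;
    }

    std::size_t size() const { return _queue.size(); }
    std::size_t dropped() const { return _dropped; }

private:
    std::string _name;
    std::size_t _limit;
    std::size_t _dropped = 0;
    std::deque<std::string> _queue;
};
```

The trade-off in this sketch is that data is lost once the limit is reached, but memory stays bounded and the log identifies the port by name.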

xloem commented 9 years ago

It was not expected, but perhaps it should have been.

I had instructed a signal probe to fire for every single sample and connected it to a text box. I imagine the GUI couldn't handle the data rate.

guruofquality commented 9 years ago

Then that means you must have had something triggering the signal probe. If that was the case, we are talking about a lot of signal events being emitted -- as it turns out, these particular ports were not being back-pressured. This commit should fix that: https://github.com/pothosware/pothos/commit/4df18bafc6a026310231714fbd61917b8e149d9e
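
As a rough illustration of what back-pressure means here (a single-threaded toy model, not the scheduler code from the commit), the producer's work is simply skipped while the downstream queue is full. The names Link, RunStats, and simulate, and the tick-based loop, are assumptions made up for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical connection between a producer and a slower consumer.
struct Link
{
    std::deque<int> queue;
    std::size_t limit;
    bool hasRoom() const { return queue.size() < limit; }
};

struct RunStats
{
    std::size_t produced;
    std::size_t consumed;
    std::size_t skipped;
    std::size_t peakDepth;
};

// Runs `ticks` scheduler rounds. The producer wants to emit one message per
// round; the consumer handles one message every `consumerPeriod` rounds
// (a period of 0 means the consumer never runs).
RunStats simulate(std::size_t ticks, std::size_t consumerPeriod, std::size_t limit)
{
    assert(limit > 0);
    Link link{{}, limit};
    RunStats stats{0, 0, 0, 0};
    for (std::size_t t = 0; t < ticks; ++t)
    {
        // Back-pressure: only call the producer's work when downstream has room.
        if (link.hasRoom())
        {
            link.queue.push_back(static_cast<int>(t));
            ++stats.produced;
        }
        else
        {
            ++stats.skipped;
        }
        stats.peakDepth = std::max(stats.peakDepth, link.queue.size());

        const bool consumerRuns =
            consumerPeriod != 0 && t % consumerPeriod == consumerPeriod - 1;
        if (consumerRuns && !link.queue.empty())
        {
            link.queue.pop_front();
            ++stats.consumed;
        }
    }
    return stats;
}
```

Calling simulate(1000, 10, 8) keeps the queue depth at or below 8 no matter how far the producer outpaces the consumer. Without the hasRoom() check, the depth would grow for as long as the producer is faster, which is the situation reported in this issue.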

guruofquality commented 8 years ago

Thanks, closing!