bogdan-szabo-sociomantic closed this 5 years ago.
Could you explain how you tracked this down to this method? It'd be helpful to know more about what happens when you call throttledResume, and what call stack ends up in this method.
I spent a couple of days tracking this down. I inspected the size of this.map and discovered that all nodes are removed during the loop.
This is the stack:
core.exception.AssertError@./submodules/ocean/src/ocean/util/container/ebtree/c/eb64tree.d(39): THE LEAF_P CAN NOT BE NULL!!!!! <<<<< THE CRASH IS HERE!!!
----------------
src/core/exception.d:438 _d_assert_msg [0x8fdcc1]
./submodules/ocean/src/ocean/util/container/ebtree/c/eb64tree.d:39 ocean.util.container.ebtree.c.eb64tree.eb64_node* ocean.util.container.ebtree.c.eb64tree.eb64_node.next() [0x805218]
./submodules/swarm/src/swarm/neo/util/TreeMap.d:303 int swarm.neo.util.TreeMap.TreeMap!(swarm.neo.client.RequestOnConn.RequestOnConn.TreeMapElement).TreeMap.opApply(scope int delegate(ref swarm.neo.client.RequestOnConn.RequestOnConn)) [0x83f599]
./submodules/swarm/src/swarm/neo/client/RequestOnConnSet.d:199 int swarm.neo.client.RequestOnConnSet.RequestOnConnSet.opApply(scope int delegate(ref swarm.neo.client.RequestOnConn.RequestOnConn)) [0x8101c8]
./submodules/swarm/src/swarm/neo/client/RequestSet.d:475 void swarm.neo.client.RequestSet.RequestSet.Request.resumeSuspendedHandlers(int) [0x8bbf15]
./submodules/swarm/src/swarm/neo/client/mixins/BatchRequestCore.d:180 bool dlsproto.client.request.internal.GetRange.GetRange.__mixin9.Controller.resume() [0x832c4c]
./submodules/swarm/src/swarm/neo/client/mixins/Controllers.d:266 _D8dlsproto6client9DlsClient9DlsClient9__mixin383Neo9__mixin1665__T11SuspendableTC8dlsproto6client7request8GetRange11IControllerZ11Suspendable6resumeMFZ9__lambda1MFC8dlsproto6client7request8GetRange11IControllerZv [0x7dacd2]
./submodules/swarm/src/swarm/neo/client/mixins/ClientCore.d:867 bool dlsproto.client.DlsClient.DlsClient.__mixin38.Neo.__mixin15.controlImpl!(dlsproto.client.request.internal.GetRange.GetRange, dlsproto.client.request.GetRange.IController).controlImpl(ulong, scope void delegate(dlsproto.client.request.GetRange.IController)) [0x7db56a]
./submodules/dlsproto/src/dlsproto/client/mixins/NeoSupport.d:329 bool dlsproto.client.DlsClient.DlsClient.__mixin38.Neo.control!(dlsproto.client.request.GetRange.IController).control(ulong, scope void delegate(dlsproto.client.request.GetRange.IController)) [0x7db4be]
./submodules/swarm/src/swarm/neo/client/mixins/Controllers.d:134 void dlsproto.client.DlsClient.DlsClient.__mixin38.Neo.__mixin16.Controller!(dlsproto.client.request.GetRange.IController).Controller.control(scope void delegate(dlsproto.client.request.GetRange.IController)) [0x7db080]
./submodules/swarm/src/swarm/neo/client/mixins/Controllers.d:263 void dlsproto.client.DlsClient.DlsClient.__mixin38.Neo.__mixin16.Suspendable!(dlsproto.client.request.GetRange.IController).Suspendable.resume() [0x7dacb0]
./submodules/ocean/src/ocean/io/model/ISuspendableThrottler.d:230 void ocean.io.model.ISuspendableThrottler.ISuspendableThrottler.resumeAll() [0x8bfa20]
./submodules/ocean/src/ocean/io/model/ISuspendableThrottler.d:187 void ocean.io.model.ISuspendableThrottler.ISuspendableThrottler.throttledResume() [0x8bf9ae]
./submodules/ocean/src/ocean/task/ThrottledTaskPool.d:212 void ocean.task.ThrottledTaskPool.ThrottledTaskPool!(*****.map.MapTask.MapTask).ThrottledTaskPool.throttlingHook() [0x7dc81d]
./submodules/ocean/src/ocean/task/Task.d:473 bool ocean.task.Task.Task.entryPoint() [0x8d4beb]
./submodules/ocean/src/ocean/task/internal/FiberPoolWithQueue.d:175 _D5ocean4task8internal18FiberPoolWithQueue18FiberPoolWithQueue17workerFiberMethodMFZ7runTaskMFZv [0x8aeb08]
./submodules/ocean/src/ocean/task/internal/FiberPoolWithQueue.d:200 void ocean.task.internal.FiberPoolWithQueue.FiberPoolWithQueue.workerFiberMethod() [0x8aea94]
src/core/thread.d:4277 void core.thread.Fiber.run() [0x9093b1]
src/core/thread.d:3523 fiber_entryPoint [0x909293]
??:? [0xffffffff]
It looks like the map is cleared immediately after a request is done. If this happens during ISuspendableThrottler.throttledResume(), which iterates through the map, the app crashes.
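The failure mode described here is the general hazard of mutating a container while iterating over it. The following Python sketch is for illustration only. The map, the callback and the function names are hypothetical. It clears a dict from inside the loop body, which CPython detects and reports as a RuntimeError rather than crashing.

```python
# Hypothetical stand-in for the RoC map: RoC id -> state.
# Python's dict detects the modification and raises RuntimeError; the
# D tree map in the trace above failed with an AssertError instead.


def resume_all(roc_map, on_resume):
    """Call on_resume for each entry while iterating the live map."""
    for roc_id in roc_map:
        on_resume(roc_id)


def demo():
    rocs = {1: "suspended", 2: "suspended", 3: "suspended"}

    def on_resume(roc_id):
        # The request finishes during the loop and the map is cleared.
        rocs.clear()

    try:
        resume_all(rocs, on_resume)
    except RuntimeError as exc:
        return str(exc)
    return "no error"


print(demo())
```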
I don't think it's possible for a request to finish at the same time as the request fibers are being resumed, is it? Keep in mind that we are using cooperative multitasking. It seems to me that this bug needs further investigation.
Resuming a suspendable neo all-nodes request using the corresponding ISuspendable instance iterates over a set of RoCs (RequestOnConns), calling resumeFiber for each RoC. I've therefore reconsidered my earlier assumption: it might be possible, perhaps depending on the individual request implementation, for a request to finish during the iteration.
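To see why cooperative multitasking does not rule this out, here is a minimal sketch that assumes Python generators as a stand-in for D fibers. Roc, resume_fiber and resume_all are hypothetical simplifications, not the swarm API. Resuming a fiber transfers control into it immediately, so its remaining work, including finishing, happens before the loop moves on to the next RoC.

```python
# Hypothetical sketch: Python generators stand in for D fibers.
# Resuming a fiber runs its body synchronously, in the same thread,
# until it yields again or returns.


class Roc:
    """Stand-in for a RequestOnConn that owns one fiber."""

    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.finished = False
        self.fiber = self._body()
        next(self.fiber)  # run up to the first suspension point

    def _body(self):
        yield  # suspended, waiting to be resumed
        # Runs only when resumed: the request finishes here.
        self.log.append(f"{self.name}: finished")

    def resume_fiber(self):
        try:
            next(self.fiber)
        except StopIteration:  # the fiber body returned
            self.finished = True


def resume_all(rocs, log):
    """Resume every RoC from inside one loop, as the resume path does."""
    for roc in rocs:
        log.append(f"loop: resuming {roc.name}")
        roc.resume_fiber()
        log.append(f"loop: back from {roc.name}")


log = []
rocs = [Roc("a", log), Roc("b", log)]
resume_all(rocs, log)
```

No second thread is involved. The interleaving comes entirely from the loop body handing control to each fiber in turn, so "a: finished" is logged before the loop gets back control.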
@matthias-wende-sociomantic I updated the PR with your proposal. I won't add the assert here, since it looks more like a hack.
Consider an all-nodes request (AllNodeRequests). Iterating over the RoCs map (src/swarm/neo/client/RequestOnConnSet.d) via opApply operates on each RoC individually using a caller-provided delegate. As a result, the delegate may resume a RoC's fiber, i.e. control jumps immediately into the fiber method before the iteration has finished.
Whenever a RoC finishes, the handlerFinished method (src/swarm/neo/client/RequestSet.d) is called. If this was the last unfinished RoC, it also invokes RequestOnConnSet.reset(), which removes every RoC from the RoCs map.
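The bookkeeping described above can be sketched as follows. The sketch assumes a simple counter of unfinished RoCs. RocSet, Request and handler_finished are hypothetical stand-ins for RequestOnConnSet, RequestSet.Request and handlerFinished, and the real code may track this differently.

```python
# Hypothetical sketch of the bookkeeping; the real swarm code may track
# unfinished RoCs differently.


class RocSet:
    """Stand-in for RequestOnConnSet: a map of RoCs keyed by id."""

    def __init__(self, roc_ids):
        self.map = {roc_id: "active" for roc_id in roc_ids}

    def reset(self):
        self.map.clear()  # removes every RoC from the map


class Request:
    def __init__(self, roc_ids):
        self.rocs = RocSet(roc_ids)
        self.num_active = len(roc_ids)

    def handler_finished(self, roc_id):
        self.rocs.map[roc_id] = "finished"
        self.num_active -= 1
        if self.num_active == 0:
            # This was the last unfinished RoC: empty the whole set.
            self.rocs.reset()


request = Request([1, 2, 3])
request.handler_finished(1)
request.handler_finished(2)
sizes = [len(request.rocs.map)]  # two finished, one still active
request.handler_finished(3)
sizes.append(len(request.rocs.map))  # last one finished: map emptied
```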
So during an iteration it can happen that all but one RoC have finished (i.e. RequestOnConnSet.finished() has been called for them), and that the operation on the last remaining RoC causes it to finish as well.
Returning to the iteration shouldn't cause an error, since the TreeMap in use is expected to cope with its elements changing during an iteration.
However, it seems that this assumption doesn't hold, and the iteration segfaults.
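One plausible way the assumption breaks can be sketched under stated assumptions. A common way to make iteration tolerate removals is to fetch the successor before calling the delegate. That protects against removing the current node. However, reset() also removes the prefetched successor, so the next step starts from a detached node. The assertion message in the trace hints that the ebtree node's leaf_p is null in that state. The sketch below does not reproduce swarm's TreeMap: LinkedMap, next_of and each are hypothetical, and the check in next_of only loosely mirrors that assertion.

```python
# Hypothetical ordered container, not swarm's TreeMap. Removing a node
# clears its links, and stepping from a removed node is treated as an error,
# loosely mirroring the "LEAF_P CAN NOT BE NULL" assertion in the trace.


class Node:
    def __init__(self, key):
        self.key = key
        self.prev = None
        self.next = None
        self.linked = True


class LinkedMap:
    def __init__(self, keys):
        self.head = None
        tail = None
        for key in keys:
            node = Node(key)
            if tail is None:
                self.head = node
            else:
                tail.next = node
                node.prev = tail
            tail = node

    def remove(self, node):
        if node.prev is not None:
            node.prev.next = node.next
        else:
            self.head = node.next
        if node.next is not None:
            node.next.prev = node.prev
        node.prev = node.next = None
        node.linked = False

    def clear(self):
        while self.head is not None:
            self.remove(self.head)

    def next_of(self, node):
        if not node.linked:
            raise RuntimeError("stepping from a node that was removed")
        return node.next

    def each(self, dg):
        """Call dg on every node, prefetching the successor first."""
        node = self.head
        while node is not None:
            successor = self.next_of(node)  # fetched before dg runs
            dg(node)
            node = successor
```

With this design, `m.each(m.remove)` empties the map safely, because only the current node is removed each time. A delegate that calls `m.clear()` fails on the second step, because the prefetched successor was removed as well.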
To avoid this segfault, this fix stops the iteration once all RoCs have finished.
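Here is a minimal sketch of the fix. It assumes the request can report whether all of its RoCs have finished. Request, all_finished and resume_suspended_handlers are hypothetical names, and a Python dict stands in for the RoC map. After each resume, the loop checks whether everything has finished and breaks out before advancing the iterator.

```python
# Hypothetical sketch of the fix; names are illustrative stand-ins, not the
# swarm API. The loop stops as soon as every RoC has finished, so it never
# tries to advance over a map that the reset may have just emptied.


class Request:
    def __init__(self, roc_ids):
        self.rocs = {roc_id: "suspended" for roc_id in roc_ids}
        self.num_active = len(self.rocs)

    def all_finished(self):
        return self.num_active == 0

    def handler_finished(self, roc_id):
        self.rocs[roc_id] = "finished"
        self.num_active -= 1
        if self.num_active == 0:
            self.rocs.clear()  # analogous to RequestOnConnSet.reset()

    def resume_suspended_handlers(self, resume_fiber):
        resumed = []
        for roc_id in self.rocs:
            if self.rocs[roc_id] == "finished":
                continue
            resumed.append(roc_id)
            resume_fiber(self, roc_id)  # may finish the RoC synchronously
            if self.all_finished():
                break  # the fix: stop iterating once everything is done
        return resumed


request = Request([1, 2, 3])
request.handler_finished(2)
request.handler_finished(3)
# Resuming RoC 1 finishes it, which empties the map mid-iteration.
resumed = request.resume_suspended_handlers(
    lambda req, roc_id: req.handler_finished(roc_id)
)
```

In D, breaking out of a foreach driven by opApply corresponds to the delegate returning a non-zero value, which opApply is expected to honour by ending the iteration and returning that value.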