Closed: mihails-strasuns closed this issue 5 years ago
I do wonder whether this problem is isolated to the suspendable implementation, or whether that is just the concrete example that has surfaced it. Everything in swarm constantly switches fibers without epoll cycles.
Right now there is a very limited set of termination hooks, and suspendable throttling is the only one that can cause trouble, as far as I can see. Outside of termination hooks it is generally fine, though practice has shown one can easily screw things up when managing fibers this way :)
I actually don't really understand what the connection with termination hooks is. Could you give (or link to) a brief summary of the overall problem?
Background: termination hooks are a bunch of delegates that run after the main task `run` method finishes. They are used to inject various actions that should semantically happen after the task finishes - the most common examples being `delayedResume` on another task inside the `await` implementation, or a task recycling itself when it comes from a `TaskPool`.
There are two important properties of termination hooks that limit how they can be used:

1) There is no well-defined order in which they may run. Adding new hooks and running them happens in LIFO order, but any code can add a new hook at any moment - it is a public API.
2) While the hooks are running, the worker fiber is in a somewhat weird state - it is neither busy running a task (because the task is already considered finished), nor free to take another one. Querying scheduler stats and/or task state during this window will likely yield confusing/wrong data.
Because of that, the scheduler documentation puts a hard restriction on termination hooks - they must never cause a context switch, ensuring that no other code can observe ordering issues or inconsistent state.
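To make the mechanics concrete, here is a toy model of LIFO termination hooks (a Python sketch; the `Task` class and `add_terminate_hook` name are illustrative stand-ins, not ocean's actual API):

```python
class Task:
    """Toy model of a task with LIFO termination hooks."""

    def __init__(self):
        self._hooks = []       # hypothetical stand-in for ocean's hook list
        self.finished = False

    def add_terminate_hook(self, hook):
        # Public API: any code may add a hook at any moment.
        self._hooks.append(hook)

    def run(self):
        # ... the task's actual work would happen here ...
        self.finished = True
        # Hooks run after `run` finishes, in LIFO order.
        # Restriction: a hook must never cause a context switch,
        # because the worker fiber is in a limbo state at this point.
        while self._hooks:
            self._hooks.pop()()

order = []
t = Task()
t.add_terminate_hook(lambda: order.append("first added"))
t.add_terminate_hook(lambda: order.append("second added"))
t.run()
# LIFO: the hook added last runs first.
```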
Now the key thing here is that the suspendable throttler has to call the throttling method twice for each task - when it is about to start (to check whether suspending is needed) and when it has finished (to check whether resuming is needed). The latter check is naturally done as a termination hook.
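Roughly, that double-check pattern looks like this (a hedged Python sketch; the class, the return values, and the threshold names are made up for illustration and are not `ISuspendableThrottler`'s real interface):

```python
class Throttler:
    """Toy throttler: asks for suspension above a high-water mark,
    asks for resumption below a low-water mark."""

    def __init__(self, suspend_at, resume_at):
        self.suspend_at = suspend_at
        self.resume_at = resume_at
        self.pending = 0

    def task_started(self):
        # First call site: checked when a task is about to start.
        self.pending += 1
        if self.pending >= self.suspend_at:
            return "suspend"   # caller should suspend its producers
        return "ok"

    def task_finished(self):
        # Second call site: in ocean this check runs from a
        # termination hook, which is exactly where trouble starts
        # if resuming triggers immediate processing.
        self.pending -= 1
        if self.pending <= self.resume_at:
            return "resume"
        return "ok"

th = Throttler(suspend_at=3, resume_at=1)
states = [th.task_started() for _ in range(3)]   # third start hits the limit
after = [th.task_finished() for _ in range(2)]   # second finish crosses the low-water mark
```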
With old swarm it worked just fine, because resuming a client meant nothing but registering an epoll event, with no other immediate effects. And considering that the throttler API also expected this (it expects to be able to resume multiple suspendables in a plain `foreach` at once), I felt safe to assume it doesn't violate the scheduler requirements.
With swarm neo this is violated, as immediate processing of a request may result in another task being spawned from a callback. The reason we noticed it is that after some scheduler refactoring, the recycling hook ended up being moved to run before the throttling hook - and apps started to hit a situation where a task is taken from the task pool for reuse while its termination hooks are still running. For now the quick workaround was to hard-code the recycling hook in ocean to be special and always run last - but that is both dirty and unreliable.
Relevant links:

1) Termination hook logic: https://github.com/sociomantic-tsunami/ocean/blob/v4.x.x/src/ocean/task/Task.d#L465-L482
2) Throttler logic: https://github.com/sociomantic-tsunami/ocean/blob/v4.x.x/src/ocean/io/model/ISuspendableThrottler.d#L227
3) Hack to work around swarm neo: https://github.com/sociomantic-tsunami/ocean/blob/v4.x.x/src/ocean/task/TaskPool.d#L121-L133
Ping
This will be addressed in #369.
Temporary ocean workaround: https://github.com/sociomantic-tsunami/ocean/pull/658
Currently neo client suspendables (contrary to old ones) may start processing some data immediately when `resume` is called. This violates the expectations of both the throttler (which tries to resume several suspendables in a simple loop) and the task scheduler (which expects no context switching when handling termination hooks). The implementation of `ISuspendable` should instead mark itself as resumed but delay any actual processing until control is returned to the event loop.
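The suggested fix could look roughly like this (a Python sketch of the deferred-resume idea; the `EventLoop` queue and all method names are assumptions for illustration, not swarm's actual API):

```python
class EventLoop:
    """Toy event loop with a queue of deferred callbacks."""

    def __init__(self):
        self._deferred = []

    def defer(self, cb):
        self._deferred.append(cb)

    def run_once(self):
        # Run all callbacks queued so far.
        pending, self._deferred = self._deferred, []
        for cb in pending:
            cb()

class DeferredSuspendable:
    """ISuspendable-style object that only marks itself resumed
    and postpones real processing to the event loop."""

    def __init__(self, loop):
        self.loop = loop
        self.suspended = True
        self.processed = 0

    def resume(self):
        # Safe to call from a termination hook: no immediate
        # processing happens here, hence no context switch.
        if self.suspended:
            self.suspended = False
            self.loop.defer(self._process)

    def _process(self):
        # Actual work happens once control returns to the event loop.
        self.processed += 1

loop = EventLoop()
s = DeferredSuspendable(loop)
s.resume()       # e.g. inside a termination hook: only flips the flag
loop.run_once()  # back in the event loop: the real work runs now
```

The design point is that `resume` becomes a pure state change, so the throttler can still resume several suspendables in a plain loop without any of them preempting the fiber that is running termination hooks.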