solettaproject / soletta

Soletta Project is a framework for making IoT devices. With Soletta Project's libraries developers can easily write software for devices that control actuators/sensors and communicate using standard technologies. It enables adding smartness even on the smallest edge devices.
http://solettaproject.org
Apache License 2.0
226 stars 109 forks source link

Mainloop: Make signal handling contingent upon an API call #1566

Open gabrielschulhof opened 8 years ago

gabrielschulhof commented 8 years ago

Task Description

Soletta's default-on signal handling interferes with node.js signal handling. Signal handlers should only be touched as part of an API call (sol_mainloop_handle_signals() maybe).

barbieri commented 8 years ago

I'll take this one, I'm just back from PRC and after post-travel settles I'll review your comments about mainloop integration.

If we get that mainloop integration right, the posix "init" is not to be called, so no signal handler is to be installed by soletta.

Right now I already envision we miss some callbacks like to spawn threads (worker threads) and processes (fork_run), that must be fixed. However I will carefully review your comments and do some testing, maybe I can send a PR that includes a proposal for the node.js part as well.

Thanks for reporting.

barbieri commented 8 years ago

Today I did an initial investigation based on my assumptions and the my early comment above. To answer myself first:

Right now I already envision we miss some callbacks like to spawn threads (worker threads) and processes (fork_run), that must be fixed. However I will carefully review your comments and do some testing, maybe I can send a PR that includes a proposal for the node.js part as well.

Worker threads shouldn't need any extra work. Although we do call sigprocmask() to unblock signals, the libuv doesn't do anything special in https://github.com/joyent/libuv/blob/master/src/unix/thread.c#L66 Since we'd not be calling our mainloop_init(), we'd have no signals added to the mask set and the operation would be no-op. So far so good.

Fork-run is more-complex:

Then I'm inclined to ditch sol_platform_linux_fork_run() and add sol_spawn() similar to libuv's and glib's, with pointer in struct sol_mainloop_implementation. It will be a hell of a work, but doable. I want to have others to check the idea before I do it.

@gabrielschulhof reported in https://github.com/solettaproject/soletta/pull/1507#issuecomment-190962769 the following:

The fundamental problems I have encountered with trying to implement the soletta main loop via libuv are these:

  1. libuv does not support polling for hangup (though that may be changing), which is needed for at least the hostname platform monitor. Polling for error is no substitute, because I need to have at least read or write in order for libuv to poll for anything, including errors, and there's always data available for reading in /proc/sys/kernel/hostname, so I can't poll for read.

this seems to be fixed, so it will be a matter of requiring a newer version of nodejs. Moreover, the hostname handling is only used for people building with "micro-linux" platform, most of the users are expected to use systemd instead.

  1. libuv does not support adding child process watches for arbitrary pids. It only allows watching pids which were spawned using uv_spawn(), so we'd have to keep our waitpid() around.

It does, see uv_signal_start() in https://github.com/joyent/libuv/blob/master/src/unix/signal.c. This can be used to implement child_watch_add().

  1. node.js quits when the libuv main loop runs out of active and referenced handles. Thus, even if we were to solve the above two problems we would still have to be careful which handles we keep and which ones we dereference. For example, soletta creates 2(IIRC) handles as part of sol_init() which we would have to dereference in order to retain the expected behaviour whereby node.js quits when all the handles are gone. When I tried implementing the soletta main loop via libuv, I set a flag during sol_init() to indicate that all incoming handles be dereferenced, and unset it after sol_init() completed, to indicate that subsequent handles be kept as they were. This is also pretty hackish. Additionally, the fact that the expectation in soletta is that sol_quit() will stop the main loop even if there are active and referenced handles means that on the soletta side the code (whether ours or somebody else's non-node.js soletta-based code , such as maybe a module) may not take care to remove all handles it adds to the main loop, causing node.js to hang.

Since you set sol_mainloop_implementation before soletta starts, you will get all handles soletta creates via our API. You can keep them all in sol_ptr_vector and on quit() you walk that and clear all your handles (delete them from libuv, but you must keep the actual handle alive so users do not crash, NULL-ify the pointers).

What would be left are the handles from non-soletta, like other nodejs modules. Here we need to agree on a solution, and my proposal is to not force-quit non-soletta stuff. The reason is soletta is not the host, but the guest module in a nodejs environment, then it should cooperate with that host. Then calling sol_quit() would only mean soletta itself would quit, but not the nodejs application. If the nodejs is only soletta stuff, then it would have no further handles and would exit normally. Otherwise it would keep running.

gabrielschulhof commented 8 years ago

@gabrielschulhof reported in https://github.com/solettaproject/soletta/pull/1507#issuecomment-190962769 the following:

  1. libuv does not support adding child process watches for arbitrary pids. It only allows watching pids which were spawned using uv_spawn(), so we'd have to keep our waitpid() around.

It does, see uv_signal_start() in https://github.com/joyent/libuv/blob/master/src/unix/signal.c. This can be used to implement child_watch_add().

This is true in that I can attach to SIGCHILD via libuv. Still, I'd have to do a non-blocking waitpid() myself, which kinda breaks the encapsulation I get from libuv. It also likely forces me to implement thread-safe access to the list of child processes over which I have to iterate so, at that point, I'm duplicating a lot of work. That's why I claim that it's not supported.

  1. node.js quits when the libuv main loop runs out of active and referenced handles. Thus, even if we were to solve the above two problems we would still have to be careful which handles we keep and which ones we dereference. For example, soletta creates 2(IIRC) handles as part of sol_init() which we would have to dereference in order to retain the expected behaviour whereby node.js quits when all the handles are gone. When I tried implementing the soletta main loop via libuv, I set a flag during sol_init() to indicate that all incoming handles be dereferenced, and unset it after sol_init() completed, to indicate that subsequent handles be kept as they were. This is also pretty hackish. Additionally, the fact that the expectation in soletta is that sol_quit() will stop the main loop even if there are active and referenced handles means that on the soletta side the code (whether ours or somebody else's non-node.js soletta-based code , such as maybe a module) may not take care to remove all handles it adds to the main loop, causing node.js to hang.

Since you set sol_mainloop_implementation before soletta starts, you will get all handles soletta creates via our API. You can keep them all in sol_ptr_vector and on quit() you walk that and clear all your handles (delete them from libuv, but you must keep the actual handle alive so users do not crash, NULL-ify the pointers).

This also requires thread-safe access to the list of outstanding handles. I noticed that soletta creates at least two handles during sol_init() so those would have to be cleaned up the way sol_quit() currently (hopefully) cleans them up.

What would be left are the handles from non-soletta, like other nodejs modules. Here we need to agree on a solution, and my proposal is to not force-quit non-soletta stuff.

Since we're only talking about when node.js quits, this is actually not an issue, because we won't get to this point if those other nodejs modules have any libuv handles around.

The reason is soletta is not the host, but the guest module in a nodejs environment, then it should cooperate with that host. Then calling sol_quit() would only mean soletta itself would quit, but not the nodejs application. If the nodejs is only soletta stuff, then it would have no further handles and would exit normally. Otherwise it would keep running.

This is actually how the main loop integration works currently. The bindings decide when to fire up the soletta main loop via sol_run() and when to quit it via sol_quit(). So, the soletta main loop only runs while there's an outstanding Soletta handle that requires it.

This approach jives well with node.js, because while there's an outstanding handle, the process shouldn't quit. This way, we don't need to track anything that soletta produces.

This approach also preserves encapsulation. That is, the bindings use exclusively libuv and soletta public APIs to accomplish the integration, which saves us from having to be thread-safe and platform-specific.

The only downside, of course, is that we need to know which APIs need the main loop and which can function without it. Still, this seems to be the least invasive way of integrating the main loop.

The fact is that this approach has worked very well for all the hardware bindings as well as for the OIC bindings. The only remaining snag is that, sometimes, the signal handling doesn't work - hence this issue.

barbieri commented 8 years ago

I know it's working well, that's why I'm doing other stuff and postponing my changes to make the other option possible -- not blocking anything :-)

So keep going :-)

Btw, the child_watch should be gone when we introduce libuv-like spawn api. So nothing to worry there as well.

gabrielschulhof commented 8 years ago

The only remaining issue is that while

var soletta = require( "./lowlevel" );

process.on( "SIGINT", function() {
    console.log( "SIGINT" );
    process.exit( 0 );
} );

setInterval( function() {}, 1000 );

works correctly when I hit ^C in the terminal (i.e., causes the process to print "SIGINT" and exit),

process.on( "SIGINT", function() {
    console.log( "SIGINT" );
    process.exit( 0 );
} );

var soletta = require( "./lowlevel" );

setInterval( function() {}, 1000 );

does not work (i.e. hangs).

barbieri commented 8 years ago

This is expected because soletta's sol_init() will set sigaction, overriding what was done by node. The sol_mainloop_impl allows to skip that, but you're not using it yet

The simple work around is to call sigaction BEFORE calling sol_init, then you save the previous function, then you call sol_init, then reset node's sigactions. Not more than 20 LoC in C

gabrielschulhof commented 8 years ago

Wouldn't it make more sense for soletta to do exactly this as part of sol_init()? That is, to non-destructively add it's own signal handling, while chaining up to existing signal handlers. Play nice with existing signal handlers, I guess. If you agree that this is a good idea, I'd like to do this in soletta (src/lib), rather than the node.js bindings (bindings/nodejs).

barbieri commented 8 years ago

no, it shouldn't bubble these as it does all that is needed, for example on INT it will cal sol_quit().

signal handlers are a pain, that's why people are thinking on how to solve the BUS signal that is raised from mmaps becoming invalid, the libraries want to handle that and fix automatically, but they can't since it's not as easy as "bubble it to the previous signal".

The idea behind you being able to override the soletta main loop is that it would allow "guest soletta" to behave well, not trying to own the process like it does now in "host soletta" mode.

gabrielschulhof commented 8 years ago

I guess when the spawning stuff rolls around we can give the provide-a-uv-based-main-loop-implementation idea another go. Until then, I guess I can try to do that signal handler juggling you suggested earlier.

Another idea I've had which may or may not be worth pursuing is to give soletta an API at runtime for attaching signal handlers. That is, instead of considering sol_mainloop_impl_platform_init() to be platform-specific, we should consider it to be platform independent, but the functions it calls should be provided via a strucutre. Something like

struct signal_api {
  int add_signal(int signum, void (*handler)(int signum, void *context));
  int del_signal(int signum, void (*handler)(int signum, void *context));
};

which we could then implement using either POSIX functions (by default) or using uv_signal_t when running as part of node. That should also play nice.

Come to think of it, perhaps sol_init() need not be part of the main loop implementation but rather there should be function pointers in struct sol_mainloop_implementation for providing an implementation for adding/removing signal handlers and sol_init() should make use of them.

A more generic question I have is this: Why is it so important that sol_quit() be called on SIGINT and SIGTERM? I mean, I don't think any other library does that. At the very least the behaviour should be configurable at runtime.

barbieri commented 8 years ago

it's more like a convenience, EFL does that. Others simply don't handle and will exit uncleanly.

Since most soletta applications run-forever and they are not expected to install termination handlers, providing one by default helps development by ^C providing useful valgrind logs :-)