Closed GeorgViehoever closed 7 years ago
Observed same issue on indi_v4l2_ccd and indi_mgen_autoguider on gentoo kernel 4.4 on Atom cpu.
Proposed fix, tested a minute ago:
...
/* wait for action */
s = select(maxfd + 1, &rs, &ws, NULL, NULL);
if (s < 0)
{
fprintf(stderr, "%s: select(%d): %s\n", indi_tstamp(NULL), maxfd + 1, strerror(errno));
// proposed fix for termination problem
if(errno!=EINTR)
{
Bye();
}
}
...
I've committed this as a temporary workaround, but this was not a problem before. I suspect reapZombie() might have introduced regressions leading to this. What do you think @TallFurryMan ?
I'll have a deeper look, but to me shutting down a socket which another thread is listening to will cause select to report a signal in that thread. The usual example is using alarm() to manage a timeout. I'd say EINTR has two meanings: either the driver crashed by itself or something made the connection close. However the situation may be slightly different with sockets and pipes?
In any case, I agree killing indiRun on connection interruption was not correct, but I don't see right now the relation with SIGCHLD.
I think the cause is SIGCHLD. According to https://linux.die.net/man/2/select, EINTR is set exactly when a signal was caught. I understand the purpose of interrupting system calls in this case is to enable programs to respond to such conditions, for instance choosing to abort, restart a child, release memory, or whatever...
Agreed. Before the implementation of reapZombies, all signals were ignored. Apparently by managing SIGCHLD, other signals cause an error code to be returned by select() et al. Thus the select() should simply be retried.
I presume f15b8397976d4bd68f266d78a3666325246de495 should resolve this issue as well?
No, this is unrelated. The fix you provided in the context of this issue does solve the problem described here.
To clarify, your temporary workaround IS the correct fix.
Running on Raspbian Jessie, compiled libindi from github sources. I observe indiserver crashing when trying to load/unload drivers using the FIFO mechanisms:
I think the root cause is that the select in 831 of indiserver.c does not handle errno EINTR properly. Georg