indilib / indi

INDI Core Library Repository
https://www.indilib.org
GNU Lesser General Public License v2.1
375 stars 393 forks source link

indiserver fails on echo 'stop indi_canon_ccd'> IndiFIFO/myFIFO #301

Closed GeorgViehoever closed 7 years ago

GeorgViehoever commented 7 years ago

Running on Raspbian Jessie, compiled libindi from github sources. I observe indiserver crashing when trying to load/unload drivers using the FIFO mechanisms:

pi@raspberrypiAstro:~ $ indiserver -f IndiFIFO/myFIFO &[1] 10265
pi@raspberrypiAstro:~ $ 2017-07-16T12:24:10: startup: indiserver -f IndiFIFO/myFIFO 
pi@raspberrypiAstro:~ $ echo "start indi_canon_ccd"> IndiFIFO/myFIFO
pi@raspberrypiAstro:~ $ 2017-07-16T12:24:21: Driver indi_canon_ccd: Number of cameras detected: 1.
pi@raspberrypiAstro:~ $ echo 'stop indi_canon_ccd'> IndiFIFO/myFIFOdp->name: indi_canon_ccd - tDriver: indi_canon_ccd
name:  - dp->dev[0]: Canon DSLR EOS 80D
<delProperty device="Canon DSLR EOS 80D"/>
Child process 10266 died
2017-07-16T12:24:46: select(4): Interrupted system call
2017-07-16T12:24:46: good bye

I think the root cause is that the select in 831 of indiserver.c does not handle errno EINTR properly. Georg

TallFurryMan commented 7 years ago

Observed same issue on indi_v4l2_ccd and indi_mgen_autoguider on gentoo kernel 4.4 on Atom cpu.

GeorgViehoever commented 7 years ago

Proposed fix, tested a minute ago:

...
    /* wait for action */
    s = select(maxfd + 1, &rs, &ws, NULL, NULL);
    if (s < 0)
    {
        fprintf(stderr, "%s: select(%d): %s\n", indi_tstamp(NULL), maxfd + 1, strerror(errno));
    // proposed fix for termination problem
    if(errno!=EINTR)
    {
      Bye();
    }
    }
...
knro commented 7 years ago

I've committed this as a temporary workaround, but this was not a problem before. I suspect reapZombie() might have introduced regressions leading to this. What do you think @TallFurryMan ?

TallFurryMan commented 7 years ago

I'll have a deeper look, but to me shutting down a socket which another thread is listening to will cause select to report a signal in that thread. The usual example is using alarm() to manage a timeout. I'd say EINTR has two meanings: either the driver crashed by itself or something made the connection close. However the situation may be slightly different with sockets and pipes?

TallFurryMan commented 7 years ago

In any case, I agree killing indiRun on connection interruption was not correct, but I don't see right now the relation with SIGCHLD.

GeorgViehoever commented 7 years ago

I think the cause is SIGCHLD. According to https://linux.die.net/man/2/select, EINTR is set exactly when a signal was caught. I understand the purpose of interrupting system calls in this case is to enable programs to respond to such conditions, for instance choosing to abort, restart a child, release memory, or whatever...

TallFurryMan commented 7 years ago

Agreed. Before the implementation of reapZombies, all signals were ignored. Apparently by managing SIGCHLD, other signals cause an error code to be returned by select() et al. Thus the select() should simply be retried.

knro commented 7 years ago

I presume f15b8397976d4bd68f266d78a3666325246de495 should resolve this issue as well?

TallFurryMan commented 7 years ago

No, this is unrelated. The fix you provided in the context of this issue does solve the problem described here.

TallFurryMan commented 7 years ago

To clarify, your temporary workaround IS the correct fix.