sio2project / sio2jail

A tool for supervising execution of programs submitted in algorithmic competitions
MIT License
29 stars 10 forks source link

Sio2jail leaves zombie processes #33

Closed A-dead-pixel closed 1 year ago

A-dead-pixel commented 1 year ago

This can be observed (seems like at least on master and all sandboxed releases from 1.1.0 above) by:

  1. Compiling a simple executable, like main(){}
  2. Adding sth like this:

    --- a/src/sio2jail.cc
    +++ b/src/sio2jail.cc
    @@ -2,10 +2,14 @@
    #include "s2japp/Application.h"
    
    #include <iostream>
    +#include <sys/wait.h>
    
    int main(int argc, const char* argv[]) {
     try {
    -        return s2j::app::Application(argc, argv).main();
    +        int ret = s2j::app::Application(argc, argv).main();
    +        while (wait(NULL)>0)
    +            std::cerr << "bruh";
    +        return ret;
     }
     catch (const std::exception& ex) {
         std::cerr << "Exception occurred: " << ex.what() << std::endl;
  3. Building and running sio2jail with arbitrary (?) args, for example --mount-namespace off ~/a; adding -t 1 gives the same result
  4. Witnessing the bruh

Alternatively, one can add sleep(100) before the while and check htop or sth of this nature.

One way to fix this is executing the while from above silently, though that may be more of a workaround than an actual solution.

Wolf480pl commented 1 year ago

Is sio2jail reporting correctly the exit status of the child process?

If it's not, then this is a big issue and we'll need to look into the main loop around waitid to see what went wrong there.

But if it gets the exit code correctly, then that's sus, because usually one would expect that if sio2jail got the exit status, it did so by waiting for the process, and thus must've reaped it.

In principle, sio2jail should reap all of its direct children (of which there should be exactly one), but if it gets the exit code correctly, and leaves some zombies that get taken care of by init anyway, then I think it doesn't hurt much. And the waitid loop is quite unintuitive and hard to get right, I'd rather not change it unless necessary.

A-dead-pixel commented 1 year ago

As far as I know, the exit codes are good. I was surprised by it too, but one can get the code and leave the zombie by using WNOWAIT (which is used in 2 out of 3 waitid calls in the loop). TraceExecutor::onExecuteEvent also doesn't use that flag, but doesn't seem to enter the particulat if clause. If we were to change the waitid loop after all, I'd put add a non-WNOWAIT waitid around here: https://github.com/sio2project/sio2jail/blob/7be4e0e6df8322a54bfc4c8de87cd6317ccf9950/src/executor/Executor.cc#L160-L168

As for the init, for example in a docker container with sth like a sioworkers process instead of a proper init, the zombies will accumulate. Therefore we should at least add the cleanup loop somewhere.

Wolf480pl commented 1 year ago

As for the init, for example in a docker container with sth like a sioworkers process instead of a proper init, the zombies will accumulate.

Oh, ok... one way to solve it would be to wrap sioworkers in some sort of "minimal init" - I've seen some docker containers do a similar thing, and technically that'd be the correct thing to do. But most people making docker images won't do it...

I think if we could easily fix sio2jail to not leave zombies that'd be a good idea. But we can't add it the place you suggested. The break; in line 168 leads past the while loop and into code that calls the onPostExecute event: https://github.com/sio2project/sio2jail/blob/2239909ffd67fea628845e60c831d68ce1a00027/src/executor/Executor.cc#L190-L192

This event is used by a couple listeners to read the final measurements from the dying process: https://github.com/sio2project/sio2jail/blob/95d9e34b3ab52bc4dd02b9b9ae39882ef29f7632/src/perf/PerfListener.cc#L154-L158 https://github.com/sio2project/sio2jail/blob/369317c9bc242123d0d633cf639f8e77ec1f3b0e/src/limits/TimeLimitListener.cc#L95-L100 https://github.com/sio2project/sio2jail/blob/369317c9bc242123d0d633cf639f8e77ec1f3b0e/src/limits/TimeLimitListener.cc#L144-L145

So that non-WNOWAIT wait should be at the end of executeParent() IMO, after that onPostExecute loop.

And I think adding it there is unlikely to break anything, since the only thing after that is output dumping.