andychu commented 5 years ago

This issue is a continuation of #85 . Also related to #198 .

It should be in C because dealing with signals in Python makes it even more complicated. We now have a nice front end thanks to @drwilly, and the front end and back end are clearly separable.

https://ldpreload.com/blog/signalfd-is-useless (note retraction) comments are more useful
Another HN thread referencing that one

The issue being discussed there is signal coalescing. xargs needs wait for multiple signals at a time:

SIGCHLD -- because these can be coalesced, we still need to call os.wait() to known which processes finished. We could get one signal and two processes finished.
SIGUSR1 / SIGUSR2 -- What happens if these are coalesced? I think we can just treat it as one event? I can't think of a better way to do it. There is no equivalent of os.wait() for this information.
Probably: a timer signal. The original xargs doesn't have this, but I want an update every N seconds that says "4 processes running, 10 of 20 completed" or something like that.
- Even more ambitious: I like the screenshot here, we could do that with xargs: https://buck.build/
- (it also probably does not matter if this signal is coalesced)

Note that GNU xargs uses sig_atomic_t, which is just a typedef?

https://stackoverflow.com/questions/24931456/how-does-sig-atomic-t-actually-work

andychu commented 5 years ago

Notes/Questions:

why does signalfd() save a thread? Can't you do it with the self-pipe trick? We should avoid threads.
Is signalfd() totally portable?

andychu commented 5 years ago

Notes on GNU xargs:

The whole executionhappens in an atexit() handler! Weird.

  if (atexit (close_stdin) || atexit (wait_for_proc_all))                                                                                                                                                                                     
    {                                                                                                                                                                                                                                         
      error (EXIT_FAILURE, errno, _("The atexit library function failed"));                                                                                                                                                                   
    }

It uses the default SIGCHLD.

  /* Make sure to listen for the kids.  */                                                                                                                                                                                                    
  signal (SIGCHLD, SIG_DFL);

OSH uses the default too. So maybe we don't need an event loop model?

Maybe we can reuse waiter.Wait() in core/process.py ? It gives you a callback with (pid, status).

SIGUSR1 and SIGUSR2 just mutate global sig_atomic_t variables in the signal handler. If we want a "user interface" this might not be enough. We might want to do stuff in the main loop rather than in the signal handler.

  sigact.sa_handler = increment_proc_max;                                                                                                                                                                                                     
  sigemptyset(&sigact.sa_mask);                                                                                                                                                                                                               
  sigact.sa_flags = 0;                                                                                                                                                                                                                        
  if (0 != sigaction (SIGUSR1, &sigact, (struct sigaction *)NULL))                                                                                                                                                                            
          error (0, errno, _("Cannot set SIGUSR1 signal handler"));

andychu commented 5 years ago

Hm yeah I think this can all be done in Python -- SIGUSR1, SIGUSR2, timer with possible UI, and handling child exit. Waiter seems good enough to reuse.

It returns False once there are no more processes to wait for. As long as we check if there are remaining processes to run, that should be race-safe.

So the algorithm is:

Start P processes
Enter a loop with waiter.Wait(), which gets unblocked every time a child exits. Oops well we also have to get unblocked on SIGUSR{1,2}.
Exit when waiter.Wait() returns False (os.wait() returns errno.ECHILD) and there are no processes left to run.

Hm it does seem like we want the self-pipe trick... because we are waiting on multiple signals at a time. There are two conditions when you start a process:

When one finishes
When you get SIGUSR1

Does GNU xargs handle this correctly?

Also xargs writes the errno of exec() to a pipe from the child to the parent. I don't really understand why they need to do that!

                /* Write errno value to parent.  We do this even if                                                                                                                                                                           
                 * the error was not E2BIG, because we need to                                                                                                                                                                                
                 * distinguish successful execs from unsuccessful                                                                                                                                                                             
                 * ones.  The reason we need to do this is to know                                                                                                                                                                            
                 * whether to reap the child here (preventing the                                                                                                                                                                             
                 * exit status processing in wait_for_proc () from                                                                                                                                                                            
                 * changing the value of child_error) or leave it                                                                                                                                                                             
                 * for wait_for_proc () to handle.  We need to have                                                                                                                                                                           
                 * wait_for_proc () handle the exit values from the                                                                                                                                                                           
                 * utility if we run it, for POSIX compliance on the                                                                                                                                                                          
                 * handling of exit values.                                                                                                                                                                                                   
                 */                                                                                                                                                                                                                           
                write (fd[1], &errno, sizeof (int));

andychu commented 5 years ago

Hm actually I think waiter.Wait() needs *three return values instead of 2:

True -- callback for a process was called, caller should keep waiting
False -- we're done with everything
a third value -- we were interupted by another signal (wait() returns EINTR). Right now there is an internal continue in this case. That is, waiter.Wait() calls posix.wait() multiple times.

Although if we want the UI, it might be better to take SIGUSR{1,2} over a pipe. So then we could also write a character to the pipe from a different source. It's more modular.

andychu commented 3 years ago

862 hm this is somewhat of a dupe

oils-for-unix / oils

xargs -P could use Oil's core/process.py abstractions #326

862 hm this is somewhat of a dupe