tucker-altiscale / pdsh

Automatically exported from code.google.com/p/pdsh
GNU General Public License v2.0

Change fanout at runtime #65

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
Hi,

it would be great if it were possible to change the fanout (-f) value at 
runtime.

Sometimes I run a command on a huge number of nodes using pdsh with quite a 
small fanout, and then I find out it would have been better to use a somewhat 
higher fanout. At that point it is really painful to stop it and re-run it, 
because some jobs keep running even if you kill pdsh, and you don't really 
want to kill those commands anyway, because that may leave something in an 
inconsistent state.

So my suggestion is this: make it possible to change the fanout at runtime.

- when the fanout is increased, it is quite trivial (just start more jobs)
- when it is decreased, start the next job only after the number of running 
jobs drops below the new fanout (obviously)
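A minimal sketch of those two rules, assuming jobs are started through a shared "running" counter protected by a mutex and condition variable. The names (`struct fanout_gate`, `gate_acquire()` and so on) are hypothetical and not taken from pdsh:

```c
#include <pthread.h>

/* Hypothetical gate limiting how many remote commands run at once. */
struct fanout_gate {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int fanout;   /* current limit, may change at runtime */
    int running;  /* jobs currently running */
};

#define FANOUT_GATE_INIT(n) \
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, (n), 0 }

/* Block until a slot is free under the *current* fanout, then claim it. */
void gate_acquire(struct fanout_gate *g)
{
    pthread_mutex_lock(&g->lock);
    while (g->running >= g->fanout)
        pthread_cond_wait(&g->cond, &g->lock);
    g->running++;
    pthread_mutex_unlock(&g->lock);
}

/* Non-blocking variant: returns 1 if a slot was claimed, 0 otherwise. */
int gate_try_acquire(struct fanout_gate *g)
{
    int ok;
    pthread_mutex_lock(&g->lock);
    ok = g->running < g->fanout;
    if (ok)
        g->running++;
    pthread_mutex_unlock(&g->lock);
    return ok;
}

/* A job finished: free its slot and wake one waiter to re-check. */
void gate_release(struct fanout_gate *g)
{
    pthread_mutex_lock(&g->lock);
    g->running--;
    pthread_cond_signal(&g->cond);
    pthread_mutex_unlock(&g->lock);
}

/* Change the limit. Increasing wakes all waiters so new jobs start at once;
 * decreasing never touches running jobs, it only makes acquirers wait
 * until `running` falls below the new value. */
void gate_set_fanout(struct fanout_gate *g, int fanout)
{
    if (fanout < 1)
        fanout = 1;
    pthread_mutex_lock(&g->lock);
    g->fanout = fanout;
    pthread_cond_broadcast(&g->cond);
    pthread_mutex_unlock(&g->lock);
}
```

Lowering the limit is therefore "lazy": nothing is cancelled, and new jobs are simply held back until enough of the running ones have finished.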

I'm not familiar with the pdsh source code, so I don't know whether you have 
some means of communicating with the running pdsh process. I'm guessing you 
don't, so I'd suggest making use of Unix signals. For example, you might use 
SIGUSR1 to decrease the fanout and SIGUSR2 to increase it (by one), or 
something like that.
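A sketch of how the signal side could be wired up, assuming the main loop periodically checks for pending requests. The handlers only bump `volatile sig_atomic_t` counters, since POSIX allows very little inside a signal handler, and the new fanout is computed later in normal context. All names here are hypothetical:

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>

/* Only the handler writes these; the main loop just reads them. */
static volatile sig_atomic_t usr1_count = 0; /* "decrease" requests */
static volatile sig_atomic_t usr2_count = 0; /* "increase" requests */

static void on_fanout_signal(int sig)
{
    /* Async-signal-safe: no locks, no pthread calls, no stdio. */
    if (sig == SIGUSR1)
        usr1_count++;
    else if (sig == SIGUSR2)
        usr2_count++;
}

int install_fanout_handlers(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fanout_signal;
    sigemptyset(&sa.sa_mask);
    /* Block both signals while either handler runs. */
    sigaddset(&sa.sa_mask, SIGUSR1);
    sigaddset(&sa.sa_mask, SIGUSR2);
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGUSR1, &sa, NULL) < 0 || sigaction(SIGUSR2, &sa, NULL) < 0)
        return -1;
    return 0;
}

/* Call from normal (non-handler) context; returns the adjusted fanout. */
int apply_fanout_requests(int fanout)
{
    static sig_atomic_t seen1 = 0, seen2 = 0;
    sig_atomic_t now1 = usr1_count, now2 = usr2_count;

    fanout += (int)(now2 - seen2) - (int)(now1 - seen1);
    seen1 = now1;
    seen2 = now2;
    return fanout < 1 ? 1 : fanout;   /* never go below one connection */
}
```

With something like this in place, `kill -USR2 <pid>` from another shell would request one more concurrent connection and `kill -USR1 <pid>` one fewer.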

Please let me know if you find this feature interesting. If you don't, please 
at least point me in the right direction (e.g. the specific part of the source 
code that needs to be modified) so that I can try to implement it myself.

Thanks,
David

Original issue reported on code.google.com by deewizzl...@gmail.com on 10 Jul 2014 at 8:46

GoogleCodeExporter commented 8 years ago
The feature sounds interesting enough, but I would worry about the interface. 

See src/pdsh/dsh.c:_handle_sigint() and _handle_sigtstp() for examples of how 
simple signals are currently handled in pdsh (a single SIGINT displays the 
state of active threads, another SIGINT within 1s cancels all threads, and a 
SIGTSTP within 1s of a SIGINT cancels all pending threads).

You could experiment with SIGUSR1/2 signals that increment/decrement 
opt->fanout (you'd first need to somehow get access to `opt' globally in 
dsh.c). However, this means that you'd need to signal this pdsh thread from an 
external process, which isn't exactly a nice interface.

It would be nice to have a single Ctrl-C put pdsh into a "command mode" 
momentarily, where this and other features could be implemented without 
relying on signals. However, that might be a bit of work, and probably not 
worth it for pdsh at this time.

Original comment by mark.gro...@gmail.com on 10 Jul 2014 at 1:59

GoogleCodeExporter commented 8 years ago
https://gist.github.com/dwatzke/96bb88d527acf279ecd2

This is obviously horribly hackish, but it is the bare minimum needed to make 
my suggestion work. With this unholy patch, you can use SIGUSR1 to decrement 
the fanout value and SIGUSR2 to increment it.

Original comment by deewizzl...@gmail.com on 12 Jul 2014 at 5:35


GoogleCodeExporter commented 8 years ago
I've modified the gist (the URL above) so that 
'pthread_cond_signal(&threadcount_cond)' is called after a fanout increase, 
and therefore a new thread is spawned immediately.

Original comment by deewizzl...@gmail.com on 16 Jul 2014 at 7:18

GoogleCodeExporter commented 8 years ago
It's cool that you got the feature working. In its current form, however, I'm 
not sure I'd be able to pull it into the main pdsh codebase. What would work is 
if we somehow found a way to enhance the mode entered after a single Ctrl-C so 
that pdsh accepts new commands. For example, you would hit Ctrl-C, then within 
1s press `f' to change the fanout, then enter a new fanout number... This would 
avoid the yuckiness of using signals.
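A rough sketch of the input side of such a command mode, assuming the command is typed as a single line such as `f 32`. The syntax, the timeout and the function names are all hypothetical, not part of pdsh. `poll()` bounds how long pdsh would wait after the Ctrl-C before resuming normal operation:

```c
#define _XOPEN_SOURCE 700
#include <limits.h>
#include <poll.h>
#include <stdlib.h>
#include <unistd.h>

/* Parse a line like "f 32". Returns the new fanout, or -1 if invalid. */
int parse_fanout_command(const char *line)
{
    char *end;
    long n;

    if (line[0] != 'f')
        return -1;
    n = strtol(line + 1, &end, 10);      /* strtol skips leading spaces */
    if (end == line + 1 || n < 1 || n > INT_MAX)
        return -1;
    while (*end == ' ' || *end == '\n')  /* allow trailing blanks/newline */
        end++;
    return *end == '\0' ? (int)n : -1;
}

/* Wait up to timeout_ms for one line on fd, then try to parse it.
 * Returns -1 on timeout, read error, or an unrecognized command. */
int read_fanout_command(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    char buf[64];
    ssize_t n;

    if (poll(&pfd, 1, timeout_ms) <= 0)
        return -1;                       /* nothing typed: back to normal */
    n = read(fd, buf, sizeof buf - 1);
    if (n <= 0)
        return -1;
    buf[n] = '\0';
    return parse_fanout_command(buf);
}
```

With the terminal in its usual line-buffered mode, `read()` only returns after Enter is pressed, so a 1s window may be tight in practice; putting the terminal into non-canonical mode would allow true single-key commands.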

Also, I'm pretty sure that using pthread_cond_signal() from a signal handler is 
not safe (if you are doing it that way). See:

 http://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_cond_broadcast.html

However, there may be some way to work around that with your example.
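One common POSIX workaround, sketched here as an assumption rather than what pdsh or the gist actually does: block SIGUSR1/SIGUSR2 in every thread and receive them synchronously with `sigwait()` in a dedicated thread. Because that thread is not running inside a signal handler, it can lock a mutex and call `pthread_cond_broadcast()` normally. `fanout` and `fanout_changes` are hypothetical globals standing in for `opt->fanout`:

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>

static pthread_mutex_t fanout_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  fanout_cond = PTHREAD_COND_INITIALIZER;
static int fanout = 4;          /* hypothetical stand-in for opt->fanout */
static int fanout_changes = 0;  /* bumped on every handled signal */

/* Ordinary thread, not a signal handler, so pthread calls are safe here. */
static void *fanout_signal_thread(void *arg)
{
    sigset_t *set = arg;
    int sig;

    for (;;) {
        if (sigwait(set, &sig) != 0)
            continue;
        pthread_mutex_lock(&fanout_lock);
        if (sig == SIGUSR2)
            fanout++;
        else if (sig == SIGUSR1 && fanout > 1)
            fanout--;
        fanout_changes++;
        pthread_cond_broadcast(&fanout_cond);  /* wake threads waiting on fanout */
        pthread_mutex_unlock(&fanout_lock);
    }
    return NULL;
}

/* Call early in main(), before other threads exist, so they inherit the mask. */
int start_fanout_signal_thread(pthread_t *tid)
{
    static sigset_t set;   /* static: must outlive this function */

    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    sigaddset(&set, SIGUSR2);
    if (pthread_sigmask(SIG_BLOCK, &set, NULL) != 0)
        return -1;
    return pthread_create(tid, NULL, fanout_signal_thread, &set) == 0 ? 0 : -1;
}
```

The external interface is still signal-based, so this only addresses the safety concern, not the interface concern raised earlier in the thread.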

Original comment by mark.gro...@gmail.com on 16 Jul 2014 at 1:14