gvegayon / parallel

PARALLEL: Stata module for parallel computing
https://rawgit.com/gvegayon/parallel/master/ado/parallel.html
MIT License

Allowing execution across machines #53

Closed by bquistorff 7 years ago

bquistorff commented 7 years ago

On computer clusters that share a file system (e.g. NFS) and allow key-based ssh, it should be possible to distribute work across different nodes in the cluster. Because the key replaces the password, commands can be sent over ssh directly without a password prompt. This would be a bit of work, but I thought I'd outline it here in case anyone wants to implement it. The steps as I see them are:

  1. Allow a mechanism to list hostnames for each child process.
  2. Update the processor execution function to execute the command via ssh on the appropriate hostname.
  3. Update the processor waiting function to determine if a child process is still running.
  4. Update the processor kill function in case the parent wants to stop processing.
  5. Update numprocessor when hostnames are specified.
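Steps 2 to 4 amount to wrapping three shell commands in ssh: start a child in the background and report its PID, probe the PID, and signal it. The sketch below is a language-neutral illustration in Python, not parallel's Stata code. The helper names, the use of `BatchMode=yes`, and the `kill -0` liveness probe are assumptions. It also assumes the working directory lives on the shared file system and that every node has a POSIX shell.

```python
import shlex

# Hypothetical helpers for illustration only; they are not part of parallel.

def on_host(host, shell_cmd):
    """Return an argv that runs shell_cmd on host.

    'localhost' runs through a local sh; any other name goes through ssh.
    BatchMode=yes makes ssh fail instead of prompting for a password, so a
    missing key shows up as an error rather than a hung child.
    """
    if host == "localhost":
        return ["sh", "-c", shell_cmd]
    return ["ssh", "-o", "BatchMode=yes", host, shell_cmd]

def launch_argv(host, workdir, command):
    """Step 2: start command in the background from workdir and print its PID.

    Redirecting stdin/stdout/stderr lets ssh return immediately instead of
    waiting on the child's open file descriptors. command is assumed trusted.
    """
    shell_cmd = (
        f"cd {shlex.quote(workdir)} || exit 1; "
        f"nohup {command} </dev/null >/dev/null 2>&1 & echo $!"
    )
    return on_host(host, shell_cmd)

def is_running_argv(host, pid):
    """Step 3: kill -0 sends no signal; exit status 0 means the PID is alive."""
    return on_host(host, f"kill -0 {int(pid)}")

def kill_argv(host, pid):
    """Step 4: ask the child process to terminate (SIGTERM)."""
    return on_host(host, f"kill {int(pid)}")
```

To use these, you would pass each argv to `subprocess.run(argv, capture_output=True, text=True)`. The stripped stdout of the launch call is the remote PID, and a `returncode` of 0 from the probe means the child is still running.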

gvegayon commented 7 years ago

This would be very cool to implement! I've done this kind of thing in R; the parallel package does it smoothly. SSH keys should work fine.

bquistorff commented 7 years ago

I've implemented this in the multi branch. It came out pretty well. You just specify a list of hostnames in setclusters and it will cycle through them for each task. If left blank, they all go to the local machine.
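The cycling rule above can be sketched in a few lines. This is a language-neutral illustration in Python, not the Stata code in the multi branch. The function name `assign_hosts` and the use of the literal `"localhost"` for the blank case are assumptions.

```python
from itertools import cycle, islice

def assign_hosts(hostnames, n_tasks):
    """Give task i the i-th hostname, wrapping around the list.

    Hypothetical helper: an empty list sends every task to the local machine.
    """
    if not hostnames:
        return ["localhost"] * n_tasks
    return list(islice(cycle(hostnames), n_tasks))

print(assign_hosts(["node1", "node2"], 5))
# ['node1', 'node2', 'node1', 'node2', 'node1']
print(assign_hosts([], 3))
# ['localhost', 'localhost', 'localhost']
```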

I've only tested this on a Linux cluster with NFS that I have access to. @gvegayon, do you have access to a cluster to test it as well? I'd like to test it in another environment. I had to use some workarounds because NFS file syncing across machines can be slow, and other setups may have similar issues.

Is there any reason we'd want to slightly "hide" this feature from beginner users (e.g. by omitting it from the PDF) for a while until we get the bugs sorted out?

gvegayon commented 7 years ago

That sounds great!

I have a network with a couple of computers at work, so yes, I can try it out. One question: is it possible to use both local and remote machines at the same time?

I don't think we should hide this feature from users; this is open-source software. Perhaps we should also state the license under which we distribute this (MIT) more explicitly, both in the .sthlp file and in the repo's README. I think it is better to announce it so that users can test it and help us improve it :).

bquistorff commented 7 years ago

Super, let me know if it works for you. It is possible to split the work across local and remote machines, and even to change the proportion: just do something like hostnames(localhost other2 other2 other3).
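Listing a host more than once raises its share because tasks are dealt out round-robin over the list. The Python sketch below counts tasks per host for the example hostnames. It assumes tasks cycle through the list in order, as described earlier in this thread. The function name `tasks_per_host` is hypothetical. The split is exact only when the number of tasks is a multiple of the list length.

```python
from collections import Counter
from itertools import cycle, islice

def tasks_per_host(hostnames, n_tasks):
    """Count how many tasks each host receives under round-robin cycling.

    Hypothetical helper: a host listed twice gets twice the share.
    An empty list yields an empty Counter.
    """
    return Counter(islice(cycle(hostnames), n_tasks))

shares = tasks_per_host(["localhost", "other2", "other2", "other3"], 8)
print(dict(shares))  # {'localhost': 2, 'other2': 4, 'other3': 2}
```

With 6 tasks instead of 8 the list wraps partway, giving other2 3 tasks, localhost 2, and other3 1.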

I've copied the MIT license from parallel.ado to a new LICENSE file in the root of the repo. If we want to add a license notice to each distributed file, we could use the SPDX License List short identifiers, for example:

/*
 * (C) Copyright 2014 <AUTHORS>
 *
 * SPDX-License-Identifier: MIT
 */

bquistorff commented 7 years ago

Merged in.