nv-legate / legate.core

The Foundation for All Legate Libraries
https://docs.nvidia.com/legate/24.06/
Apache License 2.0

Driver seems broken for GASNet + UDP Conduit #294

Open bryevdv opened 2 years ago

bryevdv commented 2 years ago

If you build legate configured with GASNet and the UDP conduit, legate fails to start:

env38 ❯ legate
WARNING: Disabling control replication for interactive run
GASNet: Invalid number of nodes: --nocr
GASNet: Usage '/home/bryan/work/legate.core/_skbuild/linux-x86_64-3.8/cmake-build/_deps/legion-build/bin/legion_python <num_nodes> {program arguments}'

It seems that when built with this configuration, the legion_python executable expects to be invoked with the number of nodes as the first positional argument:

legion_python <nodes> script.py

But the legate.py driver does not do this, which results in the error above.

It seems that one way to fix this would be to have legate.py change to invoke legion_python correctly when the UDP conduit is configured.
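As a rough illustration of that first option, the sketch below prepends the node count only when the build uses the UDP conduit. The helper name `build_legion_python_cmd` and its `conduit` and `num_nodes` parameters are hypothetical and are not taken from legate.py; how the driver would actually learn which conduit was configured is left open.

```python
from typing import List, Optional


def build_legion_python_cmd(
    legion_python: str,
    script_args: List[str],
    conduit: Optional[str],
    num_nodes: int = 1,
) -> List[str]:
    """Assemble a legion_python command line (hypothetical helper)."""
    cmd = [legion_python]
    if conduit == "udp":
        # Assumption: a UDP-conduit build wants <num_nodes> as the first
        # positional argument, as the usage message in this issue suggests.
        cmd.append(str(num_nodes))
    # Everything else (flags, script path) follows unchanged.
    cmd.extend(script_args)
    return cmd
```

With `conduit="udp"` and `num_nodes=2`, the arguments `["--nocr", "script.py"]` would be placed after the `2`; with any other conduit the command is left as it is today.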

Alternatively, it seems a bit strange to have legion_python expect a positional arg like this. Perhaps it would make sense to have legion_python always work with a script passed first (changes would still be needed here as well).

cc @magnatelee @marcinz

lightsighter commented 2 years ago

This actually isn't Legion Python's fault. This is an artifact of how the GASNet UDP conduit bootstraps itself by rewriting argv (there's a reason that both MPI and GASNet take a char ***argv argument to their initialization methods). GASNet has a specialized launcher for its UDP conduit and if you don't use their launcher then its initialization can get messed up (e.g. launchers like 'mpirun' and 'srun' are not really supported). I suspect we'll need to either use GASNet's special UDP launcher in this case, or monkey with the arguments ourselves to align with GASNet's conventions about where they expect to see certain arguments.
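To make the argv rewriting concrete, here is a small Python imitation of the observable behaviour. It is not GASNet's code: the function name `udp_style_init` is made up, and the error text is modelled on the message shown earlier in this issue.

```python
from typing import List, Tuple


def udp_style_init(argv: List[str]) -> Tuple[int, List[str]]:
    """Imitate an init routine that consumes argv[1] as the node count."""
    try:
        num_nodes = int(argv[1])
    except (IndexError, ValueError):
        num_nodes = 0  # missing or non-numeric first argument
    if num_nodes < 1:
        shown = argv[1] if len(argv) > 1 else "<missing>"
        raise ValueError(f"Invalid number of nodes: {shown}")
    # The node count is stripped, so the client only sees its own arguments.
    return num_nodes, [argv[0]] + argv[2:]
```

Calling it with `["legion_python", "--nocr"]` fails in the same way the driver does today, because the first argument is read as the node count.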

bryevdv commented 2 years ago

This actually isn't Legion Python's fault.

Right, I did not mean that legion_python itself was the root cause, only that whatever GASNet UDP does, the relevant observable result is that an invocation of legion_python <nodes> script.py is required (which is not what legate.py does).

I don't have enough context to know whether "fixing" legate.py is worth the effort. If a special launcher is required for UDP then I think it would be sufficient if legate.py simply detects when there is an incompatibility and emits an actionable error that specifically mentions that the special launcher is required.
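A minimal sketch of such a check might look like the following. The names `UnsupportedConduitError` and `check_conduit` are hypothetical, and the sketch assumes the driver already knows the configured conduit as a plain string.

```python
from typing import Optional


class UnsupportedConduitError(RuntimeError):
    """Raised when the driver cannot launch the configured conduit (hypothetical)."""


def check_conduit(conduit: Optional[str]) -> None:
    """Fail early, with guidance, if the build uses the GASNet UDP conduit."""
    if conduit == "udp":
        raise UnsupportedConduitError(
            "This build uses the GASNet UDP conduit, which expects to be started "
            "as 'legion_python <num_nodes> ...' through GASNet's own spawner "
            "(see GASNET_SPAWNFN in the udp-conduit README). "
            "The legate driver does not do this; rebuild with a different "
            "conduit or launch through the GASNet UDP spawner."
        )
```

The point is only that the failure names the cause and a next step, instead of surfacing GASNet's usage message.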

lightsighter commented 2 years ago

I suspect that the answer here is going to be that we need to use a special launcher for the UDP conduit, which is really finicky because it can't bootstrap itself with PMIx the way that most high-performance networking systems do. I'll quote from the GASNet UDP conduit documentation:

Choosing the udp-conduit spawn mechanism (GASNET_SPAWNFN):
---------------------------------------------------------

udp-conduit is a very portable conduit, requiring only a UNIX-like system with
a reasonable C/C++ compiler and a standard sockets-based TCP/IP stack (which
includes UDP support).  However, the one aspect that remains somewhat
site-specific is the means for spawning the GASNet job across the UDP-connected
worker nodes. Consequently, the primary user-visible configuration decision to
be made when installing/using udp-conduit is the spawning mechanism to use for
starting udp-conduit jobs. udp-conduit includes built-in support for several
spawning mechanisms (including a very portable ssh-based spawner), and an
extensibility option which allows you to plug in your own job spawning command,
if desired.

All udp-conduit jobs should be started from the console (or from wrapper
scripts such as tcrun, upcrun) using a command such as:

         $ ./a.out <num_nodes> [program args...]

where the first argument <num_nodes> is the number of GASNet nodes to spawn,
and any subsequent arguments will be passed to the GASNet client as argc/argv
upon return from gasnet_init().

The GASNET_SPAWNFN environment variable is used to tell udp-conduit which
mechanism to use for spawning the worker node processes for the job, and may be
one of the following values (some of which have additional related environment
variables):

* GASNET_SPAWNFN='L'  (localhost spawn)

Uses a standard UNIX fork/exec to spawn all the worker processes on the local
machine, and UDP traffic between the nodes is sent over the localhost loopback
interface (which usually bypasses network hardware, but not the kernel). Useful
for debugging and testing, but probably not of interest for production jobs.

* GASNET_SPAWNFN='S'  (ssh/rsh-based spawn, the default spawner)

Uses any command-line based ssh or rsh client to connect to worker nodes, which should
be running an ssh/rsh daemon (ie sshd).  Requires that users setup password-less
authentication to the worker nodes (eg. using RSA public-key-based
authentication and/or ssh-agent). Specifically, users need the ability to run a
command such as: "ssh machinename echo hi" and have that command execute on the
remote node without the need for typing any passwords. Finally, any network
firewalls present must be configured such that the worker nodes have the
ability to make TCP connections to the machine that executes the initial spawn
command (used for bootstrapping) and such that the worker nodes can send UDP 
packets to each other (used to implement GASNet communication).

See the ssh-agent tutorial here:  http://upc.lbl.gov/docs/user/sshagent.shtml
or the documentation for your ssh client/daemon for more information on setting
up secure, password-less authentication for ssh. rsh-based spawning is also
supported, although not generally recommended due to the inherent security
flaws in rsh-based authentication (although it may still be appropriate on 
physically secure private networks).

The ssh-based spawner also recognizes the following environment variables
(the SSH_SERVERS value provides the names of the worker nodes running sshd
and is required):

 option           default                   description
---------------------------------------------------------------------------------------
SSH_CMD           "ssh"                     ssh command to use. Can also be set to "rsh",
                                            or the name of any other remote shell spawner
                                            program/script with a similar interface.
SSH_SERVERS       none - must be provided   space-delimited list of server names
                                            to pass to SSH_CMD, one per node, in order of
                                            priority (trailing extra server names ignored)
                                            may specify DNS names or IP addresses
SSH_OPTIONS       ""                        additional options to give SSH_CMD client
SSH_REMOTE_PATH   current working dir.      the directory to use on remote machine
                                            must contain a copy of the udp-conduit a.out
                                            executable to be started

(these variables may also optionally be prefixed with "AMUDP_" or "GASNET_")

So for example, one could use the ssh spawner to start a job for an a.out
executable linked against libgasnet-udp-*.a as follows (assuming a csh-like shell):

$ ssh node0 echo ssh is working
ssh is working
$ setenv GASNET_SPAWNFN 'S'
$ setenv GASNET_SSH_SERVERS 'node0 node1 node2 node3 node4 node5'
$ ./a.out 3 arg1 arg2 arg3
Hello from node0
Hello from node1
Hello from node2
$

* GASNET_SPAWNFN='C'  (custom spawner)

The custom spawner allows the user and/or site installer to provide a custom
command to be used for starting the worker processes across the worker nodes.
This provides spawning extensibility - the custom command can invoke any
arbitrary site-specific spawning command (for example to call to an OS-provided
spawner, a batch system, or a custom-written wrapper script that performs
whatever actions are necessary to start the job). The only required environment
variable setting is CSPAWN_CMD, which provides the command to be invoked for
performing the spawn, upon which the following substitutions are performed:

  %N => number of worker nodes requested
  %C => the command that should be run once by each worker node participating in the job
  %D => the current working directory
  %% => %
  %M => optional list of servers taken from CSPAWN_SERVERS (the first nproc are passed)
  %P => the program name as invoked (the original argv[0])

The custom command specified by CSPAWN_CMD is invoked exactly once at startup,
and is responsible for starting all the %N remote worker processes and having
them execute the command passed as %C, in a directory containing the a.out
udp-conduit executable. The worker processes then use information passed within 
%C to connect to the master process on the spawning console and bootstrap the job.

Note that any network firewalls present must be configured such that the worker
nodes have the ability to make TCP connections to the machine that executes the
initial spawn command (used for bootstrapping) and such that the worker nodes
can send UDP packets to each other (used to implement GASNet communication). 

The custom spawner recognizes the following environment variables:

 option               default   description
-------------------------------------------------------------------------------------
CSPAWN_CMD            none      the custom command to be called for spawning,
                                after replacement of the patterns listed above
                                the command must result in %N invocations of the
                                %C command, once on each worker node

CSPAWN_SERVERS        none      space-delimited list of servers to use -
                                only required if %M is used in CSPAWN_CMD

CSPAWN_ROUTE_OUTPUT   0         set this variable to request built-in stdout/stderr
                                routing from worker processes to the console, if your
                                CSPAWN_CMD doesn't automatically provide that capability.
                                Note GASNET_ROUTE_OUTPUT is ignored for this spawner.

(these variables may also optionally be prefixed with "AMUDP_" or "GASNET_")

So for example, one could use the custom spawner in conjunction with an mpirun
command in a mixed MPI+GASNet/udp-conduit executable to start a job for an a.out
executable as follows (assuming a csh-like shell):

$ setenv GASNET_SPAWNFN 'C'
$ setenv GASNET_CSPAWN_CMD 'mpirun -np %N %C'  
$ ./a.out 3 arg1 arg2 arg3
Hello from node0
Hello from node1
Hello from node2

Similarly, one can use the srun command in SLURM:

$ setenv GASNET_SPAWNFN 'C'
$ setenv GASNET_CSPAWN_CMD 'srun -n %N %C'  

https://gasnet.lbl.gov/dist/udp-conduit/README
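To show what the CSPAWN_CMD substitutions described in the README amount to, here is a small Python sketch of the pattern expansion. It is an illustration, not GASNet's implementation: `expand_cspawn_cmd` is a made-up name, and leaving unknown patterns untouched is an assumption of this sketch.

```python
from typing import Dict


def expand_cspawn_cmd(template: str, values: Dict[str, str]) -> str:
    """Expand %N, %C, %D, %M, %P and %% in a CSPAWN_CMD-style template."""
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        if ch == "%" and i + 1 < len(template):
            key = template[i + 1]
            if key == "%":
                out.append("%")  # "%%" is a literal percent sign
            elif key in values:
                out.append(values[key])
            else:
                # Unknown pattern: keep it as written (assumption of this sketch).
                out.append("%" + key)
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

For the mpirun example from the README, the template `'mpirun -np %N %C'` with `N` set to `3` would expand to an `mpirun -np 3 ...` command followed by whatever worker command is supplied for `%C`.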

bryevdv commented 2 years ago

Does it need to be supported at all? I only made the issue because I happened to run into it trying to do local dev, but if it is not actually useful in reality then perhaps just remove support for it from legate.py (with some thorough error messaging to provide guidance in case anyone tries).

manopapad commented 2 years ago

I think we can just remove udp as a choice for --conduit in install.py for now, and leave this issue in the backlog to properly support udp if someone really wants it.

lightsighter commented 2 years ago

Does it need to be supported at all? I think we can just remove udp as a choice for --conduit in install.py for now, and leave this issue in the backlog to properly support udp if someone really wants it.

I suspect that we will ultimately need it, although maybe not immediately. The UDP conduit is what you would use on a machine that only has basic ethernet between the nodes, e.g. in a traditional cloud environment. While this is not really high performance, it is something that people might ultimately want to run on. If they have a higher performance network, though, they should be using that and not UDP.

For CI and local machine runs, though, we should probably just be using the MPI conduit.

manopapad commented 2 years ago

Temporarily removed UDP conduit support in the install script https://github.com/nv-legate/legate.core/pull/305.