s417-lama / mpitx

Run MPI programs over tmux
MIT License

add support for MPMD (multiple program, multiple data) launch mode #2

Open guoyejun opened 1 year ago

guoyejun commented 1 year ago

the following command:

mpiexec -genv WORLD_SIZE 2 ... \
    -np 1 -host localhost ... -env RANK 0 final_cmd : \
    -np 1 -host localhost ... -env RANK 1 final_cmd

is converted into:

mpitx -genv WORLD_SIZE 2 ... -- \
    -np 1 -host localhost ... -env RANK 0 final_cmd : \
    -np 1 -host localhost ... -env RANK 1 final_cmd

and finally calls:

mpiexec -genv WORLD_SIZE 2 ... \
    -np 1 -host localhost ... -env RANK 0 mpitx "mpitx_child" final_cmd : \
    -np 1 -host localhost ... -env RANK 1 mpitx "mpitx_child" final_cmd

s417-lama commented 1 year ago

I really appreciate the effort you put into mpitx.

I have a concern about the usage of the separator --. This separator is commonly used to mark the end of options. In your usage, however, it separates the global options from the local options, which might be a little confusing.

I think -- should be used to separate the local options from each command, which looks like this:

mpitx -genv WORLD_SIZE 2 ... \
    -np 1 -host localhost ... -env RANK 0 -- final_cmd : \
    -np 1 -host localhost ... -env RANK 1 -- final_cmd

Then, you will not need to perform ad-hoc parsing for -env or other options, which are specific to an MPI implementation.

I think we don't have to separate global and local options ourselves if the user always supplies the separator --. The above command can then be easily converted to the following form, I guess:

mpiexec -genv WORLD_SIZE 2 ... \
    -np 1 -host localhost ... -env RANK 0 mpitx "mpitx_child" final_cmd : \
    -np 1 -host localhost ... -env RANK 1 mpitx "mpitx_child" final_cmd

guoyejun commented 1 year ago

Thanks for the comment. One concern is that there might be tens, hundreds, or possibly thousands of 'final_cmd' entries in the command line, so it is not easy for the user to add '--' for each of them.

'--' can be considered the separator between the 'mpiexec global part' and the 'programs part'.

s417-lama commented 1 year ago

I understand your concern, but I don't think it is a good idea to break the standard meaning of --.

According to man bash,

A -- signals the end of options and disables further option processing. Any arguments after the -- are treated as filenames and arguments.

Another problem is that, if we do not use -- to separate local options and commands, we end up writing a full option parser ourselves. You already wrote a parser for -env and other options, but it is specific to one MPI implementation. The option names and the number of arguments they take depend on the MPI implementation (e.g., Open MPI uses -x foo=bar). To cover all cases, we would have to write a full option parser for every MPI implementation.
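
To make the parsing burden concrete, here is a rough Python sketch of what finding the command without -- would involve. It is only an illustration, not code from mpitx: the function name find_command_start and the OPTION_ARITY tables are hypothetical, and the tables list just the few options mentioned in this thread (assuming the MPICH-style -env NAME VALUE form and the Open MPI -x NAME=VALUE form).

```python
# Hypothetical, incomplete tables: how many arguments each option consumes.
# A real parser would need a complete table per MPI implementation.
OPTION_ARITY = {
    "mpich": {"-np": 1, "-host": 1, "-env": 2, "-genv": 2},
    "openmpi": {"-np": 1, "-host": 1, "-x": 1},
}


def find_command_start(args, flavor):
    """Return the index of the first token that is neither an option nor an option argument."""
    arity = OPTION_ARITY[flavor]
    i = 0
    while i < len(args):
        tok = args[i]
        if tok in arity:
            # Skip the option itself plus the arguments it consumes.
            i += 1 + arity[tok]
        elif tok.startswith("-"):
            # Any option missing from the table makes the guess unreliable.
            raise ValueError(f"unknown option for {flavor}: {tok}")
        else:
            return i
    return len(args)


mpich_args = ["-np", "1", "-host", "localhost", "-env", "RANK", "0", "final_cmd"]
openmpi_args = ["-np", "1", "-x", "RANK=0", "final_cmd"]
print(find_command_start(mpich_args, "mpich"))
print(find_command_start(openmpi_args, "openmpi"))
```

The same command line parsed with the wrong table fails (for example, -x is unknown to the "mpich" table above), which is the maintenance problem described here.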

This is why mpitx requires the user to separate the options and commands with -- in the first place. This makes things very simple.
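
As a rough illustration of that simplicity, the rewrite for the MPMD case could look like the Python sketch below. This is not mpitx's actual code: the function name wrap_commands is hypothetical, and the launcher prefix simply follows the mpitx "mpitx_child" form used in the examples in this thread. The point is that no option names or argument counts need to be known.

```python
def wrap_commands(args, launcher=("mpitx", "mpitx_child")):
    """Replace the first "--" in each ":"-separated segment with the launcher prefix.

    Everything before "--" is passed through untouched, so the options of any
    MPI implementation are accepted without being parsed.
    """
    out = []
    seen_separator = False  # have we already replaced "--" in this segment?
    for tok in args:
        if tok == ":":
            if not seen_separator:
                raise ValueError('each program needs "--" before its command')
            out.append(tok)
            seen_separator = False  # a new program segment starts
        elif tok == "--" and not seen_separator:
            out.extend(launcher)
            seen_separator = True
        else:
            # Options, the command, and its arguments (including any later "--").
            out.append(tok)
    if not seen_separator:
        raise ValueError('each program needs "--" before its command')
    return out


user_args = (
    "-genv WORLD_SIZE 2 "
    "-np 1 -host localhost -env RANK 0 -- final_cmd : "
    "-np 1 -host localhost -env RANK 1 -- final_cmd"
).split()
print(" ".join(["mpiexec"] + wrap_commands(user_args)))
```

Only the first -- of each segment is replaced, so a -- that belongs to final_cmd's own arguments is left alone.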

Another benefit of using -- as a separator between local options and commands is that, even if we replace mpitx with mpiexec, the command remains valid. For example, using the above example,

mpiexec -genv WORLD_SIZE 2 ... \
    -np 1 -host localhost ... -env RANK 0 -- final_cmd : \
    -np 1 -host localhost ... -env RANK 1 -- final_cmd

is still a valid command line. If we put -- between the global and local options, it will not be a valid command.

guoyejun commented 1 year ago

I agree with this usage of --, though I'll keep the change locally for my specific case.

s417-lama commented 1 year ago

I totally understand. Thank you for your time and efforts.