Closed: klaa97 closed this issue 2 years ago
Hey @klaa97, there was a short period during which the mpiexec option was broken in PRTE.
Can you replicate with v5.0.0rc6, using configure --with-prte=internal?
This is fixed in PRTE master, but has not yet been imported in either ompi/main or v5.0.x.
The following changes in PRTE need to be imported: https://github.com/openpmix/prrte/pull/1302
PR'd to the v2.1 branch here: https://github.com/openpmix/prrte/pull/1351
Will link the submodule update to v5 when we open that.
Thank you for taking the time to submit an issue!
Background information
What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)
5.0.0rc4, 5.0.0rc5
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
Distribution tarball for both versions
Please describe the system on which you are running
- Network type: tcp
Details of the problem
After building the tarball with
shell$ ./configure --with-ft=mpi
shell$ make all install
I get the following output with any MPI file.
shell$ mpirun --with-ft ulfm ./hello_world
# Segmentation fault (core dumped)
shell$ mpirun --with-ft mpi ./hello_world
# Segmentation fault (core dumped)
Note that I get this output even when I specify a nonexistent file as the executable; this leads me to believe that the problem is in the schizo parsing of the MPI CLI options. I did a little digging and I suspect the problem might be somewhere here: https://github.com/openpmix/prrte/blob/9ae73d4d97f843fac994103f2232f6570baaba26/src/mca/schizo/ompi/schizo_ompi.c#L394
Note also that if I manually specify the MCA options which are pushed in the code directly from the command line, the ULFM support seems to work.
Thank you!
your comment saved my b***
export OMPI_MCA_mpi_ft_enable=true
export PRTE_MCA_prte_enable_ft=1
that did the trick for me!
I am not sure why, but I cannot set the "np" flag to a low number: when I do, the program does not work well.
Is there a way to set it as an environment variable?
Thanks in advance :)
Update: I set it through PRTE_MCA_prte_set_default_slots (set to the wanted number) instead of using the np flag, but I still get a crash at low process counts, as if the fault tolerance does not kick in.
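For anyone landing here, the environment-variable workaround from this comment can be collected into a small launch script. This is only a sketch: the two variable names are the ones quoted above, ./hello_world is the test binary from the original report, and the NP variable with its default of 4 is an arbitrary placeholder for illustration, not a recommended value.

```shell
#!/bin/sh
# Workaround sketch: enable fault tolerance through MCA environment
# variables instead of the --with-ft option that crashed in the report.
export OMPI_MCA_mpi_ft_enable=true
export PRTE_MCA_prte_enable_ft=1

# NP is a hypothetical helper variable for this sketch (default: 4).
NP="${NP:-4}"

# Launch only when mpirun is on PATH and the test binary exists,
# so the script is harmless on machines without Open MPI.
if command -v mpirun >/dev/null 2>&1 && [ -x ./hello_world ]; then
    mpirun -np "$NP" ./hello_world
fi
```

Exporting the variables in the shell that calls mpirun is enough for the launcher to see them; whether they fully replace the command-line option on a given release candidate is not confirmed in this thread.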
@awlauria @gpaulsen Did the recent v5.0.x submodule updates fix this issue?
I did not experience this issue with the latest v5.0.x fa738c5c