open-mpi / ompi

Open MPI main development repository
https://www.open-mpi.org

Using MPI_T for monitoring #9260

Open jychoi-hpc opened 3 years ago

jychoi-hpc commented 3 years ago

Background information

I am trying to use MPI_T for performance monitoring.

What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)

v4.1.1

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

Built from the source tarball. I didn't give any options during configure; I just ran configure --prefix=/dir/.
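
For reference, the build was essentially the stock sequence (with /dir/ standing in for the actual install prefix):

shell$ ./configure --prefix=/dir/
shell$ make all install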

Please describe the system on which you are running


Details of the problem

I compiled test/monitoring/check_monitoring.c but got the following error on execution:

shell$  mpirun -np 4 --mca pml_monitoring_enable 2 ./check_monitoring
Cannot find monitoring MPI_Tool "pml_monitoring_messages_count" pvar, check that you have enabled the monitoring component.

It looks like I need to turn something on during configure and build, but I cannot figure out what. Are there any options I need to enable to use MPI_T?

jsquyres commented 3 years ago

This was ultimately answered on the public mailing list -- sorry for the delay:

You need to enable the monitoring PML in order to get access to the pml_monitoring_messages_count MPI_T pvar. For this you need to know what PML you are currently using and add monitoring to the pml MCA variable. As an example, if you use ob1, you should add the following to your mpirun command: "--mca pml ob1,monitoring".
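
For example, with the check_monitoring test from above, the command would look something like this (assuming ob1 is the PML actually in use):

shell$ mpirun -np 4 --mca pml ob1,monitoring --mca pml_monitoring_enable 2 ./check_monitoring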

jychoi-hpc commented 3 years ago

Not at all. Thank you for checking. I also replied to the advice on the mailing list: https://www.mail-archive.com/users@lists.open-mpi.org/msg34569.html

I am still having the problem; there is verbose output in my reply. I am still wondering how to turn on MPI_T. Does it depend on any prerequisite packages?

jsquyres commented 3 years ago

@bosilca I'm able to replicate the issue.

Per https://github.com/open-mpi/ompi/blob/v4.1.x/ompi/mca/common/monitoring/README, if I run with:

$ ls -l prof
total 8
drwxr-xr-x 2 jsquyres named 4096 Aug 19 05:12 ./
drwxr-xr-x 5 jsquyres named 4096 Aug 19 05:07 ../
$ mpirun -np 2 \
    --mca pml_monitoring_enable 2 \
    --mca pml_monitoring_enable_output 3 \
    --mca pml_monitoring_filename prof/output \
    ./monitoring_test |& tee out.txt
$ ls -l prof
total 16
drwxr-xr-x 2 jsquyres named 4096 Aug 19 05:12 ./
drwxr-xr-x 5 jsquyres named 4096 Aug 19 05:07 ../
-rw-r--r-- 1 jsquyres named 1118 Aug 19 05:12 output.0.prof
-rw-r--r-- 1 jsquyres named  900 Aug 19 05:12 output.1.prof

Then as you can see above, everything works fine. But if I use --with-mpit to tell monitoring_test to use MPI_T:

$ mpirun -np 2 --mca pml ob1,monitoring \
    --mca pml_monitoring_enable 2 \
    --mca pml_monitoring_enable_output 3 \
    --mca pml_monitoring_filename prof/output \
    ./monitoring_test --with-mpit
cannot find monitoring MPI_T "pml_monitoring_flush" pvar, check that you have monitoring pml
cannot find monitoring MPI_T "pml_monitoring_flush" pvar, check that you have monitoring pml
[savbu-usnic-a:07785] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2198
[savbu-usnic-a:07785] 1 more process has sent help message help-mpi-api.txt / mpi-abort
[savbu-usnic-a:07785] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 73.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------

I do see pml_monitoring_flush registered in https://github.com/open-mpi/ompi/blob/a39a051fd87e0eb7019496ed1aae50c6fc10d0cd/ompi/mca/common/monitoring/common_monitoring.c#L306-L313

I'm digging in to find out why it's not being found by the monitoring_test program.

Sidenote: I have not checked, but I assume the issue also occurs on master (and possibly 4.0.x?). I'll check once I've completed diagnosis on the v4.1.x branch.

jsquyres commented 3 years ago

@bosilca It looks like the monitoring PML is being unloaded. Here's how I ran:

$ mpirun -np 2  \
    --mca pml_monitoring_enable 2 \
    --mca pml_monitoring_enable_output 3 \
    --mca pml_base_verbose 100 \
    --mca pml_monitoring_filename prof/output \
    ./monitoring_test --with-mpit

For simplicity, I show just the output from one of the 2 processes -- I put *** next to the relevant lines for ease of reading:

[savbu-usnic-a:11807] mca: base: components_register: registering framework pml components
[savbu-usnic-a:11807] mca: base: components_register: found loaded component v
[savbu-usnic-a:11807] mca: base: components_register: component v register function successful
[savbu-usnic-a:11807] mca: base: components_register: found loaded component ob1
[savbu-usnic-a:11807] mca: base: components_register: component ob1 register function successful
*** [savbu-usnic-a:11807] mca: base: components_register: found loaded component monitoring
*** [savbu-usnic-a:11807] mca: base: components_register: component monitoring register function successful
[savbu-usnic-a:11807] mca: base: components_register: found loaded component cm
[savbu-usnic-a:11807] mca: base: components_register: component cm register function successful
[savbu-usnic-a:11807] mca: base: components_open: opening pml components
[savbu-usnic-a:11807] mca: base: components_open: found loaded component v
[savbu-usnic-a:11807] mca: base: components_open: component v open function successful
[savbu-usnic-a:11807] mca: base: components_open: found loaded component ob1
[savbu-usnic-a:11807] mca: base: components_open: component ob1 open function successful
*** [savbu-usnic-a:11807] mca: base: components_open: found loaded component monitoring
*** [savbu-usnic-a:11807] mca: base: components_open: component monitoring open function successful
[savbu-usnic-a:11807] mca: base: components_open: found loaded component cm
[savbu-usnic-a:11807] mca: base: close: component cm closed
[savbu-usnic-a:11807] mca: base: close: unloading component cm
[savbu-usnic-a:11807] select: component v not in the include list
[savbu-usnic-a:11807] select: initializing pml component ob1
[savbu-usnic-a:11807] select: init returned priority 20
*** [savbu-usnic-a:11807] select: initializing pml component monitoring
*** [savbu-usnic-a:11807] select: init returned priority 0
[savbu-usnic-a:11807] selected ob1 best priority 20
[savbu-usnic-a:11807] select: component ob1 selected
*** [savbu-usnic-a:11807] select: component monitoring not selected / finalized
[savbu-usnic-a:11807] mca: base: close: component v closed
[savbu-usnic-a:11807] mca: base: close: unloading component v
*** [savbu-usnic-a:11807] mca: base: close: component monitoring closed
*** [savbu-usnic-a:11807] mca: base: close: unloading component monitoring
[savbu-usnic-a:11807] check:select: PML check not necessary on self

Since monitoring works when not using MPI_T, are you using some sort of trick to keep the monitoring PML resident even after ob1 unloads it?

Regardless, it looks like the MCA base PVAR registrations for the monitoring PML might be getting invalidated when ob1 unloads monitoring. Specifically:

(gdb) bt
#0  mca_base_pvar_get_internal (index=3, pvar=0x7ffffffe7a20, invalidok=false) at mca_base_pvar.c:359
#1  0x00002aaaab22085d in mca_base_pvar_find_by_name (full_name=0x402590 <flush_pvar_name> "pml_monitoring_flush", var_class=9, index=0x603928 <flush_pvar_idx>) at mca_base_pvar.c:104
#2  0x00002aaaaab95013 in PMPI_T_pvar_get_index (name=0x402590 <flush_pvar_name> "pml_monitoring_flush", var_class=9, pvar_index=0x603928 <flush_pvar_idx>) at ppvar_get_index.c:41
#3  0x00000000004017e5 in main (argc=2, argv=0x7fffffffc008) at monitoring_test.c:116
(gdb) p (*pvar)->flags
$10 = (MCA_BASE_PVAR_FLAG_IWG | MCA_BASE_PVAR_FLAG_INVALID)

Meaning: the pvar that it gets back when it looks up pml_monitoring_flush has the INVALID flag set on it. This ultimately causes MPI_T_pvar_get_index() to return !=MPI_SUCCESS in main(), and then the test program aborts.

Doing some other debugger diving, I can see that the pvar->flags value is simply MCA_BASE_PVAR_FLAG_IWG right after it is registered (i.e., no INVALID flag). I didn't dig deeper than that, but I'm guessing that the INVALID flag is added when the monitoring PML is unloaded...?

jsquyres commented 3 years ago

I am still having the problem; there is verbose output in my reply. I am still wondering how to turn on MPI_T. Does it depend on any prerequisite packages?

@jychoi-hpc To answer your specific question: MPI_T functionality is available in Open MPI without needing to enable anything special (e.g., on the command line). Per the MPI spec, you can just call MPI_T_init_thread() and go from there. This issue might turn out to be a bug in either the monitoring PML or the test program. Let's see what @bosilca says (his group is the author/maintainer of the monitoring PML).
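
As a minimal sketch of that (not the actual monitoring_test.c, and assuming the monitoring pvars have been registered), a program can initialize the tools interface and look up the pvar by name:

/* Minimal MPI_T sketch: no special configure or runtime option is needed
 * to use the tools interface itself. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int provided, pvar_index, rc;

    /* Initialize the MPI_T interface (can be done before or after MPI_Init). */
    MPI_T_init_thread(MPI_THREAD_SINGLE, &provided);
    MPI_Init(&argc, &argv);

    /* This is the lookup that fails in this issue: the pvar exists, but it
     * has been marked INVALID after the monitoring PML was unloaded. */
    rc = MPI_T_pvar_get_index("pml_monitoring_flush",
                              MPI_T_PVAR_CLASS_GENERIC, &pvar_index);
    if (MPI_SUCCESS != rc) {
        fprintf(stderr, "pml_monitoring_flush pvar not found\n");
    } else {
        printf("pml_monitoring_flush pvar index: %d\n", pvar_index);
    }

    MPI_Finalize();
    MPI_T_finalize();
    return 0;
}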

jsquyres commented 3 years ago

Tested/confirmed: the same issue appears on the v4.0.x and master branches (and therefore presumably in v5.0.x).

bosilca commented 3 years ago

I guess the last update to the MCA params automatically invalidates the MCA params for components that are closed. However, in this case the monitoring component is not actually closed: the close call is made, but the component refuses to close and instead hijacks the default PML interface, thus monitoring all PML calls.

syd1159774950 commented 1 year ago

@jychoi-hpc Hello, I'm experiencing the same problem as you. How did you solve it?