open-mpi / ompi

Open MPI main development repository
https://www.open-mpi.org

Document new "soft breakpoint" functionality #11692

Open qkoziol opened 1 year ago

qkoziol commented 1 year ago

The "Debugging Open MPI Parallel Applications" documentation section needs to be updated for the capability described in this commit: https://github.com/open-mpi/ompi/commit/f97d081cf9b540c5a79e00aecee17b25e8c123ad

qkoziol commented 1 year ago

From @rhc54:

'prterun --help' includes the following:

--stop-in-app Direct the specified processes to stop at an application-controlled location

We need to let you specify an argument that includes the string ID of the place to stop, so I need to do a little work there. Basically, the code OMPI needs to implement to let you stop in a designated place is the same as what is in ompi/runtime/ompi_rte.c starting at line 1115. You check the OMPI_BREAKPOINT envar against the string ID of the point where you want to stop. If they match, you generate an event with that breakpoint string so PRRTE knows you are ready, and then wait for the debugger to attach.

I’ll work on the PRRTE side of things as time permits.
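For anyone following along, the check-notify-wait sequence described above might look roughly like the sketch below. This is only an illustration under stated assumptions, not the code in ompi/runtime/ompi_rte.c: the helper names (`breakpoint_matches`, `notify_ready_for_debug`, `wait_for_debugger`, `soft_breakpoint`) are hypothetical, the comparison is assumed to be an exact string match, and the notify and wait steps are stubs standing in for the real event generation and debugger release.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* True when the requested breakpoint string equals this location's ID.
 * Assumption: an exact match; the real code may accept other formats. */
static bool breakpoint_matches(const char *requested, const char *id)
{
    return requested != NULL && id != NULL && strcmp(requested, id) == 0;
}

/* Hypothetical stand-in: real code would generate an event carrying the
 * breakpoint string so PRRTE knows the process is ready. */
static void notify_ready_for_debug(const char *id)
{
    fprintf(stderr, "ready for debugger at breakpoint '%s'\n", id);
}

/* Hypothetical stand-in: real code would block here until the debugger
 * has attached and released the process. */
static void wait_for_debugger(void)
{
    /* intentionally empty in this sketch */
}

/* Call at the location identified by `id`.
 * Returns true if this breakpoint was the one requested. */
static bool soft_breakpoint(const char *id)
{
    if (!breakpoint_matches(getenv("OMPI_BREAKPOINT"), id)) {
        return false; /* not requested: continue without stopping */
    }
    notify_ready_for_debug(id);
    wait_for_debugger();
    return true;
}
```

A call such as `soft_breakpoint("foo")` would be placed at the point in the library where a stop should be possible; it does nothing unless OMPI_BREAKPOINT is set to the same ID.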

qkoziol commented 1 year ago

Once the PRRTE coding is finished, this capability needs to be tested and documented.

qkoziol commented 1 year ago

@samuelkgutierrez - Is this something you could work on also?

samuelkgutierrez commented 1 year ago

Hello, @qkoziol. I don't know how much help I would be. I'm unfamiliar with this particular feature, unfortunately.

rhc54 commented 1 year ago

As it requires code in OMPI to use it, I'm not sure how much value there is in documenting it. So far as I know, there is no corresponding OMPI code at this time.

qkoziol commented 1 year ago

> As it requires code in OMPI to use it, I'm not sure how much value there is in documenting it. So far as I know, there is no corresponding OMPI code at this time.

I believe that the code is in OMPI already. If it turns out not to be true, it would definitely have to be done before I would document it. :-)

@jsquyres - Is the code in place?

bosilca commented 1 year ago

The code is already in place; see ompi_rte_breakpoint in ompi/runtime/ompi_rte.c.

rhc54 commented 1 year ago

Sorry for not being clear - what I was trying to say was that the "soft breakpoint" only works if someone adds code specific to that breakpoint to OMPI. In other words, if I want to define a new "foo" breakpoint, then I have to add code to OMPI that implements it. Then that code can be activated via PMIx.

So I don't know if there is much value in including it in user-facing documentation. Certainly makes sense for developer docs, though. Not sure which you are writing, but I thought it was user-facing?

OMPI currently has code for the breakpoint in MPI_Init. That's a good example of how to do it.