open-mpi / ompi

Open MPI main development repository
https://www.open-mpi.org

Fortran/C PMPI profiling wrappers #3954

markalle commented 7 years ago

I found an old discussion here https://www.open-mpi.org/community/lists/devel/2015/08/17842.php that led to the current situation where libmpi_mpifh.so defines Fortran entrypoints like mpi_send() with PMPI_Send() as the C call doing the main work.

The topic is whether wrapping the C MPI calls should lead to automagic wrapping of the Fortran MPI calls. The standard allows but doesn't require it (the requirement is just that an MPI implementation document whether it happens or not). I'll cut-and-paste that part:

3. document the implementation of different language bindings of the MPI interface if
they are layered on top of each other, so that the profiler developer knows whether
she must implement the profile interface for each binding, or can economize by implementing
it only for the lowest level routines.
4. where the implementation of different language bindings is done through a layered
approach (e.g., the Fortran binding is a set of “wrapper” functions that call the C
implementation), ensure that these wrapper functions are separable from the rest of
the library.

As far as I know, item 4 is the only reason it's traditional to separate Fortran mpi_send() etc into their own libmpi_mpifh library instead of the main libmpi.

So currently if a libwrap.so just defines its own C entrypoints like MPI_Send() calling PMPI_Send(), that libwrap.so won't intercept a Fortran mpi_send(). (App -> mpi_send -> libmpi_mpifh.so -> PMPI_Send -> libmpi.so).
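For anyone unfamiliar with the pattern, a libwrap.so of this style is just a re-definition of the C symbol that forwards to the PMPI_ name; a minimal sketch (and under the current scheme, a Fortran mpi_send() will bypass it entirely):

/* Minimal sketch of the libwrap.so style described above: redefine the MPI_
   symbol and forward to the PMPI_ entry point.  Build this as a shared
   library linked (or preloaded) ahead of libmpi. */
#include <stdio.h>
#include <mpi.h>

int MPI_Send(const void *buf, int count, MPI_Datatype datatype,
             int dest, int tag, MPI_Comm comm)
{
    fprintf(stderr, "libwrap: MPI_Send intercepted\n");
    return PMPI_Send(buf, count, datatype, dest, tag, comm);
}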

The former behavior where libmpi_mpifh.so defined mpi_send() and called MPI_Send() would have caused Fortran MPI to be wrapped for free (App -> mpi_send -> libmpi_mpifh.so -> MPI_Send -> libwrap.so -> PMPI_Send -> libmpi.so).

And to summarize the IPM-related issue from the old discussion: libipm.so wraps both C and Fortran MPI calls for itself. IPM defines MPI_Send() etc. calling PMPI_Send(), and it defines mpi_send() etc. calling pmpi_send(). That led to Fortran MPI calls being accounted for twice (App -> mpi_send -> libipm.so -> pmpi_send -> libmpi_mpifh.so -> MPI_Send -> libipm.so -> PMPI_Send -> libmpi.so).

I agree we have to support IPM without the double accounting, but I also think it's more user friendly to allow the automagic wrapping of Fortran MPI entrypoints.

There are several ways to support both situations, but I think the least intrusive would be at run-time, with either an "if" statement or a big set of function pointers controlled by a single global MCA setting. The default could remain mpi_send->PMPI_Send, but the option would exist for a libwrap.so that wants automagic Fortran wrapping.
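Roughly, the "if" variant could look like this (a sketch only; ompi_fortran_calls_mpi and the helper function are invented names, not actual Open MPI code):

/* Hypothetical sketch of the "if" variant.  A single global, set once from
   the proposed MCA parameter, picks which C symbol the Fortran mpi_send
   wrapper ends up calling. */
#include <mpi.h>

extern int ompi_fortran_calls_mpi;   /* 0 = today's default behavior */

static int fortran_send_body(const void *buf, int count, MPI_Datatype type,
                             int dest, int tag, MPI_Comm comm)
{
    if (ompi_fortran_calls_mpi) {
        /* interceptable: App -> mpi_send -> MPI_Send -> libwrap.so -> PMPI_Send */
        return MPI_Send(buf, count, type, dest, tag, comm);
    }
    /* current behavior: straight to PMPI_Send */
    return PMPI_Send(buf, count, type, dest, tag, comm);
}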

Before I make a PR to do this, I wanted to open an issue for discussion. Does my run-time selection sound like an acceptable solution?

ggouaillardet commented 7 years ago

Quoting @jsquyres:

the only correct way to write a tool that intercepts Fortran MPI API calls is to write those interceptions in Fortran. [...] everyone always gets this point wrong, so I feel the need to keep pointing this out.

As far as I am concerned, I fully agree with Jeff. So I'd rather keep Open MPI the way it is, and update IPM so it correctly handles MPI implementations that wrap calls once or twice (this could be a configure option or be auto-detected).

markalle commented 7 years ago

From a purely MPI-standard compliance standpoint the current code is correct, but the previous code was correct too. The change was made to be nice to IPM, even though technically you could have just declared that IPM was wrong and that they have to support both situations with regard to language-wrapping interception.

The possibility of automatic language wrapping is part of the standard. Since it's optional I agree that profile writers have to support both situations. But there are a lot of unmaintained profile libraries floating around that users download and try to use.

I've seen more than a few profile libraries in each camp. As a convenience feature for users I'd like to support both styles.

jsquyres commented 7 years ago

@ggouaillardet is quoting me correctly: it is only possible to wrap the Fortran API calls in Fortran, for at least the following reasons:

  1. MPI_SIZEOF only exists in Fortran, and by definition, it must use multiple Fortran interfaces (i.e., "overloading", in C++ parlance).
  2. There are some Fortran calls where the MPI implementation must know that the call originated from Fortran (e.g., anything with a function pointer, such as MPI_COMM_CREATE_ERRHANDLER).

markalle commented 7 years ago

MPI_SIZEOF shouldn't matter. A profile writer doesn't have to wrap every MPI call, and the topic is just about whether wrapping a C call gives the corresponding Fortran call for free.

I agree the current MPI_Comm_create_errhandler() code wouldn't work though. The current code would create the awkward situation of saying "most" C calls wrap their corresponding Fortran call for free, except for a small subset that doesn't. I just looked up Platform's old code, and ours did have mpi_comm_create_errhandler() calling MPI_Comm_create_errhandler() and then setting a flag to indicate it's from Fortran. So it can be done, but I agree that example shows the current code isn't language-wrap friendly for 100% of the C calls.
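From memory, the shape of that trick was roughly the following (a sketch only; the flag and the exact signatures are invented, not the actual Platform MPI source):

/* Sketch of the "set a flag" trick described above.  The Fortran entry point
   goes through the profiled C symbol, and a side-channel flag tells the
   implementation that the supplied callback is a Fortran procedure, so it
   must be invoked with Fortran calling conventions. */
#include <mpi.h>

extern int impl_errhandler_is_fortran;   /* hypothetical flag, read inside the library */

void mpi_comm_create_errhandler_(MPI_Comm_errhandler_function *fn,
                                 MPI_Fint *errhandler, MPI_Fint *ierr)
{
    MPI_Errhandler c_handler;

    impl_errhandler_is_fortran = 1;   /* consumed by the C call below */
    *ierr = MPI_Comm_create_errhandler(fn, &c_handler);
    impl_errhandler_is_fortran = 0;

    *errhandler = MPI_Errhandler_c2f(c_handler);
}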

Still, it's not that uncommon for a profile writer to only wrap C calls and hope to get Fortran for free, and it's not a crazy thing to support. You guys are acting like it's just a hack when profile writers do this, as if they were relying on an unintended and undocumented side effect of how the MPI implementation happened to be written. But it's a documented part of the MPI standard.

Again, admittedly it's an optional part of the standard. But it's not uncommon for profile writers to want it.

jsquyres commented 7 years ago

Sure, I like free things, too. But it doesn't mean that they're always realistic. 😄

Let's not forget about the mpi and mpi_f08 Fortran modules, too. Are you proposing wrapping those in C, too? Wrapping typed choice buffers in C would be... difficult, especially when array slices and other exotic things are supported.

Attribute functions also need special treatment.

...the list continues.

Pretty soon you end up with a bunch of different groups of different kinds of wrappings, and you end up with a giant mess. Why not wrap a single way? That makes the long-term maintenance considerably simpler (yes, I admit that this is subjective).

Just because the standard says something is possible does not mean that a tool gets to rely on that behavior.

(and no, you're not allowed to say "this only matters for mpif.h" 😄 )

markalle commented 7 years ago

Wrapping Fortran 77 is realistic though, relatively easy, and 98% done already. I'm not sure about f08 etc.; offhand I'd take the easy route and say that's a separate language binding, and thus it would be in that same optional category (and yeah, I'd opt out of that one solely because I'm not as familiar with it).

I haven't tried to parse out what the new Fortran features look like in C. In general that's always been my preference: to use explicit control from C over what entry points exist and how they interpret all their arguments, rather than relying on a particular Fortran compiler and whatever it happens to do. This, for example, let us support all combinations of autodouble options at run-time by just deciding how to typecast the various incoming (void*) arguments in C.

In Fortran 77 things were simple but sometimes not completely standardized (for example, if a call has two strings, do the extra integer args carrying their lengths go right after the strings, or at the end of all the args?). On the one hand f08 is making Fortran way more complex, but I was hoping it's more standardized too, and thus technically even more handleable from C than before.

So hypothetically, if wrapping C MPI_Send() were able to cause automagic wrapping of not only the simple F77 mpi_send() but also whatever new-Fortran f08 variants (MPI_Send_f08, etc.) exist, would that feature be worthwhile?

carns commented 7 years ago

I can add commentary from another tool developer's perspective. Darshan (http://www.mcs.anl.gov/research/projects/darshan/) has always just wrapped the C API. That's been a boon for us in the past, because (with rare hiccups) it has allowed us to instrument I/O across platforms, MPI implementations, compilers, languages, and vendors with a single, heavily tested set of wrapper functions. We do not exhaustively wrap every MPI function, just enough to monitor I/O activity.

For my 2 cents, I'm not looking forward to duplicating all of the wrappers or trying to detect which method to use on different platforms, much less keeping them all tested and maintained.

Is there an example in the code base of implementing wrapper functions for C, C++, and the various Fortran variants? No one on my team is a Fortran developer so we have a learning curve if we want to consider adding robust Fortran wrappers.

Also (maybe this should be a separate ticket): pending the outcome of this ticket, the language in the online FAQ is out of sync with the current Open MPI design:

https://www.open-mpi.org/faq/?category=perftools#PMPI

jsquyres commented 7 years ago

@markalle and I finally got on a phone call to discuss this idea. Here's what we concluded (please correct anything I get wrong here):

  1. The general idea is to have an MCA parameter that allows flipping between having the mpif.h MPI interface functions call their underlying C counterparts with an MPI_ prefix or a PMPI_ prefix.
    1. We didn't talk about this, but I'd suggest leaving the default behavior the same as it is today: Fortran functions call underlying PMPI_ functions.
    2. Extra bonus points for having a configure CLI option to change the default value of the MCA parameter.
  2. By definition, there are a small number of Fortran functions where this scheme will not apply:
    1. Some mpif.h subroutines that do not call their underlying C MPI_ (or PMPI_) functions (e.g., the attribute functions).
    2. We need to audit the mpi-f08, mpi-ignore-tkr, and mpi-tkr module implementations and see if there are additional subroutines in these Fortran interfaces that do not call their corresponding mpif.h subroutines (meaning: I do not remember offhand if there are any or not -- some of the function-pointer-passing functions might be weird...?).
    3. Some Fortran MPI functions do not have C counterparts (e.g., MPI_SIZEOF, some of the MPI_*_MEM functions immediately jump to mind).
  3. This scheme will likely be implemented thusly:
    1. All the ompi/mpi/fortran/mpif-h/*_f.c wrappers will use a function pointer to call the underlying MPI_/PMPI_ function (a rough sketch follows after this list).
    2. The values of these function pointers will be set (according to the value of the MCA param) in two general places:
      1. opal_init(). This will cover the cases where a C application calls MPI_INIT[_THREAD].
      2. Some checker that will be put at the top of mpif-h implementations of MPI_INIT[_THREAD] and the small number of subroutines that can be invoked before INIT (e.g., MPI_INITIALIZED and friends).
      3. NOTE: The thought occurs to me that it will be tricky to get the value of an MCA param at these (pre- and very-early-INIT-before-MCA-params-are-setup) times. TBD.
  4. We need to clearly document which Fortran interfaces are not covered by this scheme (i.e., won't be intercepted if a tool writer solely intercepts MPI C APIs and the MCA param has Fortran OMPI calls invoking their underlying C MPI_ counterparts).
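A rough sketch of what item 3 could look like in a single mpif-h wrapper (hypothetical names throughout; this is not the actual OMPI source):

/* Hypothetical sketch of item 3 above.  Each mpif-h wrapper calls through a
   pointer whose value is set (in opal_init(), or early in MPI_INIT) from the
   MCA parameter: it points at either MPI_Send or PMPI_Send. */
#include <mpi.h>

extern int (*ompi_f_send_fn)(const void *, int, MPI_Datatype, int, int, MPI_Comm);

void ompi_send_f(char *buf, MPI_Fint *count, MPI_Fint *datatype,
                 MPI_Fint *dest, MPI_Fint *tag, MPI_Fint *comm, MPI_Fint *ierr)
{
    MPI_Datatype c_type = MPI_Type_f2c(*datatype);
    MPI_Comm c_comm = MPI_Comm_f2c(*comm);

    *ierr = (MPI_Fint) ompi_f_send_fn(buf, (int) *count, c_type,
                                      (int) *dest, (int) *tag, c_comm);
}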

FWIW: @carns, I stand by my original statement (from a few years ago) that to absolutely correctly intercept all MPI Fortran calls, [at least some of] your wrappers must be written in Fortran. That being said, the wrappers that you must write in Fortran may well be for MPI functionality that you don't care about wrapping (e.g., MPI_SIZEOF, etc.).

ggouaillardet commented 7 years ago

I'd like to suggest we use an MPI_C_CALL() macro a la MCA_PML_CALL(), so we can do configure --with-fortran-call={mpi,pmpi,runtime}.
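Roughly like this (a hypothetical sketch; the OMPI_FORTRAN_CALL_* guards and the dispatch table are invented names):

/* Hypothetical sketch of MPI_C_CALL(), modeled on the MCA_PML_CALL() style.
   configure --with-fortran-call={mpi,pmpi,runtime} would pick the definition. */
#if defined(OMPI_FORTRAN_CALL_MPI)
#define MPI_C_CALL(fn, args) MPI_##fn args
#elif defined(OMPI_FORTRAN_CALL_PMPI)
#define MPI_C_CALL(fn, args) PMPI_##fn args
#else   /* runtime: indirect through a table filled in from the MCA parameter */
#define MPI_C_CALL(fn, args) (ompi_fortran_call_table.fn args)
#endif

/* usage inside a mpif-h wrapper body:
 *     *ierr = MPI_C_CALL(Send, (buf, c_count, c_type, c_dest, c_tag, c_comm));
 */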

bosilca commented 7 years ago

👍 on @ggouaillardet's suggestion. This seems like a major and hardly necessary change, one where most experts agree it is not the right solution. If someone has time to waste on this, so be it, but in addition to a clean solution that does not affect the default build, I would like to see a long-term maintenance plan as well.

carns commented 7 years ago

The MCA parameter (or hardwired calls to MPI_ functions) would be straightforward for Darshan, since we are primarily motivated by wrapper portability for a relatively small subset of functions. I understand there are other considerations beyond what Darshan needs, though :-)

I'll add a couple of comments based on the above discussion:

1) If we end up with a range of possibilities depending on how Open MPI itself was configured, then it would be nice for external profiling tools to have a way to detect, when they are built later, how Open MPI was configured in this regard. Vendors might toggle options like this for their own reasons that we can't directly control.

2) If there are OpenMPI builds that do not allow C API wrappers on the Fortran bindings, then it would be helpful to have documentation (a small example would be fine) for how to write equivalent wrappers for each language binding. Even better (I know I'm asking for free ponies now, sorry) if those wrapper examples were part of the test suite so that there were verified profiling methods for each supported binding that we could check for changes over time. I'm not personally interested in exhaustive demonstration of profiling for every function or anything like that, just pragmatic examples (e.g., how to wrap MPI_Init()) for the supported bindings.

carns commented 7 years ago

One more comment here for people who might read this issue looking for ideas. Even in a stock Open MPI 2.1.1 build, you can still intercept C functions (underneath the Fortran bindings) using LD_PRELOAD. In the Fortran case the bindings call PMPI_ functions that can be intercepted; in the C case you can catch the MPI_ symbols.

This has the same restrictions as pointed out above (not all Fortran API routines can be safely intercepted this way) but it is an option in cases where you just want limited profiling without writing actual Fortran wrappers. In Darshan we already intercept non-MPI APIs using LD_PRELOAD or --wrap anyway, so we have existing infrastructure in place to make this work. We are going to look into this approach.
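For anyone curious, the shape we have in mind is something like this (untested sketch; the library name and the "record" hook are invented):

/* Intercept the PMPI_ symbol that Open MPI's Fortran bindings call into.
   Build: gcc -shared -fPIC -o libwrap_pmpi.so pmpi_preload.c -ldl
   Run:   LD_PRELOAD=./libwrap_pmpi.so mpirun ... */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <mpi.h>

int PMPI_Send(const void *buf, int count, MPI_Datatype datatype,
              int dest, int tag, MPI_Comm comm)
{
    static int (*real_send)(const void *, int, MPI_Datatype, int, int, MPI_Comm);
    if (real_send == NULL) {
        /* look up the real PMPI_Send in the next object on the link chain */
        *(void **) &real_send = dlsym(RTLD_NEXT, "PMPI_Send");
    }
    /* ... record whatever the tool wants to record ... */
    return real_send(buf, count, datatype, dest, tag, comm);
}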

jsquyres commented 7 years ago

@carns Intercepting PMPI_ functions is absolutely not recommended; there be dragons there (e.g., you're going even further outside of the standard).

IMHO: Wearing my "let's adhere to the standard!" hat (vs. my "vendors will do what customers want" hat), I should note that instead of spending time figuring out creative non-standard ways to avoid writing Fortran, it would be better to spend that time writing a few Fortran examples of how to do it properly and portably (which is arguably far less complicated than the mechanism proposed here for Open MPI to accommodate tools that do not adhere to the standard). I'm sorry that the MPI Fortran interception mechanism is a little complicated, but that's unfortunately the nature of the beast.

carns commented 7 years ago

@jsquyres are there some simple examples of standards-adherent MPI Fortran interception to use as a reference? Regardless of what approach Darshan pursues, I think this would be helpful to tool maintainers.

jsquyres commented 7 years ago

@carns Sort of. MPI-3.1 section 17.1.5 talks about the Fortran profiling interface (as a C programmer, think of the use mpi and use mpi_f08 Fortran modules as basically precompiled headers -- and just like in C, the implementation is [usually] elsewhere, such as in a .o file that gets slurped into a library). One thing that is nice about modern Fortran (!) is that it has made great strides in defining interoperability with C. One of the things Fortran allows is specifying what the back-end symbol will be (vs. compiler-specific symbol munging, as is common with C++).

I'm telling you this because MPI therefore defined what the back-end Fortran symbols need to be named in an MPI implementation, explicitly so that they can be overridden by interposing libraries like Darshan. That's what the first part of MPI-3.1 17.1.5 is about.

At the end of 17.1.5 are some examples of how implementations can implement the mpi and mpi_f08 modules. And at the very end is one piddly little example of how you can write a Fortran routine to intercept (in the Advice to Users on page 616), that looks like this (let's see if Github syntax highlighting understands Fortran!):

! Show intercepting a subroutine from the mpi_f08 module
SUBROUTINE MPI_Isend_f08ts(buf,count,datatype,dest,tag,comm,request,ierror)
  USE :: mpi_f08, my_noname => MPI_Isend_f08ts
  TYPE(*), DIMENSION(..), ASYNCHRONOUS :: buf
  INTEGER, INTENT(IN) :: count, dest, tag
  TYPE(MPI_Datatype), INTENT(IN) :: datatype
  TYPE(MPI_Comm), INTENT(IN) :: comm
  TYPE(MPI_Request),  INTENT(OUT) :: request
  INTEGER, OPTIONAL,  INTENT(OUT) :: ierror
  ! ... some code for the begin of profiling
  call PMPI_Isend (buf, count, datatype, dest, tag, comm, request, ierror)
  ! ... some code for the end of profiling
END SUBROUTINE MPI_Isend_f08ts

You can probably guess what most of that means (e.g., OPTIONAL is like a default parameter in C++, etc.). For a C programmer, probably the three most non-obvious things in there are:

  1. TYPE(*), DIMENSION(..): this is basically the Fortran equivalent of (void*) (Fortran purists will yell at me for saying that, because it's actually quite a bit more than (void*), but for the purposes of this conversation, that's good enough).
  2. ASYNCHRONOUS: means that this value may be changed even after the subroutine returns. Think of this in terms of non-blocking receives, and it makes a bit more sense.
  3. my_noname => MPI_Isend_f08ts: this is a "local name to 'use' name" mapping. This means that in this subroutine, you can use my_noname instead of MPI_Isend_f08ts, in case you want to call the "real" MPI_Isend_f08ts routine (you can do so via call my_noname(...)). You probably won't need to do this, but the example shows it just to demonstrate that it can be done.

Sidenote: another salient thing you need to know: in C, you can define complicated data types in a struct. In Fortran, the same kind of thing is called a "derived type" (not to be confused with MPI derived datatypes). In C, you access a struct member via a.b. In Fortran, you access a derived type member via a%b.

What most people would do, I think, would be to have a simple Fortran wrapper subroutine for the interception point, which calls a back-end C function to do the real work of whatever it is you want to do in the interception. You probably want to then return to Fortran and let the Fortran wrapper invoke the corresponding PMPI_ function, if possible; that may well save you some complications. You therefore may end up with a pair of back-end C functions (e.g., a "pre" hook and a "post" hook).

Your code may end up looking something like this (this is uncompiled/untested -- YMMV):

/* Deliberately not including the user's buffer here -- it's a little more complicated if
   you want to get the user's buffer in C.  So for this example, I'm leaving it out.  Note, too,
   that I'm using an MPI_Fint for all the MPI handles. */
#include <mpi.h>

void darshan_mpi_isend_intercept_pre(MPI_Fint count, MPI_Fint datatype, MPI_Fint dest, MPI_Fint tag, MPI_Fint comm, MPI_Fint request)
{
    MPI_Datatype datatype_c = MPI_Type_f2c(datatype);
    MPI_Comm comm_c = MPI_Comm_f2c(comm);
    MPI_Request request_c = MPI_Request_f2c(request);
    int count_c = (int) count;
    int dest_c = (int) dest;
    int tag_c = (int) tag;

    // ...whatever it is that your "pre" intercept function does ...
}

void darshan_mpi_isend_intercept_post(MPI_Fint count, MPI_Fint datatype, MPI_Fint dest, MPI_Fint tag, MPI_Fint comm, MPI_Fint request)
{
    // ...similar to above...
}

And in Fortran declarations (e.g., in a header file or module somewhere):

! Note that comments start with !
! And & at the end of a line is a continuation character

module darshan_intercepts
  implicit none

  interface

    ! This declaration refers to your C "pre" function.  VALUE makes the
    ! arguments match the C prototype's pass-by-value parameters (this
    ! assumes MPI_Fint is a plain C int, which is the common case).
    subroutine darshan_mpi_isend_intercept_pre(count, datatype, dest, tag, comm, request) &
           bind(c, name="darshan_mpi_isend_intercept_pre")
        INTEGER, VALUE, INTENT(IN) :: count, datatype, dest, tag, comm, request
    end subroutine darshan_mpi_isend_intercept_pre

    ! This declaration refers to your C "post" function
    subroutine darshan_mpi_isend_intercept_post(count, datatype, dest, tag, comm, request) &
           bind(c, name="darshan_mpi_isend_intercept_post")
        INTEGER, VALUE, INTENT(IN) :: count, datatype, dest, tag, comm, request
    end subroutine darshan_mpi_isend_intercept_post

  end interface

end module darshan_intercepts

! ------------------------------------------------------------

! This declaration refers to your Fortran wrapper routine (i.e., it must be implemented
! in Fortran)
interface
  SUBROUTINE MPI_Isend_f08ts(buf,count,datatype,dest,tag,comm,request,ierror)
    USE :: mpi_f08
    TYPE(*), DIMENSION(..), ASYNCHRONOUS :: buf
    INTEGER, INTENT(IN) :: count, dest, tag
    TYPE(MPI_Datatype), INTENT(IN) :: datatype
    TYPE(MPI_Comm), INTENT(IN) :: comm
    TYPE(MPI_Request),  INTENT(OUT) :: request
    INTEGER, OPTIONAL,  INTENT(OUT) :: ierror
  END SUBROUTINE MPI_Isend_f08ts
end interface

And then have a .f90 file somewhere that implements MPI_Isend_f08ts, perhaps something like this:

SUBROUTINE MPI_Isend_f08ts(buf,count,datatype,dest,tag,comm,request,ierror)
  USE :: darshan_intercepts
  USE :: mpi_f08, my_noname => MPI_Isend_f08ts
  TYPE(*), DIMENSION(..), ASYNCHRONOUS :: buf
  INTEGER, INTENT(IN) :: count, dest, tag
  TYPE(MPI_Datatype), INTENT(IN) :: datatype
  TYPE(MPI_Comm), INTENT(IN) :: comm
  TYPE(MPI_Request),  INTENT(OUT) :: request
  INTEGER, OPTIONAL,  INTENT(OUT) :: ierror
  INTEGER :: local_ierr

  ! Call the back-end C "pre" hook
  ! Use "%MPI_VAL" to get to the MPI_VAL member of the Fortran derived datatypes
  ! for MPI handles to get to the mpif.h-style integer handle for Fortran MPI handles.
  ! For intercepting mpif.h and mpi module subroutines, this isn't necessary because
  ! the handles are already integers (only the mpi_f08 module has derived datatypes
  ! for MPI handles -- mpif.h and the mpi module use integers).
  call darshan_mpi_isend_intercept_pre(count, datatype%MPI_VAL, dest, tag, comm%MPI_VAL, request%MPI_VAL)

  ! Call the Fortran PMPI function
  call PMPI_Isend (buf, count, datatype, dest, tag, comm, request, local_ierr)

  ! If the PMPI function succeeded, call the back-end C "post" hook
  if (local_ierr .eq. MPI_SUCCESS) then
      call darshan_mpi_isend_intercept_post(count, datatype%MPI_VAL, dest, tag, comm%MPI_VAL, request%MPI_VAL)
  endif

  ! If ierror was passed, fill it in
  if (present(ierror)) ierror = local_ierr
END SUBROUTINE MPI_Isend_f08ts

Most Fortran routines can be intercepted like this. However, Fortran unfortunately doesn't define how LOGICAL variables (i.e., booleans) are passed between Fortran and C for... a variety of reasons that are not interesting in this conversation.

See this lengthy comment in the Open MPI source code talking about exactly this case: https://github.com/open-mpi/ompi/blob/master/ompi/mpi/fortran/use-mpi-f08/mpi-f-interfaces-bind.h#L23-L162. If you don't care about intercepting any MPI functions with boolean parameters, this won't matter to you.