ofiwg / libfabric

Open Fabric Interfaces
http://libfabric.org/

How to stringify -> unstringify an endpoint address? #2485

Closed jsquyres closed 7 years ago

jsquyres commented 7 years ago

Over on the Open MPI side of the world, @anandhis and @rhc54 have approached me with a use case that we can't seem to handle in libfabric yet. Namely:

  1. Open an endpoint in a "server" process
  2. Call fi_getname() on that endpoint
  3. String-ify the name obtained from fi_getname() and pass it as a CLI option when starting a remote client (e.g., if we're starting that remote client via ssh)
  4. Client obtains the remote name string via argv[n]
  5. Client converts the name string to either a (node,service) tuple or a binary address, and uses it as an input parameter to fi_getinfo() to get an appropriate provider to communicate with the server

On the server side, it's a little clunky, but we can pass the output of fi_getname() to fi_av_straddr() (we might need to make a dummy AV with the appropriate address type, but it's doable). This gives us a string representation of the address.

...but how do we un-stringify the address on the client side? The goal is to be able to pass it to fi_getinfo() to select an appropriate provider with which to communicate with the server.

@bturrubiates and I discussed one unattractive method: the server could pass the address type to the client (it's just an int -- easy enough to pass via CLI). The client can then switch/case on that address type and do an appropriate string-to-binary conversion. But that's kinda icky, for at least two reasons:

  1. It's not immediately obvious how to un-stringify a non-SOCKADDR address.
  2. Open MPI would have to know / be able to react to all the FI_* address types. If Libfabric grows more address types over time, Open MPI will be in an awkward situation -- i.e., a new version of Libfabric may return an address type that an older version of Open MPI won't understand.

Any suggestions on how to handle this use case? Do we need a new API / flag / ... in libfabric to handle this use case?

j-xiong commented 7 years ago

The simplest way I can think of is a function that does the inverse of fi_av_straddr(): converting the string back to the address recognizable by the provider. Something like this:

int fi_av_addr_from_str(char *string, void *addr, int *addrlen)

jsquyres commented 7 years ago

This would be a new downcall into libfabric, and would likely need to rely on a provider to handle it (e.g., I would be against putting a GNI or PSMx convertor out in the core). That would imply that a provider name should be included in the string -- perhaps a URI of the form PROVIDER://ADDRESS.

shefty commented 7 years ago

fi_av_straddr requires an AV because the conversion is a provider specific function, so any reverse function would need that as well. I don't know that the address returned from fi_getname can necessarily be converted back into a simple <node, service> string pair.

Jianxin added a decent string format for the psm providers. Such a format could be adopted by other providers. We would need to agree on that format.

j-xiong commented 7 years ago

Having the provider name / address type name in the address string indeed sounds like a good idea. We could have addresses like "sock://<node>:<port>", "psm://<epid>", "psm2://<epid>:<vl>", "iface://<type>:<unit>:<port>".

jsquyres commented 7 years ago

Are there apps out there today that rely on the specific string output format of fi_av_straddr()? I.e., are we affecting backwards compatibility by changing the output format of those strings?

j-xiong commented 7 years ago

I am not aware of any. If such an application exists today, it must have assumed a socket address. For the sake of backward compatibility (if that is an issue), we could assume a socket address when the address type / provider name is missing.

shefty commented 7 years ago

I'm not aware of any apps that rely on the output from fi_av_straddr.

In any case, I was considering adding a new call

fi_straddr(struct fid_fabric *fabric, blah blah)

in order to avoid needing to create an AV. (I can't think of a reasonable way to make this work with fi_tostr).

rhc54 commented 7 years ago

This sounds good to me. Just to further clarify the motivation: we are trying to make a significant reduction in the startup time for mpirun. When mpirun launches a daemon on a non-PMIx enabled system, the daemon has to "phone home" as that is the only contact point it can know, so mpirun passes that initial contact info on the cmd line.

Once the initial contact is established, we can do the usual "modex" data exchange for all the other connection info. Right now, we are limited to doing the phone home over TCP, which creates a bottleneck as the node where mpirun is executing must absorb all those socket connections. As a result, we have taken to launching daemons in "waves" that connect back to each other in a tree-like arrangement - but that also takes time.

If a fabric is available that can provide us with a reasonable string to put on the cmd line, then we could phone home over that fabric to make the initial connection. This should scale better. @anandhis has written the necessary integration to do this, but we need the string we can pass on the cmd line to the daemons.

We don't care what the string is, so long as it is short enough to fit on a typical cmd line, and we don't need one from every provider - if we are on a multi-provider system, we'll just select the "best" one to use for this purpose so the cmd line doesn't get overloaded. If we cannot get one at all, then we'll fallback to TCP and that system will take the launch performance "hit".

HTH explain things

shefty commented 7 years ago

Here are some options. All involve defining a common address string format:

"prov://field1:field2:field3..."

Option 1: fi_getnamestr(ep, char *buf, size_t *len);

The app retrieves the address of the EP directly as a string. The fi_getinfo() call remains unchanged and simply checks for the new address string format.

Option 2: fi_getname2(ep, void *addr, size_t *addrlen, uint64_t flags);

More generic version of fi_getname, with flags used to output either a string or binary address.

Option 3: Add FI_ADDR_STR, a new address format (added to the enum)

Use existing fi_getname to retrieve address in string form. Other calls also operate on string format: fi_setname, fi_getpeer, fi_av_insert, etc.

Option 4: fi_straddr(fabric, void *addr, size_t addrlen, char *buf, size_t *len);

Adds a more generic conversion function.

Of these options (plus others mentioned in thread), I'm leaning to option 3. It seems the simplest and most robust.

j-xiong commented 7 years ago

For option 3, do we have a way to let the provider use more than one address format, i.e. let the string address be a supplementary format instead of the only format?

shefty commented 7 years ago

I'm not sure I understand the question. This is what I currently have as a patch description:

Define FI_ADDR_STR.  This indicates that the application will
use a string format for all addresses.  The string format is
provider specific, but follows this format:

format://field1[:field2[:field3...]]

Format may be a well known format, such as "sock", or a provider
name, such as "psm".

The string format is always usable on input to fi_getinfo.  It may
also be usable in other API calls (fi_getname, fi_setname, etc.) if
the fi_info addr_format field is set to FI_ADDR_STR.

So a provider can use different string formats, but all formats are strings. There's no mixing of formats. This aligns with the current API.

Note that there's still this issue #2475 open that needs to be addressed, but I think it's a separate problem to resolve.

j-xiong commented 7 years ago

That actually answered my question. I was not sure if a provider can use a different format most of the time but use string format when needed. It is clear to me now.

rhc54 commented 7 years ago

Is there a concern here that this might become too restrictive? My original question wasn't intended to impose a sweeping mandate that all providers must have string addresses. I was only asking for a way to obtain a string-like address from those providers that can provide one. All other providers can still exchange binary address info during normal startup procedures.

Personally, I'd prefer a solution along those lines. Even if the current providers can all provide string addresses, I wouldn't want to claim that all future providers necessarily conform to that format.

shefty commented 7 years ago

We already have a call that requires all providers to convert their native address into some sort of string. It's just associated with an AV, which isn't friendly for some apps to use. (This is the original problem called out by Jeff.) Combining this requirement with the requirement that all providers need to parse the <node, service> parameters from fi_getinfo, and we're almost at option 3.

Where option 3 suffers is that an app cannot get both the low-level binary address and the string format for it without still going through the AV string call like we have today.

anandhis commented 7 years ago

I think the original request was for a way to convert the string back to a binary address (by the provider), for example by adding something like fi_strtoep(char *buf, size_t len, ep) or fi_strtoav(char *buf, size_t len, av).

The string format you mentioned in option 3 is an extensive solution.

Thanks, Anandhi

(Sent by email in reply to @shefty's comment of Thursday, November 10, 2016.)


shefty commented 7 years ago

> Think the original request was to provide a way to convert the string back to a binary address (by the provider). Something like below added fi_strtoep(char *buf,size_t len, ep) (or) fi_strtoav(char *buf, size_t len, av)

This is a proposed solution, not the actual problem/requirement as I understand it.

There are a couple of issues with the existing API. First, a binary address cannot generically be converted into a string without using an AV. This is mostly a nuisance to apps. Second, there is a disconnect between the address string and having the ability to use it with the fi_getinfo call. This is more challenging to fix. I believe that these are the core problems being exposed in the existing API.

rhc54 commented 7 years ago

Hi folks - I see @shefty referenced this a few days ago. Any prognosis and/or ETA?

shefty commented 7 years ago

The intent is to have a solution by the next release. We'll eventually discuss my proposal in an ofiwg call, possibly as soon as tomorrow (12/6), but it could come 1-2 weeks after as we work through all staged changes.

anandhis commented 7 years ago

I tried the fi_getinfo() call with the fi_info->addr_format field set to FI_ADDR_STR, and it fails with the error "No data available". It looks like the providers are not implementing this yet? The debug dump from fi_getinfo() is below. Can you please confirm this is the case and provide an ETA for the sockets and psm2 providers to support this address format? Thanks, Anandhi

libfabric:psm2:core:psmx2_getinfo():280 hints->addr_format=9, supported=0,10.
libfabric:core:core:figetinfo():644 fi_getinfo: provider psm2 returned -61 (No data available)
libfabric:UDP:core:util_getinfo():164 checking info
libfabric:UDP:core:ofi_check_info():751 address format not supported
libfabric:core:core:figetinfo():644 fi_getinfo: provider UDP returned -61 (No data available)
libfabric:sockets:fabric:sock_verify_info():268 Unsupported address format
libfabric:core:core:figetinfo():644 fi_getinfo: provider sockets returned -61 (No data available)
libfabric:ofi-rxm:core:ofi_check_info():751 address format not supported
libfabric:core:core:figetinfo():644 fi_getinfo: provider ofi-rxm returned -61 (No data available)
[knl-60:04644] ../../../../../orte/mca/rml/ofi/rml_ofi_component.c:559: fi_getinfo failed: No data available
[knl-60:04644] ../../../../../orte/mca/rml/ofi/rml_ofi_component.c:884 Failed to open any OFI Providers
[knl-60:04644] [[47025,0],0] - Init did not open any Ofi endpoints, returning NULL

shefty commented 7 years ago

Correct, the providers are not yet supporting this. There are some core helper functions for this, but the providers are not yet using it.