ornladios / ADIOS2

Next generation of ADIOS developed in the Exascale Computing Program
https://adios2.readthedocs.io/en/latest/index.html
Apache License 2.0

DataMan transport identification by hostname #804

Closed stevenwalton closed 5 years ago

stevenwalton commented 6 years ago

On the supercomputers we work on, we don't have a static IP address that we can use to receive data with DataMan. For example, if I log into Rhea I am placed on a node with address x.x.x.x. As I run the simulation my address changes to y.y.y.y and later to z.z.z.z, because nodes are load-balanced by round-robin.

I think in many cases it would be much easier to provide a hostname to DataMan and have the local DNS resolve the address.

@JasonRuonanWang you can verify that hostnames do not work by passing the output of $ hostname in place of the IP address, or by trying localhost. I am guessing this is a feature of ZeroMQ.
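One possible application-side workaround for the request above is to resolve the hostname yourself and hand DataMan the numeric address. The sketch below uses only the Python standard library. The helper names `resolve_ipv4` and `dataman_params` are hypothetical, and the parameter keys `IPAddress` and `Port` are assumptions for illustration, not confirmed against the DataMan version discussed in this issue.

```python
import socket


def resolve_ipv4(host: str) -> str:
    """Return the first IPv4 address the system resolver reports for `host`."""
    infos = socket.getaddrinfo(
        host, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for AF_INET, sockaddr is an (ip, port) tuple.
    return infos[0][4][0]


def dataman_params(host: str, port: int) -> dict:
    """Build a parameter dictionary with the hostname already resolved.

    The key names are assumptions for illustration; check them against
    the DataMan documentation for the ADIOS2 version in use.
    """
    return {"IPAddress": resolve_ipv4(host), "Port": str(port)}
```

Calling `dataman_params(socket.gethostname(), 12306)` just before opening the engine would pick up whatever address the node currently has (the port number here is arbitrary). The address is resolved only once, so it would need to be resolved again if the node's address changes during the run.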

JasonRuonanWang commented 6 years ago

I guess ZeroMQ only accepts host names that are in DNS, not ones in local hosts files. To adapt to this kind of dynamic IP address allocation I would need to change a lot of things and do a lot of testing, which I don't think I can finish in a few days. Ideally, in this kind of environment you would use RDMA-based transports instead of TCP-based ones; this is also why we have various transports. However, RDMA-based transports are currently only available in SST, and adding them to DataMan is not trivial either. I can keep this issue open to remind myself to add RDMA transports to DataMan.

JasonRuonanWang commented 5 years ago

This is also solved in the WDM engine. DataMan is no longer supposed to be run on a supercomputer. It has become a pure wide area network engine and should always be run on a DTN or somewhere else that has a static IP address.