perfsonar / project

The perfSONAR project's primary wiki and issue tracker.
Apache License 2.0

Issue with multiple NICs involving MCA/toolkit #1185

Open apertome opened 7 years ago

apertome commented 7 years ago

From Shawn:

Hi Andy, Michael,

We have a possible issue with the metadata exposed by the toolkit. Rolf Seuster (UVic) has set up a single server with two NICs to run both Latency and Bandwidth tests. He has assigned two different DNS names, one per NIC:

lcg-lat.sfu.computecanada.ca AND lcg-bw.sfu.computecanada.ca

However, the meshconfig at meshconfig.grid.iu.edu is only discovering one of them. We had to add lcg-lat.sfu.computecanada.ca as an "Adhoc" host: https://meshconfig.grid.iu.edu/#!/hosts/59c5446cab93a70020968d96

The other NIC is discovered correctly as a host in meshconfig: https://meshconfig.grid.iu.edu/#!/hosts/59c038d7c46fad23b11a96d3

Normally an adhoc host is overwritten once a registered host is found in the lookup service (in this case the GOCDB sLS service).

Marian noted (see thread below) that the toolkit metadata shows only one external address, and this may be confusing the meshconfig and preventing it from finding the other host?

If you look at the JSON info: http://lcg-lat.sfu.computecanada.ca/toolkit/services/host.cgi?method=get_summary The external_address block shows only one interface.

apertome commented 7 years ago

comment from Soichi:

I believe either gocdb2sls or mccache service from MCA needs to be updated so that both DNS endpoints can be entered to MCA via the single JSON info. But I could be wrong.. it's been a while since I looked at the code.

apertome commented 7 years ago

Looking more closely, the external_address field is intentionally one interface: it's whatever the toolkit has decided is the primary interface.

The interfaces field lists all interfaces.

I haven't been able to verify yet which of those the MCA is looking at.
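To make the difference between the two fields concrete, here is a minimal Python sketch. The sample dictionary is hand-written for illustration: the `iface` and `dns_name` keys, and the choice of which name is primary, are assumptions and not the toolkit's verified get_summary schema. `names_missing_from_external` is a hypothetical helper.

```python
# Hand-written sample. The key names ("iface", "dns_name") and the choice of
# primary interface are assumptions for illustration, not the real schema.
sample_summary = {
    "external_address": {
        "iface": "eth0",
        "dns_name": "lcg-bw.sfu.computecanada.ca",
    },
    "interfaces": [
        {"iface": "eth0", "dns_name": "lcg-bw.sfu.computecanada.ca"},
        {"iface": "eth1", "dns_name": "lcg-lat.sfu.computecanada.ca"},
    ],
}


def names_missing_from_external(summary):
    """Return DNS names listed under interfaces but absent from external_address."""
    primary = summary.get("external_address", {}).get("dns_name")
    all_names = [
        entry["dns_name"]
        for entry in summary.get("interfaces", [])
        if entry.get("dns_name")
    ]
    return [name for name in all_names if name != primary]


missing = names_missing_from_external(sample_summary)
print(missing)
```

If a consumer reads only external_address, any name returned by this helper would never be seen, which would match the symptom described above.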

Putting this under Project, as I'm not sure yet if the fix will involve the Toolkit, MCA, or both.

igarny commented 6 years ago

As part of my diagnostics on a different issue, I was able to register multiple addresses for a given service. Check the "service-locator" param in http://ps-east.es.net:8090/lookup/service/8025cdb2-6e93-4743-97bf-690c30a9d3ee In fact I would like to advocate against it, since I am trying to specifically state which service operates on which interface. perfsonar/ls-registration-daemon/issues/44

ShawnMcKee commented 6 years ago

I think we should be able to specify exactly what we want and we may want to register multiple addresses for a given service. The details are below.

For example, the main configuration people have discussed is for a host with two NICs. One NIC can run latency and one NIC can run bandwidth. I guess more specifically one IP runs a latency service and a different IP runs bandwidth. To ensure the tests don't interfere we require that those two IPs are on different NICs.

What about hosts with 3 (or more) NICs? I can imagine that you may want to run a latency or bandwidth service on more than one NIC (IP). Two IPs for different latency meshes, for example (LHCONE or LHCOPN or general IP).

All these cases should be configurable and work with our configuration infrastructure (Lookup service, mesh-configuration, MCA/GUI, etc.).

To be explicit, imagine how to support the following:

A host with 4 NICs which will run both latency and bandwidth tests for both LHCONE and LHCOPN

We want to assign NIC1 and NIC2 for LHCOPN and NIC3 and NIC4 for LHCONE. The odd NICs (NIC1 and NIC3) will run latency tests, while the even NICs (NIC2 and NIC4) will run bandwidth. How do we configure the mesh to support this? How does the lookup service advertise ALL 4 interfaces?

igarny commented 6 years ago

Hi Shawn, At GEANT I was able to achieve this with pS 3.5.x through customized service registrations. pS 4.0 did not modify this behavior, but added another service (pscheduler). Example: perfsonar/ls-registration-daemon/issues/44 It is still a common mistake to treat the pscheduler registration as covering all sorts of tests. No! pScheduler is the coordination system and as such can/should operate on a separate interface. The tests are organized based on the test specification for each service/test.

At GEANT I am registering the different services to specific service-locators (interfaces), but it appears there is a lack of understanding of this in the implementation of MCA. Check the conclusion of the developer: perfsonar/meshconfig-admin/issues/34 . It appears the design was well thought out and even the documentation did consider it.

A more appealing example of service separation is:

Shawn, your example with 4 NICs calls for another feature in LS registration: perfsonar/ls-registration-daemon/issues/17 (check out the date).

Regards, Ivan