Closed: rhclopes closed this 4 weeks ago
That is so weird, but we have seen it before. Would be really nice to understand what causes it.
Just adding some notes from our debug session on this last week:
The psconfig remote add command itself is not at fault: it correctly adds the URL to the file. The problem is that, once the URL is added, the psconfig pscheduler agent does not create the tests with the aliased names.
@rhclopes do you happen to have /var/log/perfsonar/psconfig-pscheduler-agent.log from a time when the issue occurred? Specifically, I'm looking for a line like the following (except with your addresses):
2024-05-20 19:29:45 INFO pid=4153354 prog=_run_start line=104 guid=f0274777-95bf-4e68-a077-8da5cf5a8a20 pscheduler_assist_url=https://localhost/pscheduler match_addresses=["ps-dev-staging-el9-tk2.c.esnet-perfsonar.internal", "10.128.15.219", "fe80::fedc:61d7:480c:7ed4"] msg=Auto-detected match addresses
IIRC when you were screen sharing, ps-slough-lat.perf.ja.net was in the match_addresses list, but I wanted to be sure. Also, prior to modifying /etc/hosts, was there already an entry for 194.81.18.229 and/or 2001:630:3c:f803::a?
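For anyone checking the same thing on their own host, here is a minimal sketch of pulling the match_addresses list out of a log line like the one above and testing whether a given name is in it. It assumes the field is always a JSON-style list, as in the sample line; the helper name match_addresses is mine, not part of pSConfig.

```python
import json
import re


def match_addresses(log_line):
    """Return the match_addresses list from an agent log line, or None.

    Assumes the field looks like match_addresses=[...] with a JSON-style
    list, as in the sample log line quoted in this thread.
    """
    found = re.search(r"match_addresses=(\[.*?\])", log_line)
    return json.loads(found.group(1)) if found else None


# Sample line copied from the comment above.
sample = (
    "2024-05-20 19:29:45 INFO pid=4153354 prog=_run_start line=104 "
    "guid=f0274777-95bf-4e68-a077-8da5cf5a8a20 "
    "pscheduler_assist_url=https://localhost/pscheduler "
    'match_addresses=["ps-dev-staging-el9-tk2.c.esnet-perfsonar.internal", '
    '"10.128.15.219", "fe80::fedc:61d7:480c:7ed4"] '
    "msg=Auto-detected match addresses"
)

if __name__ == "__main__":
    addresses = match_addresses(sample)
    # Is the name used in the remote config among the detected addresses?
    print("ps-slough-lat.perf.ja.net" in addresses)
```

If the name used in the remote configuration is missing from that list, the agent has nothing to match the tests against, which would be consistent with the symptom described here.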
Andy,
The requested logs are attached.
I spent quite a few days looking into this problem. At some point I had the entries in /etc/hosts, removed them, and changed them to other values, and nothing was helping because there were other problems. Eventually, I closed in on the one problem left: the DNS issue. I added the entries and changed nsswitch.conf because I had seen a similar problem with perfsonar 5.0, traceroute (and my distributed storage), and I just followed the same solution.
Thanks! I think we are closer to the cause of the issue. It was detecting the name ps-slough-lat.ja.net instead of ps-slough-lat.perf.ja.net (the latter has .perf). Do you have any ideas where ps-slough-lat.ja.net might have been coming from? /etc/hosts maybe? At least currently I don't see that name in DNS.
The ps-slough-lat.ja.net is a previous name used for the interface, as was ps-slough-1g.ja.net. We were asked to move our network performance systems all under .perf.ja.net (which seemed quite reasonable) so that's the only name you should see now. I recall Raul did a complete re-install so that name shouldn't be held or cached anywhere in the system or its configuration. The problem seems to be around the aliasing in the reverse delegation?
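One rough way to see which name an address comes back as is a reverse lookup through the system resolver. This is only a sketch: the address and names are taken from this thread, their pairing is assumed, and the lookup argument is a hypothetical hook added so the check can be exercised without touching DNS. It says nothing about how the pSConfig agent itself detects addresses.

```python
import socket


def reverse_names(ip, lookup=socket.gethostbyaddr):
    """Return every name (primary plus aliases) that ip reverse-resolves to.

    lookup defaults to the system resolver; an empty list means the
    reverse lookup failed.
    """
    try:
        primary, aliases, _addresses = lookup(ip)
    except OSError:  # socket.herror and socket.gaierror are subclasses
        return []
    return [primary, *aliases]


def reverse_matches(ip, expected, lookup=socket.gethostbyaddr):
    """True if expected is among the names ip reverse-resolves to."""
    wanted = expected.lower().rstrip(".")
    return wanted in [name.lower().rstrip(".") for name in reverse_names(ip, lookup)]


if __name__ == "__main__":
    ip = "194.81.18.229"                  # address mentioned in this thread
    expected = "ps-slough-lat.perf.ja.net"  # name used in the remote config
    print(reverse_names(ip))
    print(reverse_matches(ip, expected))
```

If the second line prints False, the reverse lookup is returning a different name (or none), which is the situation the /etc/hosts entries were working around.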
This should be corrected now. We can re-open if it surfaces again.
The command 'psconfig config remote add url' will fail to add a test if an IP address resolves to an alias. For example,
When we run
No latencybg tests are generated for the host ps-slough-lat.perf.ja.net.
The problem is fixed by adding the following lines to /etc/hosts
and prioritising hosts in /etc/nsswitch.conf.
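To confirm that /etc/hosts is actually consulted before DNS, a small check of the hosts line in /etc/nsswitch.conf can help. This is a sketch under simple assumptions: it only compares the positions of the files and dns sources and ignores any others (such as resolve or myhostname); the function names are mine.

```python
import re


def hosts_sources(nsswitch_text):
    """Return the lookup sources on the hosts: line, in order."""
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if line.startswith("hosts:"):
            # Drop action items such as [NOTFOUND=return]
            sources = re.sub(r"\[[^\]]*\]", " ", line[len("hosts:"):])
            return sources.split()
    return []


def files_before_dns(nsswitch_text):
    """True if 'files' is listed and comes before 'dns' (or dns is absent)."""
    sources = hosts_sources(nsswitch_text)
    if "files" not in sources:
        return False
    return "dns" not in sources or sources.index("files") < sources.index("dns")


if __name__ == "__main__":
    with open("/etc/nsswitch.conf") as handle:
        text = handle.read()
    print(hosts_sources(text))
    print(files_before_dns(text))
```

With a line such as "hosts: files dns", entries in /etc/hosts are tried first, so the names added there take precedence over whatever the reverse zone returns.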
BTW, I think that traceroute and tracepath will segfault in the presence of that alias, even if the alias is well defined.
Raul