Closed junaruga closed 1 year ago
I am not sure if the following comment on the failing spec is related to this issue.
Tools like dig/host/nslookup explicitly use a DNS server; the system might have other ways to set up hostnames (like /etc/hosts). Some other things you might try:
ping rubyspecdoesntexist.fallingsnow.net
perl -MSocket -MData::Dumper -E 'print Dumper(Socket::getaddrinfo("rubyspecdoesntexist.fallingsnow.net", "http"))'
Most Linux distros have a working Perl interpreter, and that second command does something similar to the Ruby spec.
Thank you for your suggestion! The results of the commands differ from those on my local machine. I can use these results to ask the admin at Equinix.
The result of ping on the arm64 server is "0% packet loss". This is interesting.
$ ping rubyspecdoesntexist.fallingsnow.net
PING rubyspecdoesntexist.fallingsnow.net.DOMAINS (70.32.1.32) 56(84) bytes of data.
64 bytes from ip-70.32.1.32.hosted.by.gigenet.com (70.32.1.32): icmp_seq=1 ttl=56 time=29.0 ms
64 bytes from ip-70.32.1.32.hosted.by.gigenet.com (70.32.1.32): icmp_seq=2 ttl=56 time=29.0 ms
64 bytes from ip-70.32.1.32.hosted.by.gigenet.com (70.32.1.32): icmp_seq=3 ttl=56 time=29.0 ms
64 bytes from ip-70.32.1.32.hosted.by.gigenet.com (70.32.1.32): icmp_seq=4 ttl=56 time=29.2 ms
64 bytes from ip-70.32.1.32.hosted.by.gigenet.com (70.32.1.32): icmp_seq=5 ttl=56 time=29.0 ms
^C
--- rubyspecdoesntexist.fallingsnow.net.DOMAINS ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4007ms
rtt min/avg/max/mdev = 28.954/29.020/29.152/0.072 ms
Here is the result of the Perl command as well.
$ perl -MSocket -MData::Dumper -E 'print Dumper(Socket::getaddrinfo("rubyspecdoesntexist.fallingsnow.net", "http"))'
$VAR1 = '';
$VAR2 = {
'canonname' => undef,
'addr' => 'PF ',
'family' => 2,
'socktype' => 1,
'protocol' => 6
};
Here are the results of the commands on my local machine.
$ ping rubyspecdoesntexist.fallingsnow.net
ping: rubyspecdoesntexist.fallingsnow.net: Name or service not known
$ perl -MSocket -MData::Dumper -E 'print Dumper(Socket::getaddrinfo("rubyspecdoesntexist.fallingsnow.net", "http"))'
$VAR1 = 'Name or service not known';
The results of the commands differ from those on my local machine. I can use these results to ask the admin at Equinix.
I am asking Equinix support now.
I find it rather fascinating that you're getting different IP addresses. The first error and reproduction return 170.178.183.18, while ping resolves the hostname to 70.32.1.32. The Perl script shows the packed IPv4 address, so that one might be mangled.
Either way, this looks like an issue with your specific host, not with Ruby or Ruby-spec.
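The addr value in the Perl output is a packed sockaddr structure, which is why it prints as binary noise. As a minimal sketch (not something posted in this thread), Ruby's Socket.unpack_sockaddr_in can decode such a value. Here the packed bytes are built locally with Socket.pack_sockaddr_in from the address that ping printed, purely for illustration:

```ruby
require "socket"

# Build a packed sockaddr_in for illustration (port 80, the address ping printed).
packed = Socket.pack_sockaddr_in(80, "70.32.1.32")

# Decode it back into a readable port and IPv4 address.
port, ip = Socket.unpack_sockaddr_in(packed)
puts "#{ip}:#{port}"
```

Decoding the value on the affected host would show which address getaddrinfo actually returned, without relying on how the terminal renders raw bytes.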
Yeah, that's an interesting result.
By the way, the domain "fallingsnow.net" was introduced by the commit https://github.com/ruby/spec/commit/cd7f3442c513cf48c82652fd18c5dd0927ac0b06 . The domain is managed by Evan Phoenix. He may know something about this DNS behavior.
commit cd7f3442c513cf48c82652fd18c5dd0927ac0b06
Author: Evan Phoenix <ephoenix@engineyard.com>
Date: Tue Apr 13 11:11:25 2010 -0700
Use a controlled name for testing unknown hostnames
Some ISP intercept unknown domains under .com/.org/.net, so using
somerandomname.com can easily fail.
They don't seem to intercept unknown hostnames under normal domains
though, so I've change us to use rubyspecdoesntexist.fallingsnow.net
because I control fallingsnow.net and I know that there will never be a
DNS entry for this host.
It looks like fallingsnow.net is no longer registered or something.
If you make a PR replacing fallingsnow.net with ruby-lang.org I think that would be fine.
But indeed, what kind of DNS lies to you like that? :D
It looks like fallingsnow.net is no longer registered or something.
I just checked "fallingsnow.net" with a whois service, and it seems to be registered. https://www.whois.com/whois/fallingsnow.net
Expires On: 2024-05-28
If you make a PR replacing fallingsnow.net with ruby-lang.org I think that would be fine.
This is a good idea because we can predict the behavior of the DNS response from rubyspecdoesntexist.ruby-lang.org with our managed domain ruby-lang.org. I will send a PR for that.
But indeed, what kind of DNS lies to you like that? :D
I am also curious about the root cause of this behavior on the Arm64 server's DNS server. Anyway, I am still asking Equinix support, and I will share the updates here.
I am still investigating this issue. I got a response from Equinix support saying that they cannot reproduce it.
First, the Ruby script today returned a different IP address from the one I reported previously.
$ ruby -e 'require "socket"; p IPSocket.getaddress("a.fallingsnow.net")'
"70.32.1.32"
Second, I see that name resolution is managed by the systemd-resolved service.
When I stop the service, name resolution fails.
$ sudo systemctl stop systemd-resolved.service
$ ping -c 3 rubyspecdoesntexist.fallingsnow.net
ping: rubyspecdoesntexist.fallingsnow.net: Temporary failure in name resolution
$ perl -MSocket -MData::Dumper -E 'print Dumper(Socket::getaddrinfo("rubyspecdoesntexist.fallingsnow.net", "http"))'
$VAR1 = 'Temporary failure in name resolution';
So, I edited the /lib/systemd/system/systemd-resolved.service file and restarted the service to capture the debug log with journalctl.
ruby1|aarch64$ diff -u systemd-resolved.service.orig systemd-resolved.service
--- systemd-resolved.service.orig 2023-03-20 14:32:08.000000000 +0000
+++ systemd-resolved.service 2023-10-17 15:24:06.617137254 +0000
@@ -52,6 +52,7 @@
Type=notify
User=systemd-resolve
WatchdogSec=3min
+Environment=SYSTEMD_LOG_LEVEL=debug
[Install]
WantedBy=multi-user.target
$ sudo systemctl stop systemd-resolved.service
$ sudo systemctl daemon-reload
$ sudo systemctl start systemd-resolved.service
Then I captured the log by the following command.
$ sudo journalctl -u systemd-resolved -f
while running the following ping command for the name resolution.
$ ping -c 1 rubyspecdoesntexist.fallingsnow.net
PING rubyspecdoesntexist.fallingsnow.net.DOMAINS (70.32.1.32) 56(84) bytes of data.
64 bytes from ip-70.32.1.32.hosted.by.gigenet.com: icmp_seq=1 ttl=56 time=29.1 ms
--- rubyspecdoesntexist.fallingsnow.net.DOMAINS ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 29.124/29.124/29.124/0.000 ms
The captured log is below. https://gist.github.com/junaruga/b5ea688469c176a16789bb19597bc985
Do the following lines mean that the name resolution info is cached locally?
Oct 17 15:28:58 ruby1 systemd-resolved[3712]: Got DNS stub UDP query packet for id 17592
Oct 17 15:28:58 ruby1 systemd-resolved[3712]: Looking up RR for 32.1.32.70.in-addr.arpa IN PTR.
Oct 17 15:28:58 ruby1 systemd-resolved[3712]: Positive cache hit for 32.1.32.70.in-addr.arpa IN PTR
For reference, the systemd version that provides systemd-resolved is below.
$ dpkg -S /lib/systemd/system/systemd-resolved.service
systemd: /lib/systemd/system/systemd-resolved.service
$ dpkg -s systemd | grep ^Version
Version: 249.11-0ubuntu3.9
I am close to solving this issue.
The problem is the search DOMAINS line in /etc/resolv.conf.
$ cat /etc/resolv.conf | grep -v ^#
nameserver 127.0.0.53
options edns0 trust-ad
search DOMAINS
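To illustrate why a search entry can turn a nonexistent host into one that resolves, here is a simplified model of the search-list expansion described in resolv.conf(5). It is a sketch under assumptions: the function name candidate_names is hypothetical, ndots defaults to 1, and real resolvers apply more rules than this.

```ruby
# Simplified model of how a stub resolver expands a name using the
# "search" list from /etc/resolv.conf. Real resolvers have more rules.
def candidate_names(name, search:, ndots: 1)
  # A trailing dot marks the name as absolute: no search list is applied.
  return [name] if name.end_with?(".")

  as_is = "#{name}."
  # A search entry of "." is the root, so appending it yields the name itself.
  with_search = search.map { |domain| domain == "." ? as_is : "#{name}.#{domain}." }

  # Names with at least ndots dots are tried as-is first, then with suffixes.
  ordered = name.count(".") >= ndots ? [as_is, *with_search] : [*with_search, as_is]
  ordered.uniq
end

puts candidate_names("rubyspecdoesntexist.fallingsnow.net", search: ["DOMAINS"])
puts candidate_names("rubyspecdoesntexist.fallingsnow.net", search: ["."])
```

With search DOMAINS, the second candidate is rubyspecdoesntexist.fallingsnow.net.DOMAINS., which matches the name shown in the ping output earlier in this thread. With search . there is only one candidate, so a failed lookup stays failed.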
I modified /etc/resolv.conf manually as a test, as Equinix support suggested.
$ diff -u resolv.conf.orig resolv.conf
--- resolv.conf.orig 2023-10-17 14:20:44.024000000 +0000
+++ resolv.conf 2023-10-18 09:08:58.754638292 +0000
@@ -20,4 +20,4 @@
nameserver 127.0.0.53
options edns0 trust-ad
-search DOMAINS
+search .
And the ping, perl and ruby results look okay.
$ ping rubyspecdoesntexist.fallingsnow.net
ping: rubyspecdoesntexist.fallingsnow.net: Name or service not known
$ perl -MSocket -MData::Dumper -E 'print Dumper(Socket::getaddrinfo("rubyspecdoesntexist.fallingsnow.net", "http"))'
$VAR1 = 'Name or service not known';
$ ruby -e 'require "socket"; p IPSocket.getaddress("rubyspecdoesntexista.fallingsnow.net")'
-e:1:in `getaddress': getaddrinfo: Name or service not known (SocketError)
from -e:1:in `<main>'
But after restarting the systemd-resolved service, it seems /etc/resolv.conf is regenerated with search DOMAINS again.
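Since the file keeps being regenerated, a small check can confirm which search list is in effect after each restart. This is a rough sketch, not a tool used in this thread: the helper name search_domains is hypothetical, and the sample text mirrors the resolv.conf content shown above.

```ruby
# Extract the effective "search" list from resolv.conf-style text.
# When several search lines exist, the last one wins.
def search_domains(resolv_conf_text)
  resolv_conf_text.each_line
                  .map(&:strip)
                  .reject { |line| line.empty? || line.start_with?("#", ";") }
                  .select { |line| line.split.first == "search" }
                  .map { |line| line.split.drop(1) }
                  .last || []
end

sample = <<~CONF
  nameserver 127.0.0.53
  options edns0 trust-ad
  search DOMAINS
CONF

domains = search_domains(sample)
warn "suspicious search list: #{domains.inspect}" if domains.include?("DOMAINS")
p domains
```

On a real host the same helper could be fed File.read("/etc/resolv.conf") instead of the sample string.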
Below is the resolvectl status output. The "DNS Domain: DOMAINS" entry under Link 4 (bond0) looks like the problem. I am now looking for a way to change the setting permanently from "DNS Domain: DOMAINS" to "DNS Domain: .".
$ resolvectl status
Global
Protocols: -LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
resolv.conf mode: stub
Current DNS Server: NNN.NNN.NNN.NNN (masking by myself)
DNS Servers: NNN.NNN.NNN.NNN NNN.NNN.NNN.NNN (masking by myself)
Fallback DNS Servers: NNN.NNN.NNN.NNN NNN.NNN.NNN.NNN (masking by myself)
Link 2 (enp1s0f0)
Current Scopes: none
Protocols: -DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
Link 3 (enp1s0f1)
Current Scopes: none
Protocols: -DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
Link 4 (bond0)
Current Scopes: none
Protocols: -DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
DNS Domain: DOMAINS
Equinix support told me that this issue looks like the following Ubuntu bug. https://bugs.launchpad.net/ubuntu/+source/ifupdown/+bug/1978351
It looks like it's fixed upstream and Ubuntu just hasn't shipped a fix, so you could add something extremely hacky until they (or rather if they ever) do.
One hacky idea is to add ExecStartPost=/usr/bin/resolvectl domain bond0 "" to systemd-resolved. Run sudo systemctl edit systemd-resolved and put this in:
[Service]
ExecStartPost=/usr/bin/resolvectl domain bond0 ""
It'll drop the override config in /etc/systemd/system/systemd-resolved.service.d/override.conf
You can then restart systemd-resolved and check with resolvectl.
Thank you for suggesting the workaround!
I tried your suggested way.
$ sudo /usr/bin/resolvectl domain bond0 ""
$ sudo systemctl edit systemd-resolved
ruby1|aarch64$ cat /etc/systemd/system/systemd-resolved.service.d/override.conf
[Service]
ExecStartPost=/usr/bin/resolvectl domain bond0 ""
Unfortunately, after rebooting the OS, I see failures during boot.
[ OK ] Started ifup for bond0.
ifup@bond0.service
[ OK ] Started Network Configuration.
systemd-networkd.service
Starting Wait for Network to be Configured...
Starting Network Name Resolution...
[ OK ] Finished Wait for Network to be Configured.
systemd-networkd-wait-online.service
[FAILED] Failed to start Network Name Resolution.
See 'systemctl status systemd-resolved.service' for details.
[ OK ] Stopped Network Name Resolution.
Starting Network Name Resolution...
ifupdown-pre.service
[ OK ] Finished Helper to synchronize boot up for ifupdown.
Starting Raise network interfaces...
[FAILED] Failed to start Network Name Resolution.
See 'systemctl status systemd-resolved.service' for details.
[ OK ] Stopped Network Name Resolution.
Starting Network Name Resolution...
[FAILED] Failed to start Network Name Resolution.
See 'systemctl status systemd-resolved.service' for details.
[ OK ] Stopped Network Name Resolution.
Starting Network Name Resolution...
[FAILED] Failed to start Network Name Resolution.
See 'systemctl status systemd-resolved.service' for details.
[ OK ] Stopped Network Name Resolution.
Starting Network Name Resolution...
[FAILED] Failed to start Network Name Resolution.
See 'systemctl status systemd-resolved.service' for details.
[ OK ] Stopped Network Name Resolution.
[FAILED] Failed to start Network Name Resolution.
See 'systemctl status systemd-resolved.service' for details.
[ OK ] Reached target Host and Network Name Lookups.
[ OK ] Finished Raise network interfaces.
[ OK ] Reached target Network.
And the systemd-resolved service failed to start with the message "systemd-resolved.service: Start request repeated too quickly.".
ruby1|aarch64$ sudo systemctl status systemd-resolved.service
× systemd-resolved.service - Network Name Resolution
Loaded: loaded (/lib/systemd/system/systemd-resolved.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/systemd-resolved.service.d
└─override.conf
Active: failed (Result: exit-code) since Wed 2023-10-18 17:41:43 UTC; 1min 17s ago
Docs: man:systemd-resolved.service(8)
man:org.freedesktop.resolve1(5)
https://www.freedesktop.org/wiki/Software/systemd/writing-network-configuration-managers
https://www.freedesktop.org/wiki/Software/systemd/writing-resolver-clients
Process: 1886 ExecStart=/lib/systemd/systemd-resolved (code=exited, status=0/SUCCESS)
Process: 1891 ExecStartPost=/usr/bin/resolvectl domain bond0 (code=exited, status=1/FAILURE)
Main PID: 1886 (code=exited, status=0/SUCCESS)
Status: "Shutting down..."
CPU: 123ms
Oct 18 17:41:43 ruby1 systemd[1]: systemd-resolved.service: Scheduled restart job, restart counter is at 5.
Oct 18 17:41:43 ruby1 systemd[1]: Stopped Network Name Resolution.
Oct 18 17:41:43 ruby1 systemd[1]: systemd-resolved.service: Start request repeated too quickly.
Oct 18 17:41:43 ruby1 systemd[1]: systemd-resolved.service: Failed with result 'exit-code'.
Oct 18 17:41:43 ruby1 systemd[1]: Failed to start Network Name Resolution.
ruby1|aarch64$ resolvectl status
...
Link 4 (bond0)
Current Scopes: none
Protocols: -DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
DNS Domain: DOMAINS
I am checking whether I can suppress this error by setting other systemd config items. Your suggestions are very helpful and welcome, as I am not familiar with systemd.
https://stackoverflow.com/questions/35452591/start-request-repeated-too-quickly
Ah, it looks like it's because systemd-resolved starts before bond0 appears (note the status=1/FAILURE on the ExecStartPost job in the systemctl status call) so my hacky plan needs adjustment. I don't know much about Ubuntu's networking setup, so you'll have to fiddle a bit, but the trick is making sure the job runs after the interface appears.
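For debugging the ordering problem by hand, one rough way to see when the interface appears is to poll /sys/class/net, where Linux exposes one entry per network interface. This is only a sketch under assumptions: the helper name wait_for_interface and its parameters are hypothetical, and the presence of the entry does not guarantee that systemd-resolved already knows about the link.

```ruby
# Poll until a network interface shows up under /sys/class/net (Linux).
# sys_root is a parameter only so the sketch can be exercised without root.
def wait_for_interface(name, timeout: 30, interval: 0.5, sys_root: "/sys/class/net")
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  until File.exist?(File.join(sys_root, name))
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep interval
  end
  true
end

if wait_for_interface("bond0", timeout: 1)
  puts "bond0 is present; resolvectl domain bond0 \"\" could run now"
else
  puts "bond0 did not appear in time"
end
```

Ordering the job with systemd dependencies is still the cleaner fix; the polling loop is mainly useful for confirming the diagnosis.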
You will want to remove the override (another call to systemctl edit or just deleting the override file should work). You can then add a new unit with systemctl edit --force --full hacky.service
[Unit]
Description="Gross hack to work around Ubuntu's broken ifupdown scripts"
After=network-online.target
[Service]
Type=oneshot
ExecStart=/usr/bin/resolvectl domain bond0 ""
[Install]
WantedBy=multi-user.target
And then enable it. It'll run after network-online.target which might be good enough. There's probably a unit running the ifupdown script so setting After=<whatever that service is> might work too.
Edit: actually reading the logs more closely helps, I see an ifup@bond0.service service mentioned. Try setting the hacky.service (or whatever more helpful thing you name it) to run after that with After=ifup@bond0.service, I think.
@jeremycline Thank you! I was able to fix the issue with your approach! After rebooting the OS, I don't see any errors, and /etc/resolv.conf is set properly!
# systemctl edit --force --full systemd-resolved-hacky.service
I added the following content, which is essentially the same as yours. The After=network-online.target setting worked.
# This service fixes the domain value by replacing the `search DOMAINS` with
# `search .` in /etc/resolv.conf.
# https://github.com/ruby/spec/issues/1095
# https://bugs.launchpad.net/ubuntu/+source/ifupdown/+bug/1978351
[Unit]
Description="Gross hack to work around Ubuntu's broken ifupdown scripts"
After=network-online.target
[Service]
Type=oneshot
# Fix the domain value.
# $ resolvectl status
# ...
# Link 4 (bond0)
# Current Scopes: none
# Protocols: -DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
# DNS Domain: DOMAINS
# ...
ExecStart=/usr/bin/resolvectl domain bond0 ""
[Install]
WantedBy=multi-user.target
$ sudo systemctl enable systemd-resolved-hacky.service
Created symlink /etc/systemd/system/multi-user.target.wants/systemd-resolved-hacky.service → /etc/systemd/system/systemd-resolved-hacky.service.
$ sudo systemctl start systemd-resolved-hacky.service
Then I tested with resolvectl status.
$ sudo reboot
$ resolvectl status
Global
Protocols: -LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
resolv.conf mode: stub
Current DNS Server: NNN.NNN.NNN.NNN (masking by myself)
DNS Servers: NNN.NNN.NNN.NNN NNN.NNN.NNN.NNN (masking by myself)
Fallback DNS Servers: NNN.NNN.NNN.NNN NNN.NNN.NNN.NNN (masking by myself)
Link 2 (enp1s0f0)
Current Scopes: none
Protocols: -DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
Link 3 (enp1s0f1)
Current Scopes: none
Protocols: -DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
Link 4 (bond0)
Current Scopes: none
Protocols: -DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
$ grep -v ^# /etc/resolv.conf
nameserver 127.0.0.53
options edns0 trust-ad
search .
$ sudo systemctl status systemd-resolved-hacky.service
○ systemd-resolved-hacky.service - "Gross hack to work around Ubuntu's broken ifupdown scripts"
Loaded: loaded (/etc/systemd/system/systemd-resolved-hacky.service; enabled; vendor preset: enabled)
Active: inactive (dead) since Thu 2023-10-19 10:21:53 UTC; 11min ago
Process: 2105 ExecStart=/usr/bin/resolvectl domain bond0 (code=exited, status=0/SUCCESS)
Main PID: 2105 (code=exited, status=0/SUCCESS)
CPU: 11ms
Oct 19 10:21:53 ruby1 systemd[1]: Starting "Gross hack to work around Ubuntu's broken ifupdown scripts"...
Oct 19 10:21:53 ruby1 systemd[1]: systemd-resolved-hacky.service: Deactivated successfully.
Oct 19 10:21:53 ruby1 systemd[1]: Finished "Gross hack to work around Ubuntu's broken ifupdown scripts".
And the following ruby command now fails with an error, as expected.
$ ruby -e 'require "socket"; p IPSocket.getaddress("a.fallingsnow.net")'
-e:1:in `getaddress': getaddrinfo: Name or service not known (SocketError)
from -e:1:in `<main>'
Thank you! The issue was fixed.
I also reported this workaround on the Ubuntu bug ticket. https://bugs.launchpad.net/ubuntu/+source/ifupdown/+bug/1978351/comments/13
Hello,
I am seeing weird behavior with IPSocket.getaddress on RubyCI's arm64 Ubuntu jammy server "arm64-neoverse-n1". This server is not visible on the rubyci.org site yet.
The spec file library/socket/ipsocket/getaddress_spec.rb is failing in make test-spec.
https://rubyci.s3.amazonaws.com/arm64-neoverse-n1/ruby-master/recent.html https://rubyci.s3.amazonaws.com/arm64-neoverse-n1/ruby-master/log/20231013T130005Z.fail.html.gz
I can reproduce this failure on the latest ruby/spec commit 59bdcb4ea95c60159bb2bfc8c73022364da8ec0d too, with a relatively recent master branch ruby 511571b5ff3aaab3ac013edc166a1bcf61f6d6d4, by the following command.
A minimal reproducer
On the arm64 server
Looking at library/socket/ipsocket/getaddress_spec.rb, the following command is expected to raise SocketError. However, it returns the IP address 170.178.183.18 for the host "rubyspecdoesntexist.fallingsnow.net".
It also returns the same IP for a different subdomain host.
It raises SocketError as expected when specifying our managed domain "rubyspecdoesntexist.ruby-lang.org".
The following DNS client tools nslookup and dig are not returning the domain, right?
So, do you know why this happened? What is the domain "fallingsnow.net" that is used here?
As the arm64 server is managed on Equinix Cloud, if the issue comes from the server's DNS server, I can ask the admin to correct the server, provided we can reproduce the issue with a general DNS client tool.
On my local Fedora Linux x86_64
Raising SocketError as expected.
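The expectation described above, that IPSocket.getaddress raises SocketError for an unknown host, can be sketched with an injectable resolver so that it runs without touching the network. This is an illustrative sketch, not the spec itself: the helper resolves? and the two stub lambdas are hypothetical, and the stubbed address is the one reported in this issue.

```ruby
require "socket"

# Returns true if the resolver yields an address for host, false when it
# raises SocketError. The resolver argument is injectable for offline tests.
def resolves?(host, resolver: IPSocket.method(:getaddress))
  resolver.call(host)
  true
rescue SocketError
  false
end

# Stub resolvers standing in for a healthy and a misconfigured host.
healthy = ->(_host) { raise SocketError, "getaddrinfo: Name or service not known" }
hijacked = ->(_host) { "170.178.183.18" }

name = "rubyspecdoesntexist.fallingsnow.net"
puts resolves?(name, resolver: healthy)   # false: the spec's expectation holds
puts resolves?(name, resolver: hijacked)  # true: the failure seen on the arm64 server
```

Calling resolves?(name) without the resolver argument uses the real IPSocket.getaddress, which is what distinguishes the two machines in this report.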