ruby / resolv

A thread-aware DNS resolver library written in Ruby

`resolv.conf` on macOS sometimes has interface suffix which breaks `resolv.rb`. #35

Open ioquatix opened 1 year ago

ioquatix commented 1 year ago

@ioquatix I just ran into this the other day and tried to run Resolv.getaddress in bin/rails c within the same dev environment as @trevorturk.

Loading development environment (Rails 7.0.4.3)
irb(main):001:0> Resolv.getaddress "google.com"
=> "74.125.136.100"

I can also help with repros if needed.

I put a binding.irb in the resolv.rb exception site and got this:

irb(#<Resolv::DNS::Requester::UnconnectedUDP:0x000000010685e970>):004:0> Addrinfo.ip(host).ip_address
/Users/olivierlacan/.rbenv/versions/3.2.2/lib/ruby/3.2.0/resolv.rb:4:in `ip': getaddrinfo: nodename nor servname provided, or not known (SocketError)
    from /Users/olivierlacan/.rbenv/versions/3.2.2/lib/ruby/3.2.0/resolv.rb:4:in `sender'
    from <internal:prelude>:5:in `irb'
    from 3.2.2/lib/ruby/3.2.0/resolv.rb:770:in `sender'
    from 3.2.2/lib/ruby/3.2.0/resolv.rb:527:in `block in fetch_resource'
    from 3.2.2/lib/ruby/3.2.0/resolv.rb:1126:in `block (3 levels) in resolv'
    from 3.2.2/lib/ruby/3.2.0/resolv.rb:1124:in `each'
    from 3.2.2/lib/ruby/3.2.0/resolv.rb:1124:in `block (2 levels) in resolv'
    from 3.2.2/lib/ruby/3.2.0/resolv.rb:1123:in `each'
    from 3.2.2/lib/ruby/3.2.0/resolv.rb:1123:in `block in resolv'
    from 3.2.2/lib/ruby/3.2.0/resolv.rb:1121:in `each'
    from 3.2.2/lib/ruby/3.2.0/resolv.rb:1121:in `resolv'
    from 3.2.2/lib/ruby/3.2.0/resolv.rb:521:in `fetch_resource'
    from 3.2.2/lib/ruby/3.2.0/resolv.rb:507:in `each_resource'
    from 3.2.2/lib/ruby/3.2.0/resolv.rb:402:in `each_address'
    from 3.2.2/lib/ruby/3.2.0/resolv.rb:116:in `block in each_address'
    from 3.2.2/lib/ruby/3.2.0/resolv.rb:115:in `each'
    ... 30 levels...
irb(#<Resolv::DNS::Requester::UnconnectedUDP:0x000000010685e970>):005:0> host
=> "fe80::887:c7ff:fe62:d64%en0"

I have a feeling something about this %en0 (network interface?!) in the ipv6 DNS hostname is not happy. Removing it works fine.

Addrinfo.ip("fe80::887:c7ff:fe62:d64").ip_address
=> "fe80::887:c7ff:fe62:d64"

Edit: I can also confirm that this %en0 seemingly gets added when I'm tethering to an AT&T device from macOS since it is listed as such in resolv.conf:

cat /etc/resolv.conf
#
# macOS Notice
#
# This file is not consulted for DNS hostname resolution, address
# resolution, or the DNS query routing mechanism used by most
# processes on this system.
#
# To view the DNS configuration used by this system, use:
#   scutil --dns
#
# SEE ALSO
#   dns-sd(1), scutil(8)
#
# This file is automatically generated.
#
nameserver fe80::887:c7ff:fe62:d64%en0
nameserver 172.20.10.1

But this isn't shown in the macOS networking settings if you look at DNS servers:

image

AFAIK en0 in macOS parlance is an identifier for Wi-Fi network interface as this shows:

$ networksetup -listallhardwareports | grep -C 2 en0

Hardware Port: Wi-Fi
Device: en0
Ethernet Address: f0:2f:5b:01:23:b8

It seems like Resolv is choking on this identifier when it likely should be entirely ignored. Might have to file a Ruby bug report for this.
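
To make the shape of the problem concrete, here is a minimal sketch that pulls the nameserver entries out of resolv.conf-style text and separates each address from its optional `%interface` suffix. `scoped_nameservers` is a hypothetical helper written for illustration; it is not how resolv.rb parses the file.

```ruby
# Hypothetical helper (not part of resolv.rb): list the nameserver entries
# found in resolv.conf-style text, splitting each into address and zone.
def scoped_nameservers(conf_text)
  conf_text.each_line.filter_map do |line|
    # Comment lines and other directives do not match and are skipped.
    next unless (m = line.match(/\A\s*nameserver\s+(\S+)/))

    # "fe80::1%en0" => ["fe80::1", "en0"]; a plain address has no zone (nil).
    address, zone = m[1].split("%", 2)
    { address: address, zone: zone }
  end
end

conf = <<~CONF
  # This file is automatically generated.
  nameserver fe80::887:c7ff:fe62:d64%en0
  nameserver 172.20.10.1
CONF

scoped_nameservers(conf).each do |ns|
  puts "#{ns[:address]} (zone: #{ns[:zone] || 'none'})"
end
```

With the file shown above, only the first entry carries a zone (`en0`); the IPv4 entry has none.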

Originally posted by @olivierlacan in https://github.com/socketry/async-http/issues/107#issuecomment-1522043280

ioquatix commented 1 year ago

I suspect there are two solutions possible:

  1. resolv.rb should ignore %interface suffix from resolv.conf
  2. Addrinfo.ip("...%interface") should ignore the suffix.

(2) feels more general.
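
Either option boils down to dropping everything from the `%` onward. A rough sketch, assuming a hypothetical helper named `strip_zone_id` (it does not exist in resolv.rb or in the socket library):

```ruby
# Hypothetical helper: remove an IPv6 zone identifier ("%en0", "%eno2", ...)
# from an address string. Addresses without a "%" are returned unchanged.
def strip_zone_id(address)
  address.sub(/%.*\z/, "")
end

strip_zone_id("fe80::887:c7ff:fe62:d64%en0") # => "fe80::887:c7ff:fe62:d64"
strip_zone_id("172.20.10.1")                 # => "172.20.10.1"
```

One caveat: for a link-local address the zone tells the OS which interface to use, so discarding it may make the address ambiguous on hosts with more than one interface.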

ioquatix commented 1 year ago

@trevorturk @olivierlacan what versions of Ruby are you using?

ioquatix commented 1 year ago

I tested this on my Linux desktop.

My valid network interface:

2: eno2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 58:11:22:be:55:02 brd ff:ff:ff:ff:ff:ff
    altname enp10s0
    inet 192.168.1.41/24 metric 1024 brd 192.168.1.255 scope global dynamic eno2
       valid_lft 563sec preferred_lft 563sec
    inet6 2406:e000:6833:a800:5a11:22ff:febe:5502/64 scope global dynamic mngtmpaddr noprefixroute 
       valid_lft 2591931sec preferred_lft 604731sec
    inet6 fe80::5a11:22ff:febe:5502/64 scope link 
       valid_lft forever preferred_lft forever

The results:

samuel@aiko ~/P/k/protocol-quic (main)> ruby -v -rsocket -e 'p Addrinfo.ip("fe80::887:c7ff:fe62:d64%eno2").ip_address'
ruby 3.2.1 (2023-02-08 revision 31819e82c8) [x86_64-linux]
"fe80::887:c7ff:fe62:d64%eno2"
samuel@aiko ~/P/k/protocol-quic (main)> ruby -v -rsocket -e 'p Addrinfo.ip("fe80::887:c7ff:fe62:d64%eno123").ip_address'
ruby 3.2.1 (2023-02-08 revision 31819e82c8) [x86_64-linux]
-e:1:in `ip': getaddrinfo: Name or service not known (SocketError)
    from -e:1:in `<main>'

It looks like the interface name, at least on Linux, must be correct.

ioquatix commented 1 year ago

On a Darwin based system, you can get the list of interfaces using ipconfig -a.

On my system, en0 is a valid interface.

samuel@sakura ~/D/k/protocol-quic (main)> ruby -v -rsocket -e 'p Addrinfo.ip("fe80::887:c7ff:fe62:d64%en0").ip_address'
ruby 3.2.1 (2023-02-08 revision 31819e82c8) [arm64-darwin22]
"fe80::887:c7ff:fe62:d64%en0"
samuel@sakura ~/D/k/protocol-quic (main)> ruby -v -rsocket -e 'p Addrinfo.ip("fe80::887:c7ff:fe62:d64%en12345").ip_address'
ruby 3.2.1 (2023-02-08 revision 31819e82c8) [arm64-darwin22]
"fe80::887:c7ff:fe62:d64"

So it does seem to work correctly. But unlike Linux, it also works even if the interface is invalid. The question is, why is this not working in resolv.rb?
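
Since the two platforms differ on invalid interface names, it can help to check the zone against the interfaces the OS reports before calling Addrinfo.ip. A minimal sketch, assuming a hypothetical helper `zone_known?`; `Socket.getifaddrs` is the standard call used to list interfaces:

```ruby
require "socket"

# Hypothetical helper: does the zone identifier in `address` name an
# interface the OS currently reports? The interface list can be passed in
# explicitly, which makes the check easy to exercise without real hardware.
def zone_known?(address, interfaces = Socket.getifaddrs.map(&:name).uniq)
  _, zone = address.split("%", 2)
  return true if zone.nil? # no zone identifier, nothing to validate

  interfaces.include?(zone)
end

# Against the live interface list of the current machine:
puts zone_known?("fe80::887:c7ff:fe62:d64%en0")
```

On the Linux machine above this would presumably report false for `%eno123` and true for `%eno2`, matching where getaddrinfo fails.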

olivierlacan commented 1 year ago

@ioquatix Reproduced this issue on 3.2.2.

ioquatix commented 1 year ago

@olivierlacan it seems like the problem is not just with Ruby, but something to do with the OS.

At the time you do it, is the interface listed in ipconfig -a?

trevorturk commented 1 year ago

I'm sorry to say that since switching from AT&T to Verizon the issue hasn't happened to me again, and I can't seem to reproduce now! (I guess that's good news in a way, but I'm sorry I can't reproduce...)

Here's the output from my terminal:

$ networksetup -listallhardwareports | grep -C 2 en0

Hardware Port: Wi-Fi
Device: en0
Ethernet Address: f0:2f:4b:06:b0:1e

...but the ipconfig -a command seems different for me and I'm not sure what I should be running:

ipconfig -a
usage: ipconfig <command> <args>
where <command> is one of waitall, getifaddr, ifcount, getoption, getiflist, getsummary, getpacket, getv6packet, getra, getdhcpduid, getdhcpiaid, set, setverbose

olivierlacan commented 1 year ago

@trevorturk Looks like ifconfig is the command on macOS, at least that's the one I remember using in the past:

At the time you do it, is the interface listed in ipconfig -a?

Tethering from macOS to iOS connected to AT&T, whose name servers are: fe80::887:c7ff:fe62:d64%en0 and 172.20.10.1 (in /etc/resolv.conf):

$ ifconfig -a | grep -C 1 en0
en0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
    options=6463<RXCSUM,TXCSUM,TSO4,TSO6,CHANNEL_IO,PARTIAL_CSUM,ZEROINVERT_CSUM>
    ether f0:2f:4b:01:23:b6
    inet6 fe80::1090:9f9d:2ebe:2d34%en0 prefixlen 64 secured scopeid 0xe
    inet 172.20.10.12 netmask 0xfffffff0 broadcast 172.20.10.15

For comparison's sake this is what a regular coffee shop Wi-Fi connection yields:

en0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
    options=6463<RXCSUM,TXCSUM,TSO4,TSO6,CHANNEL_IO,PARTIAL_CSUM,ZEROINVERT_CSUM>
    ether f0:2f:4b:01:23:b6
    inet6 fe80::1090:9f9d:2ebe:2d34%en0 prefixlen 64 secured scopeid 0xe
    inet 10.255.206.97 netmask 0xfffffc00 broadcast 10.255.207.255

Interestingly, the inet6 address (IPv6 name server?) from the AT&T tethered connection persists in this output despite being wholly gone from /etc/resolv.conf, so there's likely some caching happening in ifconfig.

$ cat /etc/resolv.conf | grep -C 1 nameserver
search lan
nameserver 10.255.204.1

The ifconfig output persists across nameserver cache refreshes at the OS level with:

sudo dscacheutil -flushcache
sudo killall -HUP mDNSResponder

Those typically do the trick with sticky DNS configs on macOS.

olivierlacan commented 1 year ago

Relevant Ruby issues:

I added assert_match(Resolv::IPv6::Regex, "fe80::1090:9f9d:2ebe:2d34%en0", bug17112) to the regression test @jeremyevans added at the time and it passes as expected.

I think the issue involves Resolv's use of Addrinfo.ip(host).ip_address to figure out the request sender info in fetch_resource:

fetch_resource gets called by each_resource and theoretically there's a different path to use AAAA records for IPv6, but it seems like that breaks down when the name server has a suffix (like the one AT&T is sending me).

In my case I'm definitely using that IPV6 branch:

Socket.ip_address_list.any? {|a| a.ipv6? && !a.ipv6_loopback? && !a.ipv6_linklocal? }
=> true

Crucially, this also returns true within an Async block.

Even more interestingly, if I do this manually using the documentation example in Resolv which specifically passes an IPV6 Resource to getresources it works fine outside of an Async block but breaks within one:

irb(main):027:1* Resolv::DNS.open do |dns|
irb(main):028:1*    ress = dns.getresources "google.com", Resolv::DNS::Resource::IN::AAAA
irb(main):029:1*    p ress.map(&:address)
irb(main):030:0>  end
[#<Resolv::IPv6 2607:f8b0:4006:821::200e>]
=> [#<Resolv::IPv6 2607:f8b0:4006:821::200e>]
irb(main):031:1* Async {
irb(main):032:2*   Resolv::DNS.open do |dns|
irb(main):033:2*     ress = dns.getresources "google.com", Resolv::DNS::Resource::IN::AAAA
irb(main):034:2*     p ress.map(&:address)
irb(main):035:1*   end
irb(main):036:0> }
   16m     warn: Async::Task [oid=0xa198c] [ec=0xa19a0] [pid=9331] [2023-04-26 14:58:11 -0400]
               | Task may have ended with unhandled exception.
               |   SocketError: getaddrinfo: nodename nor servname provided, or not known
               |   → /Users/olivierlacan/.rbenv/versions/3.2.2/lib/ruby/3.2.0/resolv.rb:771 in `ip'
               |     /Users/olivierlacan/.rbenv/versions/3.2.2/lib/ruby/3.2.0/resolv.rb:771 in `sender'
               |     /Users/olivierlacan/.rbenv/versions/3.2.2/lib/ruby/3.2.0/resolv.rb:527 in `block in fetch_resource'
               |     /Users/olivierlacan/.rbenv/versions/3.2.2/lib/ruby/3.2.0/resolv.rb:1128 in `block (3 levels) in resolv'
               |     /Users/olivierlacan/.rbenv/versions/3.2.2/lib/ruby/3.2.0/resolv.rb:1126 in `each'
               |     /Users/olivierlacan/.rbenv/versions/3.2.2/lib/ruby/3.2.0/resolv.rb:1126 in `block (2 levels) in resolv'
               |     /Users/olivierlacan/.rbenv/versions/3.2.2/lib/ruby/3.2.0/resolv.rb:1125 in `each'
               |     /Users/olivierlacan/.rbenv/versions/3.2.2/lib/ruby/3.2.0/resolv.rb:1125 in `block in resolv'
               |     /Users/olivierlacan/.rbenv/versions/3.2.2/lib/ruby/3.2.0/resolv.rb:1123 in `each'
               |     /Users/olivierlacan/.rbenv/versions/3.2.2/lib/ruby/3.2.0/resolv.rb:1123 in `resolv'
               |     /Users/olivierlacan/.rbenv/versions/3.2.2/lib/ruby/3.2.0/resolv.rb:521 in `fetch_resource'
               |     /Users/olivierlacan/.rbenv/versions/3.2.2/lib/ruby/3.2.0/resolv.rb:507 in `each_resource'
               |     /Users/olivierlacan/.rbenv/versions/3.2.2/lib/ruby/3.2.0/resolv.rb:498 in `getresources'
               |     (irb):33 in `block (2 levels) in <top (required)>'
               |     /Users/olivierlacan/.rbenv/versions/3.2.2/lib/ruby/3.2.0/resolv.rb:298 in `open'
               |     (irb):32 in `block in <top (required)>'
               |     /Users/olivierlacan/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/async-2.5.0/lib/async/task.rb:158 in `block in run'
               |     /Users/olivierlacan/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/async-2.5.0/lib/async/task.rb:310 in `block in schedule'
=> #<Async::Task:0x00000000000a198c>

olivierlacan commented 1 year ago

It looks to me like this commit introduced the bug I'm encountering, at least in conjunction with Async: https://github.com/ruby/resolv/commit/5c161804ddef0dcf3c230ae6e5b9be1185861797

Addrinfo does not handle suffixed IPv6 addresses within an Async block:

Async { Addrinfo.ip("fe80::887:c7ff:fe62:d64%en0").ip_address }
               | Task may have ended with unhandled exception.
               |   SocketError: getaddrinfo: nodename nor servname provided, or not known
               |   → (irb):40 in `ip'
               |     (irb):40 in `block in <top (required)>'
               |     /Users/olivierlacan/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/async-2.5.0/lib/async/task.rb:158 in `block in run'
               |     /Users/olivierlacan/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/async-2.5.0/lib/async/task.rb:310 in `block in schedule'

Outside of Async, everything is fine:

Addrinfo.ip("fe80::887:c7ff:fe62:d64%en0").ip_address
=> "fe80::887:c7ff:fe62:d64%en0"
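
Until the underlying cause is fixed, one possible stopgap is to retry a failed lookup without the suffix. This is only a sketch under stated assumptions: `ip_address_with_fallback` and its `lookup:` parameter are hypothetical names, and this is not the fix adopted by Ruby or Async.

```ruby
require "socket"

# Hypothetical workaround: try the lookup as given and, if it raises
# SocketError, retry once with the zone identifier ("%en0") removed.
# `lookup` defaults to Addrinfo.ip and is injectable for testing.
def ip_address_with_fallback(host, lookup: ->(h) { Addrinfo.ip(h).ip_address })
  lookup.call(host)
rescue SocketError
  bare = host.sub(/%.*\z/, "")
  raise if bare == host # nothing to strip, so re-raise the original error

  lookup.call(bare)
end

# Stand-in for a resolver that rejects scoped addresses, as seen inside Async:
rejects_zones = ->(h) { h.include?("%") ? raise(SocketError, "not known") : h }

ip_address_with_fallback("fe80::887:c7ff:fe62:d64%en0", lookup: rejects_zones)
# => "fe80::887:c7ff:fe62:d64"
```

The trade-off is that the returned address no longer says which interface it belongs to.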

olivierlacan commented 1 year ago

Helpful context from Mastodon thread:

@olivierlacan oh! that's your link-local IPv6 configuration for SEcureNeighborDiscovery on the LAN. That nameserver is just a reference to the same interface (peep the fe80 local prefix and the en0 suffix). As long as inet6 is enabled, it'll generate that and starting with 10.12 they switched from stable addresses (with ff:fe and your mac address) to Cryptographically Generated Addresses. https://binblog.de/2017/09/21/ipv6-privacy-stable-addressing-roundup/ has a great summary blargh, /s/nameserver/address/g

ioquatix commented 1 year ago

Maybe it's the async resolver mechanism that has problems understanding the address.

cc @bruno-

bruno- commented 1 year ago

Hi, during the work on the Addrinfo.getaddrinfo scheduler hook I did not address the interface suffix scenario.

I'm still catching up on this thread.