socketry / async-http

MIT License
298 stars 45 forks source link

SocketError: getaddrinfo with iOS Personal Hotspot #107

Closed trevorturk closed 2 months ago

trevorturk commented 1 year ago

Apologies in advance for this vague issue report, but I thought I should post something just in case future Googlers run into this issue, or maybe someone has an idea and/or some ideas about how to debug.

When I'm using macOS 12.6 while connected my iOS 16.0.2 iPhone's "Personal Hotspot" with Net::HTTP, I'm able to make HTTP requests like so:

uri = URI('http://example.com/')
=> #<URI::HTTP http://example.com/>
Net::HTTP.get_response(uri)
=> #<Net::HTTPOK 200 OK readbody=true>

However, when using Async::HTTP, I get an error:

internet = Async::HTTP::Internet.new
=> #<Async::HTTP::Internet:0x000000010c44eeb0 @clients={}, @options={}>
Async { internet.get("https://example.com") }
13.56s     warn: Async::Task [oid=0x16684] [ec=0x16698] [pid=78449] [2022-10-03 15:10:21 -0500]
               | Task may have ended with unhandled exception.
               |   SocketError: getaddrinfo: nodename nor servname provided, or not known
               |   → /Users/trevorturk/.rbenv/versions/3.1.2/lib/ruby/3.1.0/resolv.rb:770 in `ip'

I noticed my macOS DNS settings looked like so:

Screen Shot 2022-10-03 at 3 24 02 PM

...so I tried changing to use Cloudflare's 1.1.1.1 and then things worked as expected.

I'm happy enough with this setup since it appears to be working, but I wonder if other people might run into the issue going forward. If I can help testing/debugging somehow, please do let me know! Otherwise I may leave this issue open for a while and then close for future possible Googlers!

ioquatix commented 1 year ago

I think this is the same as https://github.com/socketry/async/issues/180 - can you confirm it's the IPv6 address causing problems? i.e. does it work with just 172.20.10.1 (similar to how you tested 1.1.1.1).

trevorturk commented 1 year ago

Thanks so much, sorry I missed that other issue. I tested and confirmed that 172.20.10.1 works. I noticed some debugging ideas in that other issue. Please let me know if there's anything you'd like me to test, or feel free to close this as a dupe. I'm subscribed to the other issue, so I'll see activity there as well. Thank you!

ioquatix commented 1 year ago

Can you try the IPv6 address just by itself and reproduce the failure? i.e. it's not the combination of two addresses that causes the issue but the presence of an IPv6 address that causes failure.

trevorturk commented 1 year ago

I tested with the following, doing my best to avoid typos, so we'd just be testing the auto-populated address:

Screen Shot 2022-10-03 at 5 54 40 PM

...and the issue occurred: SocketError: getaddrinfo: nodename nor servname provided, or not known was raised. I think that's what you meant to test, but let me know!

ioquatix commented 1 year ago

Are you able to try on the latest release of macOS (Ventura, 13.0)? I can't reproduce it locally.

trevorturk commented 1 year ago

I am still able to reproduce:

Screenshot 2022-10-31 at 10 30 27 AM

I'm also on the latest iOS version (not beta, but the latest release)

I'm not sure how to help reproduce, but please do let me know if you have any ideas, or if you'd like to screen share or anything like that!

ioquatix commented 1 year ago

Thanks, that's really helpful, at least I know it's something I should be able to reproduce.

ioquatix commented 1 year ago

Okay, here is my next question, are you able to reproduce the issue using a public IPv6 DNS resolver? e.g. CloudFlare has 2606:4700:4700::1111. If you can reproduce the issue with that public server that is very helpful for me because then I should be able to set up my own environment to match.

trevorturk commented 1 year ago

I'm sorry to say that it seems I cannot reproduce the issue with 2606:4700:4700::1111 sorry!

ioquatix commented 1 year ago

Ahhh so it's something specific to some resolvers and not others!

trevorturk commented 1 year ago

It occurred to me today that I should ask if there's anything I can provide in terms of a description of my local environment, or if perhaps it would be helpful for me to test with a Docker environment I could share. (I'm wondering if there might be something different with openssl etc.) Please do let me know if anything like that would be helpful!

ioquatix commented 1 year ago

The fact you can't reproduce it with some servers is helpful knowledge. What might be the next step is if I can some how log into your environment and poke around. But, let me try for a repro on my end.

olivierlacan commented 1 year ago

@ioquatix I just ran into this the other day and tried to run Resolv.getaddress on bin/rails c within the dev same environment as @trevorturk.

Loading development environment (Rails 7.0.4.3)
irb(main):001:0> Resolv.getaddress "google.com"
=> "74.125.136.100"

I can also help with repros if needed.

I put a binding.irb in the resolv.rb exception site and got this:

irb(#<Resolv::DNS::Requester::UnconnectedUDP:0x000000010685e970>):004:0> Addrinfo.ip(host).ip_address
/Users/olivierlacan/.rbenv/versions/3.2.2/lib/ruby/3.2.0/resolv.rb:4:in `ip': getaddrinfo: nodename nor servname provided, or not known (SocketError)
    from /Users/olivierlacan/.rbenv/versions/3.2.2/lib/ruby/3.2.0/resolv.rb:4:in `sender'
    from <internal:prelude>:5:in `irb'
    from 3.2.2/lib/ruby/3.2.0/resolv.rb:770:in `sender'
    from 3.2.2/lib/ruby/3.2.0/resolv.rb:527:in `block in fetch_resource'
    from 3.2.2/lib/ruby/3.2.0/resolv.rb:1126:in `block (3 levels) in resolv'
    from 3.2.2/lib/ruby/3.2.0/resolv.rb:1124:in `each'
    from 3.2.2/lib/ruby/3.2.0/resolv.rb:1124:in `block (2 levels) in resolv'
    from 3.2.2/lib/ruby/3.2.0/resolv.rb:1123:in `each'
    from 3.2.2/lib/ruby/3.2.0/resolv.rb:1123:in `block in resolv'
    from 3.2.2/lib/ruby/3.2.0/resolv.rb:1121:in `each'
    from 3.2.2/lib/ruby/3.2.0/resolv.rb:1121:in `resolv'
    from 3.2.2/lib/ruby/3.2.0/resolv.rb:521:in `fetch_resource'
    from 3.2.2/lib/ruby/3.2.0/resolv.rb:507:in `each_resource'
    from 3.2.2/lib/ruby/3.2.0/resolv.rb:402:in `each_address'
    from 3.2.2/lib/ruby/3.2.0/resolv.rb:116:in `block in each_address'
    from 3.2.2/lib/ruby/3.2.0/resolv.rb:115:in `each'
    ... 30 levels...
irb(#<Resolv::DNS::Requester::UnconnectedUDP:0x000000010685e970>):005:0> host
=> "fe80::887:c7ff:fe62:d64%en0"

I have a feeling something about this %en0 (network interface?!) in the ipv6 DNS hostname is not happy. Removing it works fine.

Addrinfo.ip("fe80::887:c7ff:fe62:d64").ip_address
=> "fe80::887:c7ff:fe62:d64"

Edit: I can also confirm that this %en0 seemingly gets added when I'm tethering to an AT&T device from macOS since it is listed as such in resolv.conf:

cat /etc/resolv.conf
#
# macOS Notice
#
# This file is not consulted for DNS hostname resolution, address
# resolution, or the DNS query routing mechanism used by most
# processes on this system.
#
# To view the DNS configuration used by this system, use:
#   scutil --dns
#
# SEE ALSO
#   dns-sd(1), scutil(8)
#
# This file is automatically generated.
#
nameserver fe80::887:c7ff:fe62:d64%en0
nameserver 172.20.10.1

But this isn't shown in the macOS networking settings if you look at DNS servers:

image

AFAIK en0 in macOS parlance is an identifier for Wi-Fi network interface as this shows:

$ networksetup -listallhardwareports | grep -C 2 en0

Hardware Port: Wi-Fi
Device: en0
Ethernet Address: f0:2f:5b:01:23:b8

It seems like Resolv is choking on this identifier when it likely should be entirely ignored. Might have to file a Ruby bug report for this.

ioquatix commented 1 year ago

I've created an issue in the resolv repo.

olivierlacan commented 1 year ago

As Trevor did, I've isolated this behavior to within Async exclusively although on a newer Ruby version (while tethering to a mobile device connected to a ISP/provider like AT&T which supplies ipv6 domain name servers with a suffix, like fe80::887:c7ff:fe62:d64%en0), so while it might be related to Resolv behavior I can only reproduce it within an Async context:

irb(main):022:0> ::Resolv.getaddresses("google.com")
=> ["142.251.35.238", "2607:f8b0:4008:815::200e"]
irb(main):023:0> Async { ::Resolv.getaddresses("google.com") }
    2m     warn: Async::Task [oid=0x3c398] [ec=0x3c3ac] [pid=4196] [2023-04-26 13:57:23 -0400]
               | Task may have ended with unhandled exception.
               |   SocketError: getaddrinfo: nodename nor servname provided, or not known
               |   → /Users/olivierlacan/.rbenv/versions/3.2.2/lib/ruby/3.2.0/resolv.rb:771 in `ip'
               |     /Users/olivierlacan/.rbenv/versions/3.2.2/lib/ruby/3.2.0/resolv.rb:771 in `sender'
               |     /Users/olivierlacan/.rbenv/versions/3.2.2/lib/ruby/3.2.0/resolv.rb:527 in `block in fetch_resource'
               |     /Users/olivierlacan/.rbenv/versions/3.2.2/lib/ruby/3.2.0/resolv.rb:1128 in `block (3 levels) in resolv'
               |     /Users/olivierlacan/.rbenv/versions/3.2.2/lib/ruby/3.2.0/resolv.rb:1126 in `each'
               |     /Users/olivierlacan/.rbenv/versions/3.2.2/lib/ruby/3.2.0/resolv.rb:1126 in `block (2 levels) in resolv'
               |     /Users/olivierlacan/.rbenv/versions/3.2.2/lib/ruby/3.2.0/resolv.rb:1125 in `each'
               |     /Users/olivierlacan/.rbenv/versions/3.2.2/lib/ruby/3.2.0/resolv.rb:1125 in `block in resolv'
               |     /Users/olivierlacan/.rbenv/versions/3.2.2/lib/ruby/3.2.0/resolv.rb:1123 in `each'
               |     /Users/olivierlacan/.rbenv/versions/3.2.2/lib/ruby/3.2.0/resolv.rb:1123 in `resolv'
               |     /Users/olivierlacan/.rbenv/versions/3.2.2/lib/ruby/3.2.0/resolv.rb:521 in `fetch_resource'
               |     /Users/olivierlacan/.rbenv/versions/3.2.2/lib/ruby/3.2.0/resolv.rb:507 in `each_resource'
               |     /Users/olivierlacan/.rbenv/versions/3.2.2/lib/ruby/3.2.0/resolv.rb:402 in `each_address'
               |     /Users/olivierlacan/.rbenv/versions/3.2.2/lib/ruby/3.2.0/resolv.rb:116 in `block in each_address'
               |     /Users/olivierlacan/.rbenv/versions/3.2.2/lib/ruby/3.2.0/resolv.rb:115 in `each'
               |     /Users/olivierlacan/.rbenv/versions/3.2.2/lib/ruby/3.2.0/resolv.rb:115 in `each_address'
               |     /Users/olivierlacan/.rbenv/versions/3.2.2/lib/ruby/3.2.0/resolv.rb:102 in `getaddresses'
               |     /Users/olivierlacan/.rbenv/versions/3.2.2/lib/ruby/3.2.0/resolv.rb:51 in `getaddresses'
               |     (irb):23 in `block in <top (required)>'
               |     /Users/olivierlacan/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/async-2.5.0/lib/async/task.rb:158 in `block in run'
               |     /Users/olivierlacan/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/async-2.5.0/lib/async/task.rb:310 in `block in schedule'
=> #<Async::Task:0x000000000003c398>

Now here's the kicker, I can't reproduce this on Ruby 3.0:

irb(main):007:0> RUBY_VERSION
=> "3.0.2"
irb(main):008:0> Async::VERSION
=> "1.31.0"
irb(main):009:0> ::Resolv.getaddresses("google.com")
=> ["142.250.217.174", "2607:f8b0:4008:815::200e"]
irb(main):010:0> Async { ::Resolv.getaddresses("google.com") }
=>
#<Async::Task:0x00000001079a1ea8
 @annotation=nil,
 @children=nil,
 @fiber=nil,
 @finished=nil,
 @head=nil,
 @logger=nil,
 @object_name=nil,
 @parent=nil,
 @reactor=
  #<Async::Reactor:0x00000001079a2510
   @annotation=nil,
   @blocked=0,
   @children=#<Async::Children:0x00000001079a1e58 @head=nil, @size=0, @tail=nil, @transient_count=0>,
   @guard=#<Thread::Mutex:0x00000001079a21c8>,
   @head=nil,
   @interrupted=false,
   @logger=nil,
   @object_name=nil,
   @parent=nil,
   @ready=[],
   @running=[],
   @scheduler=#<Async::Scheduler:0x00000001079a2218 @reactor=#<Async::Reactor:0x00000001079a2510 ...>>,
   @selector=nil,
   @tail=nil,
   @timers=
    #<Timers::Group:0x00000001079a23f8
     @events=
      #<Timers::Events:0x00000001079a23d0
       @queue=[],
       @sequence=#<Timers::PriorityHeap:0x00000001079a23a8 @contents=[]>>,
     @interval=#<Timers::Interval:0x00000001079a2290 @current=1004617.430651, @total=0.0>,
     @paused_timers=#<Set: {}>,
     @timers=#<Set: {}>>,
   @transient=false,
   @unblocked=[]>,
 @result=["142.250.217.174", "2607:f8b0:4008:815::200e"],
 @status=:complete,
 @tail=nil,
 @transient=false>
ioquatix commented 1 year ago

Ruby 3.1 / Async 2+ uses the new address_resolve fiber scheduler hook. I think that could be the root cause of the issue.

https://github.com/socketry/async/blob/main/lib/async/scheduler.rb#L152-L154

I wonder if our wrappers around getaddrinfo are not working correctly with interface suffix.

cc @bruno-

olivierlacan commented 1 year ago

Scheduler was definitely part of the backtrace when I was started looking into this.

ioquatix commented 1 year ago

I believe this issue should be fixed by https://github.com/socketry/async/pull/243. It would be great if people who can reproduce it, can test it. I'll cut a new release of Async 2.x soon.

trevorturk commented 1 year ago

I'm not on AT&T anymore, so I haven't been able to reproduce for a while, but perhaps @olivierlacan can give it a shot?

trevorturk commented 2 months ago

I'm going to close this since I believe it should be fixed and I haven't seen it for a longggg time.

ioquatix commented 2 months ago

Thanks everyone!