oerdnj / deb.sury.org

Public bugreports for anything ppa:ondrej/*

IPv6 access to #1133

Closed mr-intj closed 5 years ago

mr-intj commented 5 years ago

I've been getting errors on packages.sury.org during apt update, e.g.:

Err:7 https://packages.sury.org/php stretch InRelease
  Operation timed out after 120001 milliseconds with 0 out of 0 bytes received

The failure rate is less than 100%, but it's probably above 90%. When the problem didn't go away after a few months, I looked into it and happened to notice that it only occurs over IPv6; when I force IPv4 (apt-get -o Acquire::ForceIPv4=true install ...), there are no problems.

My IPv6 tunnel is through HE.net (I've not had IPv6 issues with any other sites over the last 5+ years). I have a remotely hosted box with (native) IPv6 support, and if I ssh into that, I can get through via IPv6 to packages.sury.org with no issues.

So this seems to be a routing issue of some kind. I searched through issues and noticed that some other people had reported IPv6 problems, so I'm guessing it's not limited to HE.net.

oerdnj commented 5 years ago

Could you please provide a source IP address, the network and perhaps a traceroute6?

oerdnj commented 5 years ago

> so I'm guessing it's not limited to HE.net.

We don’t know that, since you are the first one to actually provide at least a bit of information about your client network.

The target IPv6 address has recently changed from Vultr to KeyCDN, so if you have been having connectivity problems for months, those are two distinct problems (and Vultr was problematic: the RA was dropping default gateways from time to time...).

oerdnj commented 5 years ago

Also what HE POP is your tunnel using?

oerdnj commented 5 years ago

Since you said the connection over IPv6 works from time to time, could you try pulling HTTP Headers? There should be something like X-Edge-Location header that I will need when escalating the problem to the CDN provider.

mr-intj commented 5 years ago

> Could you please provide a source IP address, the network and perhaps a traceroute6?

$ traceroute6 packages.sury.org
traceroute to packages.sury.org (2a0b:4d07:2::2), 30 hops max, 80 byte packets
1  firewall.scott-smith.us (2001:470:8132:123::1)  0.873 ms  0.849 ms  0.824 ms
2  tunnel332727.tunnel.tserv3.fmt2.ipv6.he.net (2001:470:1f04:5a4::1)  27.715 ms  33.082 ms  37.830 ms
3  10ge11-19.core4.fmt2.he.net (2001:470:0:45::1)  31.289 ms  38.743 ms  38.825 ms
4  100ge9-1.core1.pao1.he.net (2001:470:0:263::2)  35.686 ms 100ge14-1.core1.sjc2.he.net (2001:470:0:3d3::1)  38.887 ms 100ge9-1.core1.pao1.he.net (2001:470:0:263::2)  38.968 ms
5  xe-0.equinix.snjsca04.us.bb.gin.ntt.net (2001:504:0:1::2914:1)  39.069 ms  39.058 ms xe-0.paix.plalca01.us.bb.gin.ntt.net (2001:504:d::6)  39.613 ms
6  xe-0-0-0-3-0.r05.plalca01.us.bb.gin.ntt.net (2001:418:0:5000::af4)  39.672 ms !X  38.143 ms !X  35.156 ms !X

mr-intj commented 5 years ago

> Also what HE POP is your tunnel using?

Looking online, it seems Hurricane Electric has 11 PoPs here in the US (where I am), but I'm not sure how to narrow it down beyond that.

mr-intj commented 5 years ago

> Since you said the connection over IPv6 works from time to time, could you try pulling HTTP Headers? There should be something like X-Edge-Location header that I will need when escalating the problem to the CDN provider.

Here's an example where it hangs (after five minutes, I gave up waiting for it to time out):

$ wget -S http://packages.sury.org 
--2019-03-23 15:46:47--  http://packages.sury.org/
Resolving packages.sury.org (packages.sury.org)... 2a0b:4d07:2::3, 2a0b:4d07:2::2, 2a0b:4d07:2::1, ...
Connecting to packages.sury.org (packages.sury.org)|2a0b:4d07:2::3|:80... connected.
HTTP request sent, awaiting response... 
  HTTP/1.1 301 Moved Permanently
  Server: keycdn-engine
  Date: Sat, 23 Mar 2019 22:46:47 GMT
  Content-Type: text/html
  Content-Length: 162
  Connection: keep-alive
  Location: https://packages.sury.org/
  Expires: Sun, 24 Mar 2019 22:46:47 GMT
  Cache-Control: max-age=86400
  X-Edge-Location: ussf
  Access-Control-Allow-Origin: *
Location: https://packages.sury.org/ [following]
--2019-03-23 15:46:47--  https://packages.sury.org/
Connecting to packages.sury.org (packages.sury.org)|2a0b:4d07:2::3|:443... connected.
(hang here)

As tends to happen when debugging, today I seem to be getting through more often than not. Typically I see failures any time I attempt to access via IPv6.

Here's an example of a server response when I do get through successfully:

$ wget -S http://packages.sury.org 
--2019-03-23 15:54:00--  http://packages.sury.org/
Resolving packages.sury.org (packages.sury.org)... 2a0b:4d07:2::1, 2a0b:4d07:2::2, 2a0b:4d07:2::3, ...
Connecting to packages.sury.org (packages.sury.org)|2a0b:4d07:2::1|:80... connected.
HTTP request sent, awaiting response... 
  HTTP/1.1 301 Moved Permanently
  Server: keycdn-engine
  Date: Sat, 23 Mar 2019 22:54:00 GMT
  Content-Type: text/html
  Content-Length: 162
  Connection: keep-alive
  Location: https://packages.sury.org/
  Expires: Sun, 24 Mar 2019 22:54:00 GMT
  Cache-Control: max-age=86400
  X-Edge-Location: ussf
  Access-Control-Allow-Origin: *
Location: https://packages.sury.org/ [following]
--2019-03-23 15:54:00--  https://packages.sury.org/
Connecting to packages.sury.org (packages.sury.org)|2a0b:4d07:2::1|:443... connected.
HTTP request sent, awaiting response... 
  HTTP/1.1 200 OK
  Server: keycdn-engine
  Date: Sat, 23 Mar 2019 22:54:00 GMT
  Content-Type: text/html; charset=utf-8
  Transfer-Encoding: chunked
  Connection: keep-alive
  Vary: Accept-Encoding
  Cache-Control: max-age=86400
  Expires: Sun, 24 Mar 2019 22:54:00 GMT
  X-Edge-Location: ussf
  Access-Control-Allow-Origin: *

Since things seem to be behaving well today, I've only been able to reproduce the problem once. I don't know whether the redirect to https is always where the problem shows up, or whether that's just what happened this time.
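
For anyone collecting the same evidence: a minimal Python sketch of pulling the `X-Edge-Location` values out of saved `wget -S` output, so they can be attached to a CDN escalation. The `SAMPLE` text is a trimmed copy of the successful transcript above; the helper name `header_values` is hypothetical and not part of any tool mentioned in this thread. It relies on `wget -S` indenting server headers, which is how they appear above.

```python
import re

# Trimmed copy of the successful `wget -S` transcript shown above.
SAMPLE = """\
  HTTP/1.1 301 Moved Permanently
  Server: keycdn-engine
  Location: https://packages.sury.org/
  X-Edge-Location: ussf
Location: https://packages.sury.org/ [following]
  HTTP/1.1 200 OK
  Server: keycdn-engine
  X-Edge-Location: ussf
"""


def header_values(transcript, name):
    """Return every value of header `name` among the indented server headers.

    Unindented lines (wget's own status messages) are ignored on purpose.
    """
    pattern = re.compile(
        rf"^[ \t]+{re.escape(name)}:[ \t]*(.*)$",
        re.IGNORECASE | re.MULTILINE,
    )
    return [m.group(1).strip() for m in pattern.finditer(transcript)]


print(header_values(SAMPLE, "X-Edge-Location"))  # ['ussf', 'ussf']
```

Both the redirect and the final response report the same edge location here, which is the value the maintainer asked for.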

EDIT:

OK, here's another case, this time without the redirect to 443:

$ wget -S https://packages.sury.org
--2019-03-23 16:41:24--  https://packages.sury.org/
Resolving packages.sury.org (packages.sury.org)... 2a0b:4d07:2::2, 2a0b:4d07:2::1, 2a0b:4d07:2::3, ...
Connecting to packages.sury.org (packages.sury.org)|2a0b:4d07:2::2|:443... connected.
(hang)

Also, I've let this sit for over an hour without canceling, and it doesn't time out.

oerdnj commented 5 years ago

Thanks for the provided info. What’s your MTU on the tunnel interface and could you try lowering it a little bit? It’s a wild shot, but it looks like (headers received but no data) PMTUD might not be working for you and the tunnel is dropping large packets.

Wireshark/tcpdump might tell us more, but I will try reporting this to KeyCDN and see what happens.
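
To make the PMTUD hypothesis above concrete, here is a small Python sketch of the packet-size arithmetic. It is an illustration under stated assumptions, not a diagnosis: it assumes the tunnel is plain 6in4 (a 20-byte IPv4 header wrapped around each IPv6 packet) over a 1500-byte IPv4 path, which the thread does not state. The function names are hypothetical.

```python
IPV6_HEADER = 40  # fixed IPv6 header size, bytes
TCP_HEADER = 20   # TCP header without options, bytes
IPV4_HEADER = 20  # outer IPv4 header added by 6in4 (assumed encapsulation)


def tunnel_mtu(underlying_mtu):
    """Largest IPv6 packet that fits through a 6in4 tunnel on this path."""
    return underlying_mtu - IPV4_HEADER


def tcp_mss(mtu):
    """Largest TCP payload per segment for an IPv6 packet of size `mtu`."""
    return mtu - IPV6_HEADER - TCP_HEADER


for label, mtu in [
    ("native Ethernet", 1500),
    ("6in4 over 1500 (assumed)", tunnel_mtu(1500)),
    ("IPv6 minimum MTU", 1280),
]:
    print(f"{label:26s} MTU {mtu:4d} -> TCP MSS {tcp_mss(mtu)}")
```

Under these assumptions, a full-size segment from a 1500-byte network is 20 bytes too large for the tunnel. Short responses still fit, while larger ones stall unless the sender learns the smaller path MTU.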

oerdnj commented 5 years ago

For a primer on IPv6 PMTUD, see here: http://test-ipv6.com/faq_pmtud.html

There’s also a very good test by Berkeley University here: http://netalyzr.icsi.berkeley.edu

mr-intj commented 5 years ago

> Thanks for the provided info. What’s your MTU on the tunnel interface and could you try lowering it a little bit? It’s a wild shot, but it looks like (headers received but no data) PMTUD might not be working for you and the tunnel is dropping large packets.

It's set to the default ("usually 1500 bytes, but can vary in some circumstances"). I dropped it to 1280, but was still able to repeat the failure with no obvious differences in behavior.

> Wireshark/tcpdump might tell us more, but I will try reporting this to KeyCDN and see what happens.

Here's a tcpdump. Note that MTU had already been returned to the default setting before I captured this.

mr-intj commented 5 years ago

> For a primer on IPv6 PMTUD, see here: http://test-ipv6.com/faq_pmtud.html

FWIW, ipv6-test does test for PMTUD issues and doesn't find any on my connection, so I don't think the issue is with our firewalls or tunnels (unless the ipv6-test test is broken). You might want to add packages.sury.org to the site author's list of other IPv6 sites; he lists an email address at the bottom of the page.

> There’s also a very good test by Berkeley University here: http://netalyzr.icsi.berkeley.edu

Sadly, it seems to be no more:

Please note: after nearly a decade of providing this service we have decided to shut down Netalyzr in the first week of March 2019. It pains us greatly to do so, but each team member has at this point moved on to other responsibilities. Our Android app users will receive a final update during this time. Thanks to everyone who has used the service over the years — you all helped make the project a tremendous success.

oerdnj commented 5 years ago

Hi, so the CDN provider needs:

The source IPv6 and an "mtr -rnc100 packages.sury.org" while the issue happens would be good to analyze this further.

mr-intj commented 5 years ago

> ... the CDN provider needs:
>
> The source IPv6 ...

2001:470:8132:123::1

> ...and an "mtr -rnc100 packages.sury.org" while the issue happens would be good to analyze this further.

$ mtr -rnc100 packages.sury.org
Start: Sat Mar 30 10:49:41 2019
HOST: z800-desktop                Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 2001:470:8132:123::1       0.0%   100    0.3   0.3   0.2   0.4   0.0
  2.|-- 2001:470:1f04:5a4::1       0.0%   100   29.3  33.5  26.6 184.0  19.7
  3.|-- 2001:470:0:45::1           0.0%   100   24.4  32.7  22.6 155.4  18.2
  4.|-- 2001:470:0:3d3::1          0.0%   100   24.5  31.2  23.5 162.0  19.9
  5.|-- 2001:504:0:1::2914:1       0.0%   100   25.5  31.8  23.6 128.7  18.6
  6.|-- 2001:418:0:2000::1a1       0.0%   100   30.4  36.9  24.1 180.9  30.0
  7.|-- ???                       100.0   100    0.0   0.0   0.0   0.0   0.0
  8.|-- 2a0b:4d07:2::3             0.0%   100   25.2  32.3  24.6 154.7  19.8
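
A note on reading the report above: hop 7 shows 100% loss while the destination shows 0%. A common rule of thumb (not something stated in this thread) is that loss confined to an intermediate hop usually means that router does not answer probes, because real loss would normally carry through to later hops. The Python sketch below parses the report and applies that rule; `parse_report` and `intermediate_only_loss` are hypothetical helper names.

```python
import re
from typing import NamedTuple

# Hop lines copied from the `mtr -rnc100` report above.
REPORT = """\
  1.|-- 2001:470:8132:123::1       0.0%   100    0.3   0.3   0.2   0.4   0.0
  2.|-- 2001:470:1f04:5a4::1       0.0%   100   29.3  33.5  26.6 184.0  19.7
  3.|-- 2001:470:0:45::1           0.0%   100   24.4  32.7  22.6 155.4  18.2
  4.|-- 2001:470:0:3d3::1          0.0%   100   24.5  31.2  23.5 162.0  19.9
  5.|-- 2001:504:0:1::2914:1       0.0%   100   25.5  31.8  23.6 128.7  18.6
  6.|-- 2001:418:0:2000::1a1       0.0%   100   30.4  36.9  24.1 180.9  30.0
  7.|-- ???                       100.0   100    0.0   0.0   0.0   0.0   0.0
  8.|-- 2a0b:4d07:2::3             0.0%   100   25.2  32.3  24.6 154.7  19.8
"""


class Hop(NamedTuple):
    number: int
    host: str
    loss: float  # percent
    sent: int


# The '%' is optional because mtr drops it when the loss column is full.
HOP_RE = re.compile(r"^\s*(\d+)\.\|--\s+(\S+)\s+([\d.]+)%?\s+(\d+)")


def parse_report(text):
    """Parse the hop lines of an mtr report into Hop records."""
    hops = []
    for line in text.splitlines():
        m = HOP_RE.match(line)
        if m:
            hops.append(
                Hop(int(m.group(1)), m.group(2), float(m.group(3)), int(m.group(4)))
            )
    return hops


def intermediate_only_loss(hops):
    """Hop numbers that show loss although the final hop shows none."""
    if not hops or hops[-1].loss > 0:
        return []
    return [h.number for h in hops[:-1] if h.loss > 0]


hops = parse_report(REPORT)
print("loss at destination:", hops[-1].loss)
print("hops with loss not seen at destination:", intermediate_only_loss(hops))
```

Read this way, the report shows a path that is clean for small probe packets, which matches the successful pings later in the thread and says nothing about how larger packets fare.
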

oerdnj commented 5 years ago

Hey, the KeyCDN folks don’t detect any problems, so they might be intermittent. Do you still see the problems over IPv6?

mr-intj commented 5 years ago

$ curl -6 https://packages.sury.org/
(HANG)^C

...but...

$ ping -6 packages.sury.org
PING packages.sury.org(2a0b:4d07:2::2 (2a0b:4d07:2::2)) 56 data bytes
64 bytes from 2a0b:4d07:2::2 (2a0b:4d07:2::2): icmp_seq=1 ttl=58 time=26.3 ms
    . . .
64 bytes from 2a0b:4d07:2::2 (2a0b:4d07:2::2): icmp_seq=11 ttl=58 time=27.9 ms
^C
--- packages.sury.org ping statistics ---
11 packets transmitted, 11 received, 0% packet loss, time 10014ms
rtt min/avg/max/mdev = 24.608/26.671/29.299/1.319 ms

...but I'm still getting timeouts from apt update for packages.sury.org/php stretch InRelease...

EDIT: "...timeouts from apt update when it's using IPv6...", that is
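
One caveat when comparing the ping and curl results above: the ping used 56 data bytes, so it only exercises small packets. A way to probe larger sizes is to send echo requests sized to a candidate MTU with local fragmentation disabled. The Python sketch below computes the payload sizes and builds the command line; it assumes the Linux iputils `ping` (its `-s` and `-M do` options) and does not run anything. The helper names are hypothetical.

```python
IPV6_HEADER = 40    # fixed IPv6 header size, bytes
ICMPV6_HEADER = 8   # ICMPv6 echo request header, bytes


def packet_size(payload):
    """On-the-wire IPv6 packet size for an echo request with `payload` data bytes."""
    return payload + IPV6_HEADER + ICMPV6_HEADER


def echo_payload_for_mtu(mtu):
    """Largest echo payload that still fits in one packet of size `mtu`."""
    return mtu - IPV6_HEADER - ICMPV6_HEADER


def ping_command(host, mtu, count=3):
    """Argument list for an iputils ping that probes exactly `mtu` bytes.

    `-M do` asks ping not to fragment locally (assumes iputils on Linux).
    """
    return ["ping", "-6", "-c", str(count), "-M", "do",
            "-s", str(echo_payload_for_mtu(mtu)), host]


print("default 56-byte ping is a", packet_size(56), "byte packet")
for mtu in (1500, 1480, 1280):
    print(" ".join(ping_command("packages.sury.org", mtu)))
```

If the larger probes fail where the 56-byte ones succeed, that points toward a packet-size problem on the path; if they all succeed, the cause is more likely elsewhere.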

mr-intj commented 5 years ago

Hey Ondřej,

I just checked this today, and I'm able to get through over IPv6 with no issues. Not sure whether that's due to alignment of the planets, or you and the KeyCDN folks got this resolved.

oerdnj commented 5 years ago

I added Cloudflare on top of the CDN, so that might be the reason.

KeyCDN never found anything wrong on their side, and it would take two network engineers (one from KeyCDN and one from the affected network) looking at the issue together, and I am guessing that’s not going to happen.

Anyway, glad that this helped you.