Closed mr-intj closed 5 years ago
Could you please provide a source IP address, the network and perhaps a traceroute6?
so I'm guessing it's not limited to HE.net.
That we don’t know, as you are the first one to actually provide at least a bit of information about your client network.
The target IPv6 address has recently changed from Vultr to KeyCDN, so if you have been having connectivity problems for months, those are two distinct problems (and Vultr was problematic: the RA was dropping default gateways from time to time...).
Also what HE POP is your tunnel using?
Since you said the connection over IPv6 works from time to time, could you try pulling HTTP Headers? There should be something like X-Edge-Location header that I will need when escalating the problem to the CDN provider.
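A minimal sketch of pulling those headers over IPv6 only, with a timeout so a stalled connection fails quickly instead of hanging. It uses plain HTTP on port 80 because the redirect response there also carries X-Edge-Location; the function names and the 10-second timeout are illustrative assumptions, not anything KeyCDN prescribes.

```python
import http.client
import socket


def edge_location(headers):
    """Return the X-Edge-Location value from (name, value) pairs, or None."""
    for name, value in headers:
        if name.lower() == "x-edge-location":
            return value.strip()
    return None


def fetch_headers_ipv6(host, timeout=10.0):
    """Send a plain-HTTP HEAD request over IPv6 only and return the headers.

    A stalled connection raises socket.timeout rather than hanging.
    """
    # Ask for IPv6 addresses only, so there is no silent fallback to IPv4.
    infos = socket.getaddrinfo(host, 80, socket.AF_INET6, socket.SOCK_STREAM)
    address = infos[0][4][0]
    conn = http.client.HTTPConnection(address, 80, timeout=timeout)
    try:
        # Connect to the literal address but keep the real name in Host.
        conn.request("HEAD", "/", headers={"Host": host})
        return conn.getresponse().getheaders()
    finally:
        conn.close()
```

Calling edge_location(fetch_headers_ipv6("packages.sury.org")) from an affected network should give the edge code to pass on to the CDN provider, assuming the header is present in the response.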
Could you please provide a source IP address, the network and perhaps a traceroute6?
$ traceroute6 packages.sury.org
traceroute to packages.sury.org (2a0b:4d07:2::2), 30 hops max, 80 byte packets
 1 firewall.scott-smith.us (2001:470:8132:123::1) 0.873 ms 0.849 ms 0.824 ms
 2 tunnel332727.tunnel.tserv3.fmt2.ipv6.he.net (2001:470:1f04:5a4::1) 27.715 ms 33.082 ms 37.830 ms
 3 10ge11-19.core4.fmt2.he.net (2001:470:0:45::1) 31.289 ms 38.743 ms 38.825 ms
 4 100ge9-1.core1.pao1.he.net (2001:470:0:263::2) 35.686 ms 100ge14-1.core1.sjc2.he.net (2001:470:0:3d3::1) 38.887 ms 100ge9-1.core1.pao1.he.net (2001:470:0:263::2) 38.968 ms
 5 xe-0.equinix.snjsca04.us.bb.gin.ntt.net (2001:504:0:1::2914:1) 39.069 ms 39.058 ms xe-0.paix.plalca01.us.bb.gin.ntt.net (2001:504:d::6) 39.613 ms
 6 xe-0-0-0-3-0.r05.plalca01.us.bb.gin.ntt.net (2001:418:0:5000::af4) 39.672 ms !X 38.143 ms !X 35.156 ms !X
Also what HE POP is your tunnel using?
Looking online, it seems Hurricane Electric has 11 PoPs here in the US (where I am), but I'm not sure how to narrow it down beyond that.
Since you said the connection over IPv6 works from time to time, could you try pulling HTTP Headers? There should be something like X-Edge-Location header that I will need when escalating the problem to the CDN provider.
Here's an example where it hangs (after five minutes, I gave up waiting for it to time out)
$ wget -S http://packages.sury.org
--2019-03-23 15:46:47-- http://packages.sury.org/
Resolving packages.sury.org (packages.sury.org)... 2a0b:4d07:2::3, 2a0b:4d07:2::2, 2a0b:4d07:2::1, ...
Connecting to packages.sury.org (packages.sury.org)|2a0b:4d07:2::3|:80... connected.
HTTP request sent, awaiting response...
HTTP/1.1 301 Moved Permanently
Server: keycdn-engine
Date: Sat, 23 Mar 2019 22:46:47 GMT
Content-Type: text/html
Content-Length: 162
Connection: keep-alive
Location: https://packages.sury.org/
Expires: Sun, 24 Mar 2019 22:46:47 GMT
Cache-Control: max-age=86400
X-Edge-Location: ussf
Access-Control-Allow-Origin: *
Location: https://packages.sury.org/ [following]
--2019-03-23 15:46:47-- https://packages.sury.org/
Connecting to packages.sury.org (packages.sury.org)|2a0b:4d07:2::3|:443... connected.
(hang here)
As tends to happen when debugging, today I seem to be getting through more often than not. Typically I see failures any time I attempt to access via IPv6.
Here's an example of a server response when I do get through successfully:
$ wget -S http://packages.sury.org
--2019-03-23 15:54:00-- http://packages.sury.org/
Resolving packages.sury.org (packages.sury.org)... 2a0b:4d07:2::1, 2a0b:4d07:2::2, 2a0b:4d07:2::3, ...
Connecting to packages.sury.org (packages.sury.org)|2a0b:4d07:2::1|:80... connected.
HTTP request sent, awaiting response...
HTTP/1.1 301 Moved Permanently
Server: keycdn-engine
Date: Sat, 23 Mar 2019 22:54:00 GMT
Content-Type: text/html
Content-Length: 162
Connection: keep-alive
Location: https://packages.sury.org/
Expires: Sun, 24 Mar 2019 22:54:00 GMT
Cache-Control: max-age=86400
X-Edge-Location: ussf
Access-Control-Allow-Origin: *
Location: https://packages.sury.org/ [following]
--2019-03-23 15:54:00-- https://packages.sury.org/
Connecting to packages.sury.org (packages.sury.org)|2a0b:4d07:2::1|:443... connected.
HTTP request sent, awaiting response...
HTTP/1.1 200 OK
Server: keycdn-engine
Date: Sat, 23 Mar 2019 22:54:00 GMT
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Connection: keep-alive
Vary: Accept-Encoding
Cache-Control: max-age=86400
Expires: Sun, 24 Mar 2019 22:54:00 GMT
X-Edge-Location: ussf
Access-Control-Allow-Origin: *
Since things seem to be behaving well today, I've only been able to reproduce the problem once. I don't know whether the redirect to https is always where the problem shows up, or whether that's just what happened this time.
EDIT:
OK, here's another case, this time without the redirect to 443:
$ wget -S https://packages.sury.org
--2019-03-23 16:41:24-- https://packages.sury.org/
Resolving packages.sury.org (packages.sury.org)... 2a0b:4d07:2::2, 2a0b:4d07:2::1, 2a0b:4d07:2::3, ...
Connecting to packages.sury.org (packages.sury.org)|2a0b:4d07:2::2|:443... connected.
(hang)
Also, I've let this sit for over an hour without canceling, and it doesn't time out.
Thanks for the provided info. What’s your MTU on the tunnel interface and could you try lowering it a little bit? It’s a wild shot, but it looks like (headers received but no data) PMTUD might not be working for you and the tunnel is dropping large packets.
Wireshark/tcpdump might tell us more, but I will try reporting this to KeyCDN and see what happens.
For a primer on IPv6 PMTUD, see here: http://test-ipv6.com/faq_pmtud.html
There’s also a very good test by Berkeley University here: http://netalyzr.icsi.berkeley.edu
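To make the MTU suggestion concrete, here is a rough sketch of the packet-size arithmetic. It assumes the HE.net tunnel is 6in4 (IPv6 carried inside IPv4, which adds a 20-byte outer header) and that TCP headers carry no options; both are assumptions, and the function names are made up for illustration.

```python
IPV4_HEADER = 20     # outer header added by 6in4 encapsulation (no options)
IPV6_HEADER = 40     # fixed-size IPv6 header
TCP_HEADER = 20      # TCP header without options
IPV6_MIN_MTU = 1280  # smallest MTU every IPv6 link must support


def tunnel_mtu(link_mtu):
    """Largest IPv6 packet that fits through a 6in4 tunnel on this link."""
    return link_mtu - IPV4_HEADER


def tcp_mss(mtu):
    """Largest TCP payload per packet for a given IPv6 MTU."""
    return mtu - IPV6_HEADER - TCP_HEADER


def fits(packet_size, path_mtu):
    """True if a packet of this size passes without needing PMTUD."""
    return packet_size <= path_mtu


for mtu in (1500, tunnel_mtu(1500), IPV6_MIN_MTU):
    print(f"MTU {mtu}: TCP payload up to {tcp_mss(mtu)} bytes")
```

Under these assumptions a full-size 1500-byte packet from the server does not fit a 1480-byte tunnel, so it only gets through if the "packet too big" ICMPv6 message reaches the sender. Small packets such as a TCP handshake or a redirect response fit either way, which would match seeing "connected" followed by a hang.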
Thanks for the provided info. What’s your MTU on the tunnel interface and could you try lowering it a little bit? It’s a wild shot, but it looks like (headers received but no data) PMTUD might not be working for you and the tunnel is dropping large packets.
It's set to the default ("usually 1500 bytes, but can vary in some circumstances"). I dropped it to 1280, but was still able to repeat the failure with no obvious differences in behavior.
Wireshark/tcpdump might tell us more, but I will try reporting this to KeyCDN and see what happens.
Here's a tcpdump. Note that MTU had already been returned to the default setting before I captured this.
For a primer on IPv6 PMTUD, see here: http://test-ipv6.com/faq_pmtud.html
FWIW, ipv6-test does test for PMTUD issues and doesn't find any on my connection, so I don't think the issue is with our firewalls or tunnels (unless the ipv6-test test is broken). You might want to add packages.sury.org to his list of other IPv6 sites - he lists an email address at the bottom of the page.
There’s also a very good test by Berkeley University here: http://netalyzr.icsi.berkeley.edu
Sadly, it seems to be no more:
Please note: after nearly a decade of providing this service we have decided to shut down Netalyzr in the first week of March 2019. It pains us greatly to do so, but each team member has at this point moved on to other responsibilities. Our Android app users will receive a final update during this time. Thanks to everyone who has used the service over the years — you all helped make the project a tremendous success.
Hi, so the CDN provider needs:
The source IPv6 and an "mtr -rnc100 packages.sury.org" while the issue happens would be good to analyze this further.
... the CDN provider needs:
The source IPv6 ...
2001:470:8132:123::1
...and an "mtr -rnc100 packages.sury.org" while the issue happens would be good to analyze this further.
$ mtr -rnc100 packages.sury.org
Start: Sat Mar 30 10:49:41 2019
HOST: z800-desktop              Loss%   Snt   Last    Avg   Best   Wrst  StDev
  1.|-- 2001:470:8132:123::1     0.0%   100    0.3    0.3    0.2    0.4    0.0
  2.|-- 2001:470:1f04:5a4::1     0.0%   100   29.3   33.5   26.6  184.0   19.7
  3.|-- 2001:470:0:45::1         0.0%   100   24.4   32.7   22.6  155.4   18.2
  4.|-- 2001:470:0:3d3::1        0.0%   100   24.5   31.2   23.5  162.0   19.9
  5.|-- 2001:504:0:1::2914:1     0.0%   100   25.5   31.8   23.6  128.7   18.6
  6.|-- 2001:418:0:2000::1a1     0.0%   100   30.4   36.9   24.1  180.9   30.0
  7.|-- ???                     100.0   100    0.0    0.0    0.0    0.0    0.0
  8.|-- 2a0b:4d07:2::3           0.0%   100   25.2   32.3   24.6  154.7   19.8
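For reading a report like this one, a small sketch that separates hops which simply never answer from loss that actually reaches the destination. The parsing assumes the column order shown above (hop, host, loss, sent); the helper names are hypothetical.

```python
REPORT = """\
6.|-- 2001:418:0:2000::1a1 0.0% 100 30.4 36.9 24.1 180.9 30.0
7.|-- ??? 100.0 100 0.0 0.0 0.0 0.0 0.0
8.|-- 2a0b:4d07:2::3 0.0% 100 25.2 32.3 24.6 154.7 19.8
"""


def parse_mtr_report(text):
    """Parse hop rows of an `mtr -r` report into a list of dicts."""
    hops = []
    for line in text.splitlines():
        parts = line.split()
        # Hop rows start with a marker such as "7.|--"; skip everything else.
        if len(parts) < 4 or not parts[0].endswith(".|--"):
            continue
        hops.append({
            "hop": int(parts[0].split(".")[0]),
            "host": parts[1],
            "loss": float(parts[2].rstrip("%")),
            "sent": int(parts[3]),
        })
    return hops


def end_to_end_loss(hops):
    """Loss percentage at the final hop, which is what the client experiences."""
    return hops[-1]["loss"] if hops else None


def silent_hops(hops):
    """Intermediate hops that answered no probes at all."""
    return [h["hop"] for h in hops[:-1] if h["loss"] >= 100.0]


hops = parse_mtr_report(REPORT)
print("silent hops:", silent_hops(hops))
print("loss at destination:", end_to_end_loss(hops))
```

In the report above, hop 7 never answers but the destination shows no loss, which usually means that router just does not reply to probes rather than dropping traffic. Note that mtr's probes are small, so a clean report does not rule out problems that only affect full-size packets.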
Hey, the KeyCDN folks don’t detect any problems, so they might be intermittent. Do you still see the problems over IPv6?
$ curl -6 https://packages.sury.org/
(HANG)^C
...but...
$ ping -6 packages.sury.org
PING packages.sury.org(2a0b:4d07:2::2 (2a0b:4d07:2::2)) 56 data bytes
64 bytes from 2a0b:4d07:2::2 (2a0b:4d07:2::2): icmp_seq=1 ttl=58 time=26.3 ms
. . .
64 bytes from 2a0b:4d07:2::2 (2a0b:4d07:2::2): icmp_seq=11 ttl=58 time=27.9 ms
^C
--- packages.sury.org ping statistics ---
11 packets transmitted, 11 received, 0% packet loss, time 10014ms
rtt min/avg/max/mdev = 24.608/26.671/29.299/1.319 ms
...but I'm still getting timeouts from apt update for packages.sury.org/php stretch InRelease...
EDIT: "...timeouts from apt update when it's using IPv6...", that is
Hey Ondřej,
I just checked this today, and I'm able to get through over IPv6 with no issues. Not sure whether that's due to alignment of the planets, or you and the KeyCDN folks got this resolved.
I added Cloudflare on top of the CDN, so that might be the reason.
KeyCDN never found anything wrong on their side, and it would take two network engineers (from KeyCDN and from affected network) to look at the issue together, and I am guessing that’s not going to happen.
Anyway, glad that this helped you.
I've been getting errors on packages.sury.org during apt update.
The failure rate is less than 100%, but it's probably above 90%. When the problem didn't go away after a few months, I looked into it and happened to notice that it only occurs over IPv6; when I force IPv4 (apt-get -o Acquire::ForceIPv4=true install ...), there are no problems.
My IPv6 tunnel is through HE.net (I've not had IPv6 issues with any other sites over the last 5+ years). I have a remotely hosted box with (native) IPv6 support, and if I ssh into that, I can get through via IPv6 to packages.sury.org with no issues.
So this seems to be a routing issue of some kind. I searched through issues and noticed that some other people had reported IPv6 problems, so I'm guessing it's not limited to HE.net.