Closed by fredr 11 months ago
cc @rust-lang/infra
No, this is just the main website. You can ask on t-infra on rust-lang.zulipchat.org
:+1: thanks, I'll open a topic there
Hi @fredr, thanks for reporting this! I moved the issue to the infra-team's repo and added it to our project board. Will try to reproduce this later today or tomorrow.
Hi @fredr,
Sorry for the long delay on this. I just tried reproducing the issue by running the command that you shared, but I don't get the connection resets from my network. 😬 Before diving deeper, do you still experience the issue or has it resolved itself over the past few weeks?
No worries, thanks for looking into it. I just ran the test, and got the reset after a few requests:
❯ while true; do curl -6 https://static.rust-lang.org/dist/2023-04-15/channel-rust-nightly.toml.sha256 --resolve 'static.rust-lang.org:443:2a04:4e42:200::649'; sleep 1; done
31b22ae424824f90e33b888b34d111849e583ed879d21cc113db54b42fd4477a channel-rust-nightly.toml
31b22ae424824f90e33b888b34d111849e583ed879d21cc113db54b42fd4477a channel-rust-nightly.toml
31b22ae424824f90e33b888b34d111849e583ed879d21cc113db54b42fd4477a channel-rust-nightly.toml
31b22ae424824f90e33b888b34d111849e583ed879d21cc113db54b42fd4477a channel-rust-nightly.toml
31b22ae424824f90e33b888b34d111849e583ed879d21cc113db54b42fd4477a channel-rust-nightly.toml
31b22ae424824f90e33b888b34d111849e583ed879d21cc113db54b42fd4477a channel-rust-nightly.toml
31b22ae424824f90e33b888b34d111849e583ed879d21cc113db54b42fd4477a channel-rust-nightly.toml
31b22ae424824f90e33b888b34d111849e583ed879d21cc113db54b42fd4477a channel-rust-nightly.toml
31b22ae424824f90e33b888b34d111849e583ed879d21cc113db54b42fd4477a channel-rust-nightly.toml
31b22ae424824f90e33b888b34d111849e583ed879d21cc113db54b42fd4477a channel-rust-nightly.toml
31b22ae424824f90e33b888b34d111849e583ed879d21cc113db54b42fd4477a channel-rust-nightly.toml
curl: (35) Recv failure: Connection reset by peer
31b22ae424824f90e33b888b34d111849e583ed879d21cc113db54b42fd4477a channel-rust-nightly.toml
31b22ae424824f90e33b888b34d111849e583ed879d21cc113db54b42fd4477a channel-rust-nightly.toml
31b22ae424824f90e33b888b34d111849e583ed879d21cc113db54b42fd4477a channel-rust-nightly.toml
31b22ae424824f90e33b888b34d111849e583ed879d21cc113db54b42fd4477a channel-rust-nightly.toml
31b22ae424824f90e33b888b34d111849e583ed879d21cc113db54b42fd4477a channel-rust-nightly.toml
31b22ae424824f90e33b888b34d111849e583ed879d21cc113db54b42fd4477a channel-rust-nightly.toml
When you run it, did you run it with this exact resolve?
--resolve 'static.rust-lang.org:443:2a04:4e42:200::649'
Could it be that the DNS where you are resolves static.rust-lang.org to different addresses?
I get different lookup results depending on where I am, but from a node in one of our datacenters I get these two:
$ dig aaaa static.rust-lang.org +short
fastly-static.rust-lang.org.
dualstack.k.sni.global.fastly.net.
2a04:4e42::649
2a04:4e42:200::649
2a04:4e42:400::649
2a04:4e42:600::649
$ dig aaaa static.rust-lang.org +short
cloudfront-static.rust-lang.org.
d3ah34wvbudrdd.cloudfront.net.
2600:9000:2334:f400:5:26a9:7440:93a1
2600:9000:2334:9e00:5:26a9:7440:93a1
2600:9000:2334:1c00:5:26a9:7440:93a1
2600:9000:2334:b000:5:26a9:7440:93a1
2600:9000:2334:e00:5:26a9:7440:93a1
2600:9000:2334:1400:5:26a9:7440:93a1
2600:9000:2334:1200:5:26a9:7440:93a1
2600:9000:2334:5000:5:26a9:7440:93a1
I get the problem with all of those Fastly IP addresses, but I haven't been able to reproduce it with the CloudFront addresses.
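To see at a glance which CDN a given lookup landed on, the addresses can be labelled by prefix. This is a minimal sketch, assuming a POSIX shell; the `classify_cdn` function is hypothetical, and the two prefixes are taken only from the `dig` output above, not from any authoritative list of CDN address ranges.

```shell
# classify_cdn ADDRESS
# Prints a best-guess CDN label for an IPv6 address.
# ASSUMPTION: prefixes are inferred from the dig output in this thread only.
classify_cdn() {
  case "$1" in
    2a04:4e42:*) echo fastly ;;
    2600:9000:*) echo cloudfront ;;
    *) echo unknown ;;
  esac
}

# Example (needs dig and network access): label each record returned.
# CNAME lines such as "fastly-static.rust-lang.org." come out as "unknown".
# dig aaaa static.rust-lang.org +short | while read -r rec; do
#   echo "$rec $(classify_cdn "$rec")"
# done
```

Logging this label next to each CI download would show whether failures line up with one CDN.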
I copied and pasted your command to make sure it's the same. And the DNS records resolve to the same addresses for me as well. 😕
Do your build machines run in the same network as your computer? Or do they share the same internet service provider?
TIL that we are our own ISP :open_mouth:, and we use the same from the office and the data center.
We did a bit of testing on our end from another ISP, and with that we don't seem to have the problem.
The difference between the two is that ours routes traffic via Arelion, and the other via NORDUnet. It feels like a routing problem somewhere, which is potentially hard to debug.
Maybe it's worth opening an issue with Fastly, if you are Fastly customers? Otherwise it's not the end of the world: we have added lots of retries, and will see if there is anything we can change in how we route traffic to Fastly.
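For anyone hitting the same resets, a retry wrapper along these lines is one way to paper over them. This is a minimal sketch, assuming a POSIX shell; the `retry` function, its arguments, and the output file name are hypothetical and not necessarily the retries mentioned above.

```shell
# retry MAX_ATTEMPTS DELAY_SECONDS CMD [ARGS...]
# Runs CMD until it succeeds or MAX_ATTEMPTS attempts have been made.
retry() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while true; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Example usage with the URL from this issue (needs network access):
# retry 5 2 curl -fsS -6 -o channel.sha256 \
#   https://static.rust-lang.org/dist/2023-04-15/channel-rust-nightly.toml.sha256
```

curl also has its own `--retry` option, but as far as I know it only retries errors it considers transient by default, so `--retry-all-errors` may be needed for connection resets.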
Oh that is a very interesting TIL! 😮 Makes me miss my days working in networking...
I'll forward this issue to our contacts at Fastly to see if they have any idea on how to further debug this. Sadly, though, I expect that there's little that we can do from our side to help out here. But let me confirm this first...
Good to hear that you found a workaround for now, though. 👍
@fredr TCP sessions receiving resets when connecting to anycast addresses is usually a result of unstable ECMP.
Please double check for any ECMP load-balancing in your network and ensure it's configured as "per flow" with only source and destination IPs and ports used in the hash function.
If you continue to experience issues please contact support@fastly.com with the details and a link to this issue.
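To illustrate what "per flow" means here: the next hop is chosen by hashing only the source and destination IPs and ports, so every packet of one TCP connection takes the same path. If packets of one connection are spread across paths, they can reach different anycast servers, and a server that has no state for the connection answers with a reset. The sketch below is a toy model, assuming a POSIX shell with `cksum`; the `pick_path` function and the example addresses are hypothetical, and real routers use their own hash functions.

```shell
# pick_path SRC_IP SRC_PORT DST_IP DST_PORT NUM_PATHS
# Prints the 0-based index of the next hop chosen for this flow.
pick_path() {
  flow_key="$1|$2|$3|$4"
  # cksum prints "<crc> <byte count>"; keep only the CRC.
  crc=$(printf '%s' "$flow_key" | cksum | cut -d ' ' -f 1)
  echo $((crc % $5))
}

# The same 4-tuple always maps to the same path (2001:db8::/32 is a
# documentation prefix used here as a stand-in client address):
pick_path 2001:db8::10 51514 2a04:4e42:200::649 443 4
pick_path 2001:db8::10 51514 2a04:4e42:200::649 443 4
```

Because nothing that varies between packets of one connection goes into the key, the choice stays stable for the lifetime of the connection, as long as the set of paths does not change.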
We have double checked our ECMP, and it was configured correctly (we also didn't have problems with other CDNs).
But, for whatever reason, we can no longer reproduce the problem, so hopefully it magically solved itself. I'll send an email to that address if it resurfaces.
Thank you both for looking into this! Much appreciated
Page(s) Affected
Most likely the same on all static.rust-lang.org, but this is the page I've been testing with: https://static.rust-lang.org/dist/2023-04-15/channel-rust-nightly.toml.sha256
What needs to be fixed?
We've noticed in the last couple of weeks that our CI pipelines started failing with "Connection reset by peer" when trying to install the nightly toolchain. Specifically, we've seen it when downloading these SHA-256 hash files.
This only happens when using IPv6.
I have noticed that static.rust-lang.org sometimes resolves to Fastly and sometimes to CloudFront. From my testing, this only seems to happen when it resolves to Fastly. So my guess is that the combination of Fastly and IPv6 causes these errors.
The way I've been able to reproduce this issue, both from our build machines, but also from my computer, is to run a command like this:
The -6 ensures that IPv6 is used, and the --resolve ensures that a specific Fastly IP is used (but the problem has been seen on all the different Fastly IPs).
Depending on what network I'm on, the DNS doesn't always resolve static.rust-lang.org to Fastly. I'm not sure whether that depends on location or something else, but the above curl will connect to a Fastly IP directly.
The output from a failed curl with -vv added:
PS. not sure if this is the correct place to report this?