ElDeveloper closed this issue 9 years ago
We have received a few reports of poor network performance from the containers. This is basically a variation of #3660.
@BanzaiMan Thanks for the info. Note that regardless of what we try, we haven't been able to download files from these FTP sites, and we are unsure what the problem might be.
@ElDeveloper If the same connection attempt works on the standard infrastructure (with `sudo: required`), then it's the same issue.
Some more detail, for what it's worth: we've monitored inbound/outbound traffic from `ftp.microbio.me` and can see the `pasv` connection, as well as data being sent from the server. We've observed this issue on one other system, which uses round-robin DNS; however, we do not have administrative privileges on that system, so its configuration is a black box. My suspicion is that port forwarding is somehow confused, as that is the biggest overlap I can think of between the system with round-robin DNS and the Docker image.
We have had partial success recreating the issues on newly minted systems. If we block outbound traffic to ports in the `pasv` range on the client side, we can simulate the `kernel.org` issue above, e.g., `-A OUTPUT -d 169.228.46.98 -p tcp --dport 40000:60000 -j DROP`, where 40000:60000 is the `pasv` range of `ftp.microbio.me` (169.228.46.98 is `ftp.microbio.me`). We have not been able to recreate the behavior we observe when downloading from `ftp.microbio.me`, where the `pasv` connection is established but data doesn't transfer.
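For anyone trying to reproduce this, a small Python sketch may help check whether a given passive-mode reply would be caught by the rule quoted above. It parses an FTP `227` reply into a host and port and tests them against the blocked address and the 40000:60000 range. The function names and the sample reply strings are hypothetical and only for illustration; they are not taken from the actual server logs.

```python
import re

# Values taken from the iptables rule quoted in this comment.
BLOCKED_HOST = "169.228.46.98"
PASV_RANGE = (40000, 60000)


def parse_pasv_reply(reply):
    """Extract (host, port) from an FTP '227 Entering Passive Mode' reply.

    The reply carries six numbers h1,h2,h3,h4,p1,p2; the data port is
    p1 * 256 + p2.
    """
    match = re.search(r"(\d+),(\d+),(\d+),(\d+),(\d+),(\d+)", reply)
    if not reply.startswith("227") or match is None:
        raise ValueError(f"not a PASV reply: {reply!r}")
    numbers = [int(n) for n in match.groups()]
    host = ".".join(str(n) for n in numbers[:4])
    port = numbers[4] * 256 + numbers[5]
    return host, port


def rule_would_drop(reply, blocked_host=BLOCKED_HOST, port_range=PASV_RANGE):
    """Return True if the data connection announced by `reply` matches the rule."""
    host, port = parse_pasv_reply(reply)
    low, high = port_range
    return host == blocked_host and low <= port <= high


# Hypothetical reply, made up for illustration: 195 * 256 + 80 = 50000.
sample = "227 Entering Passive Mode (169,228,46,98,195,80)"
print(parse_pasv_reply(sample), rule_would_drop(sample))
```

This only models which connections the rule matches. It does not reproduce the second symptom, where the data connection opens but no data arrives.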
@wasade Thanks for the detailed analysis. <3 We will take this into consideration for our troubleshooting.
Hope it helps. We run `ftp.microbio.me` and I'd be glad to discuss config, test, and poke/prod as needed.
I think I found the issue after testing a bit on another EC2 instance in the same network: The default MTU on the c3 instances that we use is 9001, and it seems if I decrease it to 1500 then the download works fine. I also see a lot of other networking issues we’ve seen disappear, so we’re going to work on getting a fix out for this soon. Thank you for raising this and for providing steps to reproduce!
That's great!! I couldn't wrap my head around the MTU setting resolving this, other than it further pointing to a layer 3 issue, so I poked the google a bit and came across this. Looks like there is a potential issue negotiating MTU sizes with masquerading specifically with FTPs (I suspect due to the pasv negotiation?). It refers to older kernels, but the issues sound very similar to at least the download issue with `ftp.microbio.me`.
As a further update on this, `ftp.microbio.me` had an MTU of 9000 on its external interface. This came about when debugging a separate networking issue, and is what led to this issue. We've since dropped it back down to 1500. Just wanted to comment here about what drove this to happen server-side in case this confusing FTP issue impacts anyone else.
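To make the numbers in this thread concrete, here is a rough Python sketch of the arithmetic. It computes the TCP maximum segment size (MSS) each side would advertise for a given MTU, assuming plain IPv4 and TCP headers with no options (20 bytes each), and takes the smaller of the two as the segment size in use. The 1500-byte path MTU is an assumption for illustration and is not something established in this thread. The sketch shows one plausible reading of the symptoms, not a confirmed diagnosis.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def tcp_mss(mtu):
    """Largest TCP payload that fits in one packet of size `mtu`."""
    return mtu - IPV4_HEADER - TCP_HEADER


def negotiated_mss(client_mtu, server_mtu):
    """Each side advertises its own MSS; segments are bounded by the smaller one."""
    return min(tcp_mss(client_mtu), tcp_mss(server_mtu))


def full_segment_fits(client_mtu, server_mtu, path_mtu):
    """True if a full-sized data segment fits through a hop limited to `path_mtu`."""
    packet_size = negotiated_mss(client_mtu, server_mtu) + IPV4_HEADER + TCP_HEADER
    return packet_size <= path_mtu


ASSUMED_PATH_MTU = 1500  # assumption: some hop in between only passes 1500-byte packets

# Client and server MTUs reported in this thread, before and after the changes.
for client_mtu, server_mtu in [(9001, 9000), (1500, 9000), (9001, 1500)]:
    print(client_mtu, server_mtu,
          negotiated_mss(client_mtu, server_mtu),
          full_segment_fits(client_mtu, server_mtu, ASSUMED_PATH_MTU))
```

Under these assumptions, small control-channel messages fit either way, while full-sized data segments only fit once at least one end is at 1500. That would be consistent with the `pasv` connection being established while the transfer stalls, and with lowering the MTU on either side making the download work.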
We’ve reduced the MTU to 1500 now. Are you still seeing this issue?
Yes, it seems to be working, thanks!
Our Travis build is using a container based configuration, and as of yesterday we started seeing that downloads from an FTP site (ftp://ftp.microbio.me) would fail with the following error (build here):
Testing this, it doesn't seem to be a problem specific to our FTP server (microbio.me): we also tried `wget`ing a README from kernel.org, and the connection seemed to time out as well (build here):
When we switch to a non-docker-based build, we don't see the problem. The first time we began seeing failures was yesterday.
Finally, is there a way to get the docker image being used for travis builds?
cc @josenavas, @antgonza, @wasade, @squirrelo, @adamrp