travis-ci / travis-ci

Free continuous integration platform for GitHub projects.
https://travis-ci.org

Can't download file from FTP in docker container #3692

Closed ElDeveloper closed 9 years ago

ElDeveloper commented 9 years ago

Our Travis build uses a container-based configuration, and as of yesterday downloads from an FTP site (ftp://ftp.microbio.me) fail with the following output (build here):

$ wget ftp://ftp.microbio.me/pub/QIIME-v1.9.0-dependencies/suma_package_V_1.0.00.tar.gz
--2015-04-17 19:25:19--  ftp://ftp.microbio.me/pub/QIIME-v1.9.0-dependencies/suma_package_V_1.0.00.tar.gz
           => `suma_package_V_1.0.00.tar.gz'
Resolving ftp.microbio.me (ftp.microbio.me)... 169.228.46.98
Connecting to ftp.microbio.me (ftp.microbio.me)|169.228.46.98|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /pub/QIIME-v1.9.0-dependencies ... done.
==> SIZE suma_package_V_1.0.00.tar.gz ... 903474
==> PASV ... done.    ==> RETR suma_package_V_1.0.00.tar.gz ... done.
Length: 903474 (882K) (unauthoritative)

 0% [                                       ] 0           --.-K/s              

After some testing, this doesn't seem to be specific to our FTP server (microbio.me): we also tried fetching a README from kernel.org with wget, and that connection timed out as well (build here):

$ wget ftp://mirrors.kernel.org/gnu/flex/flex.README
--2015-04-17 19:02:01--  ftp://mirrors.kernel.org/gnu/flex/flex.README
           => `flex.README'
Resolving mirrors.kernel.org (mirrors.kernel.org)... 149.20.37.36, 198.145.20.143, 2620:3:c000:a:0:1994:3:14, ...
Connecting to mirrors.kernel.org (mirrors.kernel.org)|149.20.37.36|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /gnu/flex ... done.
==> SIZE flex.README ... 237
==> PASV ... couldn't connect to 10.0.7.35 port 30662: Connection timed out
Retrying.
--2015-04-17 19:04:10--  ftp://mirrors.kernel.org/gnu/flex/flex.README
  (try: 2) => `flex.README'
Connecting to mirrors.kernel.org (mirrors.kernel.org)|149.20.37.36|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /gnu/flex ... done.
==> SIZE flex.README ... 237
==> PASV ... couldn't connect to 10.0.7.35 port 30844: Connection timed out
Retrying.
--2015-04-17 19:06:21--  ftp://mirrors.kernel.org/gnu/flex/flex.README
  (try: 3) => `flex.README'
Connecting to mirrors.kernel.org (mirrors.kernel.org)|149.20.37.36|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /gnu/flex ... done.
==> SIZE flex.README ... 237
==> PASV ... couldn't connect to 10.0.7.35 port 30088: Connection timed out
Retrying.
--2015-04-17 19:08:32--  ftp://mirrors.kernel.org/gnu/flex/flex.README
  (try: 4) => `flex.README'
Connecting to mirrors.kernel.org (mirrors.kernel.org)|149.20.37.36|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /gnu/flex ... done.
==> SIZE flex.README ... 237
==> PASV ... couldn't connect to 10.0.7.35 port 30900: Connection timed out
Retrying.
--2015-04-17 19:10:44--  ftp://mirrors.kernel.org/gnu/flex/flex.README
  (try: 5) => `flex.README'
Connecting to mirrors.kernel.org (mirrors.kernel.org)|149.20.37.36|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /gnu/flex ... done.
==> SIZE flex.README ... 237
==> PASV ... 
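
Every retry above fails at the same step: wget is told to open the data connection to 10.0.7.35, which is a private (RFC 1918) address. As a minimal sketch of how that address and port are derived from an FTP 227 reply, consider the following. The raw reply is not in the log, so the reply string here is hypothetical, reconstructed from the address and port of the first attempt; `parse_pasv_reply` is an illustrative helper, not part of wget.

```python
import ipaddress
import re
from typing import Tuple


def parse_pasv_reply(reply: str) -> Tuple[str, int]:
    """Extract (host, port) from an FTP 227 reply of the form (h1,h2,h3,h4,p1,p2)."""
    match = re.search(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)", reply)
    if match is None:
        raise ValueError(f"not a PASV reply: {reply!r}")
    h1, h2, h3, h4, p1, p2 = (int(group) for group in match.groups())
    # The port is sent as two bytes: high byte first, then low byte.
    return f"{h1}.{h2}.{h3}.{h4}", p1 * 256 + p2


# Hypothetical reply, reconstructed from "10.0.7.35 port 30662" in the log.
reply = "227 Entering Passive Mode (10,0,7,35,119,198)"
host, port = parse_pasv_reply(reply)
print(host, port)
# Report whether the advertised data address is in a private range.
print(ipaddress.ip_address(host).is_private)
```

A private address in this position suggests address translation somewhere on the path, but nothing in the log establishes where.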

When we switch to a non-Docker-based build, we don't see the problem. We first saw the failures yesterday.

Finally, is there a way to get the Docker image used for Travis builds?

cc @josenavas, @antgonza, @wasade, @squirrelo, @adamrp

BanzaiMan commented 9 years ago

We have received a few reports of poor network performance from the containers. This is basically a variation of #3660.

ElDeveloper commented 9 years ago

@BanzaiMan Thanks for the info. Note that regardless of what we try, we haven't been able to download files from these FTP sites, and we are unsure what the problem might be.

BanzaiMan commented 9 years ago

@ElDeveloper If the same connection attempt works on the standard infrastructure (with sudo: required), then it's the same issue.

wasade commented 9 years ago

Some more detail, for what it's worth: we've monitored inbound/outbound traffic on ftp.microbio.me and can see the pasv connection, as well as data being sent from the server. We've observed this issue on one other system, which uses round-robin DNS; however, we do not have administrative privileges on that system, so its configuration is a black box. My suspicion is that port forwarding is somehow confused, as that is the biggest overlap I can think of between the system with round-robin DNS and the Docker image.

We have had partial success recreating the issues on newly minted systems. If you block outbound traffic to ports in the pasv range on the client side, you can simulate the kernel.org issue above, e.g., with the iptables rule -A OUTPUT -d 169.228.46.98 -p tcp --dport 40000:60000 -j DROP, where 169.228.46.98 is ftp.microbio.me and 40000:60000 is its pasv range. We have not been able to recreate the behavior we observe when downloading from ftp.microbio.me itself, where the pasv connection is established but data doesn't transfer.
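
To make the effect of that rule concrete, here is a small model of its matching logic. It is only a sketch: `DropRule` is a hypothetical class for illustration and does not talk to iptables. The point is that the FTP control connection (port 21) still passes, so login succeeds, while any data connection into the pasv range is dropped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DropRule:
    """Models '-d <dest_ip> -p tcp --dport <low>:<high> -j DROP' for outbound TCP."""

    dest_ip: str
    port_low: int
    port_high: int

    def drops(self, dest_ip: str, dest_port: int) -> bool:
        # A --dport range is inclusive at both ends.
        return dest_ip == self.dest_ip and self.port_low <= dest_port <= self.port_high


rule = DropRule("169.228.46.98", 40000, 60000)

print(rule.drops("169.228.46.98", 21))     # control connection: not dropped
print(rule.drops("169.228.46.98", 45000))  # data connection in the pasv range: dropped
print(rule.drops("149.20.37.36", 45000))   # different server: not dropped
```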

BanzaiMan commented 9 years ago

@wasade Thanks for the detailed analysis. <3 We will take this into consideration for our troubleshooting.

wasade commented 9 years ago

Hope it helps. We run ftp.microbio.me and I'd be glad to discuss config, test, and poke/prod as needed.

sarahhodne commented 9 years ago

I think I found the issue after testing a bit on another EC2 instance in the same network: the default MTU on the c3 instances that we use is 9001, and if I decrease it to 1500 the download works fine. A lot of the other networking issues we’ve seen also disappear, so we’re going to work on getting a fix out for this soon. Thank you for raising this and for providing steps to reproduce!
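
For a sense of scale, the sketch below computes the largest TCP payload that fits in one packet at each MTU, assuming IPv4 and TCP headers of 20 bytes each with no options. It also reads an interface's MTU from Linux sysfs; the interface name `eth0` is an assumption and may differ on a given machine. One plausible reading, not confirmed in this thread, is that short control exchanges (login, SIZE, PASV) fit in small packets either way, while bulk data sent in near-MTU packets is lost if some hop on the path carries only 1500 bytes and path MTU discovery does not work.

```python
from pathlib import Path
from typing import Optional

IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def max_tcp_payload(mtu: int) -> int:
    """Largest TCP payload that fits in a single packet of the given MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


def interface_mtu(name: str) -> Optional[int]:
    """Read an interface's MTU from Linux sysfs; return None if it is unavailable."""
    path = Path("/sys/class/net") / name / "mtu"
    try:
        return int(path.read_text().strip())
    except OSError:
        return None


for mtu in (9001, 1500):
    print(mtu, max_tcp_payload(mtu))

# "eth0" is an assumed interface name; prints None where it does not exist.
print("eth0 MTU:", interface_mtu("eth0"))
```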

wasade commented 9 years ago

That's great!! I couldn't wrap my head around the MTU setting resolving this, other than it further pointing to a layer 3 issue, so I poked the google a bit and came across this. Looks like there is a potential issue negotiating MTU sizes with masquerading specifically with FTPs (I suspect due to the pasv negotiation?). It refers to older kernels, but the issues sound very similar to at least the download issue with ftp.microbio.me.

wasade commented 9 years ago

As a further update on this, ftp.microbio.me had an MTU of 9000 on its external interface. This came about when debugging a separate networking issue, and is what led to this issue. We've since dropped it back down to 1500. Just wanted to comment here about what drove this to happen server-side in case this confusing FTP issue impacts anyone else.

sarahhodne commented 9 years ago

We’ve reduced the MTU to 1500 now. Are you still seeing this issue?

ElDeveloper commented 9 years ago

Yes, it seems to be working, thanks!