nextstrain / nextstrain.org

The Nextstrain website
https://nextstrain.org
GNU Affero General Public License v3.0

endpoints/cli: Support downloading standalone installation archives from PR builds #645

tsibley closed 1 year ago

tsibley commented 1 year ago

This new endpoint allows the standalone installer to install not just released versions but also the CI builds produced for arbitrary PRs. That's very helpful for development and testing of PRs. With this new endpoint, for example, we can run:

curl -fsSL --proto '=https' https://nextstrain.org/cli/installer/linux \
    | DESTINATION=/tmp/cli bash -s pr-build/243

to install /tmp/cli/nextstrain from the last successful CI build for:

https://github.com/nextstrain/cli/pull/243

This uses existing support for downloading archives from CI builds. The PR id is translated into the latest successful CI run id for that PR.
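As a rough illustration of that translation step, here is a minimal sketch that picks the most recent successful run from a list of CI run records. It is not the nextstrain.org implementation: the function name and the sample data are hypothetical, and the record shape (`id`, `conclusion`, `created_at`) is assumed to resemble what a CI provider's API returns for workflow runs.

```python
from datetime import datetime
from typing import List, Optional


def latest_successful_run_id(runs: List[dict]) -> Optional[int]:
    """Return the id of the newest run whose conclusion is "success", or None."""
    successful = [r for r in runs if r.get("conclusion") == "success"]
    if not successful:
        return None
    # Timestamps are assumed to be UTC in the form 2023-01-12T12:00:00Z.
    newest = max(
        successful,
        key=lambda r: datetime.strptime(r["created_at"], "%Y-%m-%dT%H:%M:%SZ"),
    )
    return newest["id"]


# Hypothetical run records for a single PR (made-up ids and dates).
sample_runs = [
    {"id": 101, "conclusion": "success", "created_at": "2023-01-10T12:00:00Z"},
    {"id": 102, "conclusion": "failure", "created_at": "2023-01-11T12:00:00Z"},
    {"id": 103, "conclusion": "success", "created_at": "2023-01-12T12:00:00Z"},
    {"id": 104, "conclusion": None, "created_at": "2023-01-13T12:00:00Z"},  # still running
]

print(latest_successful_run_id(sample_runs))
```

Failed and in-progress runs are skipped, so a PR whose newest commit has not finished CI would still resolve to the previous good build.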

Related issue(s)

#643

Testing

tsibley commented 1 year ago

This is deployed to production. It works, but is very slow because it suffers the same issues as the CI build downloads in #643.

$ curl -fsSL --proto '=https' https://nextstrain.org/cli/installer/linux \
    | DESTINATION=/tmp/cli-test bash -s pr-build/248
--> Temporary working directory: /tmp/tmp.kFPyt0rCkU
--> Downloading https://nextstrain.org/cli/download/pr-build/248/standalone-x86_64-unknown-linux-gnu.tar.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 48.7M    0 48.7M    0     0   160k      0 --:--:--  0:05:10 --:--:--  176k
--> Extracting standalone-x86_64-unknown-linux-gnu.tar.gz

5 minutes to download 48.7 MB! Only 160 kB/s average speed. Oof.
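A quick sanity check of that figure, using the size and elapsed time from curl's progress meter above. This assumes curl's `M` and `k` suffixes are 1024-based (MiB and KiB); the function name is just for illustration.

```python
def average_kib_per_s(size_mib: float, elapsed_s: float) -> float:
    """Average transfer rate in KiB/s for a download of size_mib MiB."""
    return size_mib * 1024 / elapsed_s


elapsed = 5 * 60 + 10  # "Time Spent" column: 0:05:10
speed = average_kib_per_s(48.7, elapsed)
print(f"{speed:.0f} KiB/s")
```

That comes out to roughly 161 KiB/s, in line with the 160k average curl reports (curl's displayed size and speed are themselves rounded).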

This seems like it should be faster. It's certainly much faster locally, so I wonder if the bottleneck on Heroku is mostly network (over the internet from Heroku → me vs. the local loopback interface) or CPU (Heroku's dynos vs. my laptop, for decompression). Certainly there will be some increased time due to not going over the loopback interface, but I wouldn't expect this much, so my guess is it's mostly CPU.

tsibley commented 1 year ago

Somewhat to my surprise, the bottleneck is nowhere in our stack. It's not our nextstrain.org code or Heroku's dynos or Heroku's routing proxy or our personal devices/software. It's the Hutch network. The behaviour is reproducible across a variety of Linux and macOS devices, across different curl versions, across personal vs. institutional hardware (e.g. rhino). No other network I've tested is slow to download this. If I route traffic from, say, my home network, through the Hutch's network, it suffers the same extreme slowdown for this download. Wild.

Preliminary analysis seems to point to TCP congestion/lost packets, but I'm not entirely sure. It's so darn slow. On a personal server with a fast connection, I can average 4172 KiB/s from this nextstrain.org download.
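To put the two measurements side by side, a small back-of-the-envelope comparison. It assumes the same 48.7 MiB archive and treats both speeds as KiB/s averages; the variable names are only for illustration.

```python
size_kib = 48.7 * 1024  # archive size from the curl output, in KiB

slow_kib_per_s = 160   # average seen on the Hutch network
fast_kib_per_s = 4172  # average seen from a personal server

speedup = fast_kib_per_s / slow_kib_per_s
fast_seconds = size_kib / fast_kib_per_s

print(f"speedup: {speedup:.1f}x")
print(f"download time at the fast rate: {fast_seconds:.1f} s")
```

So the same download that took over five minutes would finish in about 12 seconds at the faster rate, a difference of roughly 26x.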

This also seems to be the same underlying issue with slow downloads from Fauna/RethinkDB that others have noted on the Hutch network vs. other networks. Notably, both nextstrain.org (via Heroku's US region) and our Fauna/RethinkDB server are hosted in EC2 us-east-1. Coincidence?

I'll dig more, but likely report back elsewhere. Wanted to follow up on my early speculation though.

Yeesh.