nextstrain / nextstrain.org

The Nextstrain website
https://nextstrain.org
GNU Affero General Public License v3.0

Support downloading standalone installation archives from CI builds #643

Closed tsibley closed 1 year ago

tsibley commented 1 year ago

This new endpoint allows the standalone installer to install not just released versions but also the builds produced by arbitrary CI runs. That's very helpful for development and testing of PRs. With this new endpoint, for example, we can run:

curl -fsSL --proto '=https' https://nextstrain.org/cli/installer/linux \
    | DESTINATION=/tmp/cli bash -s ci-build/3859193828

to install /tmp/cli/nextstrain from:

https://github.com/nextstrain/cli/actions/runs/3859193828#artifacts

Artifacts from GitHub Actions workflow runs require a bit more ceremony than release assets, as all artifacts come wrapped in a ZIP file, which we need to unwrap server-side for our installer. Doing this server-side also resolves the issue of artifacts requiring authentication to download (even though our artifacts are publicly visible). Leaving the extra complexity of API requests, authentication, and a second layer of compression out of the installer itself keeps it simpler and thus more robust for end users.
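
As a rough illustration of the unwrapping step (not the code in this PR), here is a minimal Python sketch. It assumes each artifact ZIP wraps exactly one file, the installer tarball; `unwrap_artifact` is a hypothetical name.

```python
import io
import zipfile


def unwrap_artifact(zip_bytes: bytes) -> tuple[str, bytes]:
    """Return (filename, contents) of the single file inside an artifact ZIP.

    Assumption: the artifact wraps exactly one file. Anything else is
    treated as an error rather than guessed at.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # Ignore directory entries; only real files count.
        members = [m for m in archive.infolist() if not m.is_dir()]
        if len(members) != 1:
            raise ValueError(f"expected exactly one file, found {len(members)}")
        return members[0].filename, archive.read(members[0])


if __name__ == "__main__":
    # Build a tiny in-memory ZIP to stand in for a downloaded artifact.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as z:
        z.writestr("standalone.tar.gz", b"not really a tarball")
    name, payload = unwrap_artifact(buffer.getvalue())
    print(name, len(payload))
```

The server can then send the inner file's bytes to the installer, which never has to deal with the ZIP layer.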

Testing

tsibley commented 1 year ago

With this endpoint in place, we can ~easily extend it to also support pr/X "versions", which would look up the latest successful CI run for the given PR and download that.
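
A sketch of how such version strings might be resolved, in Python for illustration only. The helper names are hypothetical, and the run fields (`status`, `conclusion`, `created_at`) are, to my understanding, those of GitHub's workflow run objects.

```python
from typing import Optional


def parse_version(version: str) -> tuple[str, int]:
    """Split "ci-build/<run id>" or "pr/<number>" into (kind, number)."""
    kind, _, ident = version.partition("/")
    if kind not in ("ci-build", "pr") or not (ident.isascii() and ident.isdigit()):
        raise ValueError(f"unsupported version: {version!r}")
    return kind, int(ident)


def latest_successful_run(runs: list[dict]) -> Optional[dict]:
    """Pick the most recently created run that completed successfully."""
    successful = [
        run for run in runs
        if run.get("status") == "completed" and run.get("conclusion") == "success"
    ]
    # ISO 8601 UTC timestamps in the same format sort correctly as strings.
    return max(successful, key=lambda run: run["created_at"], default=None)


if __name__ == "__main__":
    print(parse_version("ci-build/3859193828"))
```

A pr/X request would then resolve to the ID of the chosen run and continue down the existing ci-build path.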

tsibley commented 1 year ago

I've been thinking about adding this functionality for a while, but https://github.com/nextstrain/cli/pull/248 today motivated me to do it.

tsibley commented 1 year ago

I'm going to merge and deploy this before review, as it seems low stakes and is primarily an internal endpoint to aid our development, so the audience is small.

tsibley commented 1 year ago

Deployed to canary. Tested it with:

curl -fsSL --proto '=https' https://nextstrain.org/cli/installer/linux \
  | DESTINATION=/tmp/cli \
    NEXTSTRAIN_DOT_ORG=https://next.nextstrain.org \
    bash -s ci-build/3859193828

and got a 500 when downloading the tarball via next.nextstrain.org because the artifact download got a 403 (even though we're providing authorization):

2023-01-09T19:16:12.473547+00:00 app[web.1]: [verbose]  [fetch] GET https://api.github.com/repos/nextstrain/cli/actions/runs/3859193828/artifacts (cache: undefined)
2023-01-09T19:16:12.575884+00:00 app[web.1]: [verbose]  [fetch] 200 OK https://api.github.com/repos/nextstrain/cli/actions/runs/3859193828/artifacts (cache miss, timestamp 2023-01-09T19:16:12.575Z)
2023-01-09T19:16:12.583163+00:00 app[web.1]: [verbose]  [fetch] GET https://api.github.com/repos/nextstrain/cli/actions/artifacts/501513693/zip (cache: undefined)
2023-01-09T19:16:12.639463+00:00 app[web.1]: [verbose]  [fetch] 403 Forbidden https://api.github.com/repos/nextstrain/cli/actions/artifacts/501513693/zip (cache skip, timestamp null)
2023-01-09T19:16:12.640138+00:00 app[web.1]: [verbose]  Sending InternalServerError: upstream said: 403 Forbidden error as JSON
2023-01-09T19:16:12.641472+00:00 heroku[router]: at=info method=GET path="/cli/download/ci-build/3859193828/standalone-x86_64-unknown-linux-gnu.tar.gz" host=next.nextstrain.org request_id=… fwd="…" dyno=web.1 connect=0ms service=173ms status=500 bytes=297 protocol=https

This could be a scope issue with the token we're using for next.nextstrain.org?
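
For reference, the log above shows two upstream requests: one listing the run's artifacts, then one fetching a specific artifact's ZIP. The Python sketch below shows how the second URL could be derived from the first response. It assumes the requested filename minus `.tar.gz` equals the artifact name (true for the example in this thread, but still an assumption); the function names are hypothetical.

```python
API = "https://api.github.com/repos/nextstrain/cli"


def artifacts_url(run_id: int) -> str:
    """URL listing the artifacts of a workflow run (first request in the log)."""
    return f"{API}/actions/runs/{run_id}/artifacts"


def find_download_url(listing: dict, filename: str) -> str:
    """Find the ZIP download URL for the artifact matching a requested file.

    Assumption: artifact name == filename without its ".tar.gz" suffix.
    """
    name = filename.removesuffix(".tar.gz")
    for artifact in listing.get("artifacts", []):
        if artifact["name"] == name and not artifact["expired"]:
            return artifact["archive_download_url"]
    raise LookupError(f"no unexpired artifact named {name!r}")


if __name__ == "__main__":
    print(artifacts_url(3859193828))
```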

tsibley commented 1 year ago

Ok, I think I've come to an understanding here.

During development, I tested locally with my standard "various and sundry" personal access token (classic) that's granted limited scope: just public_repo. Downloading artifacts from the public nextstrain/cli repo worked fine.

The token we use for nextstrain.org has no scopes (because even public_repo includes write access). This means it has a read-only view of only public resources. I thought this would be sufficient to download artifacts from a public repo, but it turns out not to be. This isn't documented anywhere as far as I can tell.

Both tokens are "classic" personal access tokens.

I tested using a new "fine-grained" token without any permissions granted, which I believe is supposed to be roughly equivalent to a "classic" token without any scopes granted. But there are clearly some differences, because this fine-grained token works for artifact downloading when the classic token doesn't.

![image](https://user-images.githubusercontent.com/79913/211398657-186a10c4-63d2-4224-a79b-364ecb75e47a.png) ![image](https://user-images.githubusercontent.com/79913/211398601-7a03e208-2636-47fd-b673-f795f29fba8d.png)

So I think we want to replace the classic token with a fine-grained token (which is what GitHub generally recommends now anyhow). This would let us still use a single GITHUB_TOKEN for nextstrain.org, while not granting it permissions/scopes we don't want for security reasons.

tsibley commented 1 year ago

> I thought this would be sufficient to download artifacts from a public repo, but it turns out not to be. This isn't documented anywhere as far as I can tell.

Note that the classic token we use can view information about an artifact:

GET https://api.github.com/repos/nextstrain/cli/actions/artifacts/501513693 HTTP/1.1

HTTP/1.1 200 
content-type: application/json; charset=utf-8
content-length: 695
x-oauth-scopes: 
x-accepted-oauth-scopes: 

{
  "id": 501513693,
  "node_id": "MDg6QXJ0aWZhY3Q1MDE1MTM2OTM=",
  "name": "standalone-x86_64-unknown-linux-gnu",
  "size_in_bytes": 51091874,
  "url": "https://api.github.com/repos/nextstrain/cli/actions/artifacts/501513693",
  "archive_download_url": "https://api.github.com/repos/nextstrain/cli/actions/artifacts/501513693/zip",
  "expired": false,
  "created_at": "2023-01-06T23:45:43Z",
  "updated_at": "2023-01-06T23:45:45Z",
  "expires_at": "2023-04-06T23:17:36Z",
  "workflow_run": {
    "id": 3859193828,
    "repository_id": 139047738,
    "head_repository_id": 139047738,
    "head_branch": "trs/singularity-runtime",
    "head_sha": "d435db68160b6a45277b1ee72006a5e16090259c"
  }
}

just not download it:

GET https://api.github.com/repos/nextstrain/cli/actions/artifacts/501513693/zip HTTP/1.1

HTTP/1.1 403
content-type: application/json; charset=utf-8
content-length: 168
x-oauth-scopes: 
x-accepted-oauth-scopes: 

{
  "message": "You must have the actions scope to download artifacts.",
  "documentation_url": "https://docs.github.com/rest/reference/actions#download-an-artifact"
}

tsibley commented 1 year ago

Despite the error response saying the actions scope is required, that is not a documented scope for personal access tokens (which are OAuth tokens).

The download endpoint documentation says:

> Anyone with read access to the repository can use this endpoint. If the repository is private you must use an access token with the repo scope. GitHub Apps must have the actions:read permission to use this endpoint.

Our classic token has read access to the repository, so should have access per this doc. The repo is not private. The classic token is a personal access token, not a GitHub Apps token, so should not require the actions:read permission.
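
To compare tokens, a small probe along these lines can help; it is a Python sketch using only the standard library, with hypothetical helper names. Redirects are deliberately not followed, on the understanding that the download endpoint answers a permitted request with a redirect to the actual archive, so a 3xx status here means "allowed" and a 403 means "refused".

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Refuse to follow redirects so the original status code is reported."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def build_request(url: str, token: str) -> urllib.request.Request:
    """Build an authenticated GitHub API request for the given token."""
    return urllib.request.Request(url, headers={
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",
    })


def probe(url: str, token: str) -> int:
    """Return the HTTP status GitHub gives this token for this URL."""
    opener = urllib.request.build_opener(NoRedirect)
    try:
        with opener.open(build_request(url, token), timeout=30) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 3xx (not followed), 403, 404, etc. all arrive here.
        return error.code
```

Calling `probe` on the artifact's metadata URL and then on its `/zip` URL with the same token reproduces the view-but-not-download split described above.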

tsibley commented 1 year ago

I replaced the GITHUB_TOKEN used by next.nextstrain.org with a new fine-grained token as described above, and all seems to be working there. I'll make the same change to nextstrain.org soon, and eventually revoke the classic token.

One thing to note is that fine-grained tokens must have expiration dates ≤1y in the future, so this token expires 9 Jan 2024, and we'll have to manually rotate it before then. Not sure of the best way to track this task…
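
One low-tech option, sketched in Python purely as an illustration: a small check that a scheduled job could run to flag the upcoming expiry. The 30-day warning window and the function names are assumptions; only the expiry date comes from this thread.

```python
from datetime import date

# Expiry date of the fine-grained token, as stated above.
TOKEN_EXPIRES = date(2024, 1, 9)


def days_until_expiry(today: date, expires: date = TOKEN_EXPIRES) -> int:
    """Number of days left before the token expires (negative once expired)."""
    return (expires - today).days


def needs_rotation(today: date, warn_days: int = 30, expires: date = TOKEN_EXPIRES) -> bool:
    """True once the token is within the warning window (hypothetical default: 30 days)."""
    return days_until_expiry(today, expires) <= warn_days


if __name__ == "__main__":
    if needs_rotation(date.today()):
        print("GITHUB_TOKEN expires soon (or already has); rotate it.")
```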

tsibley commented 1 year ago

> With this endpoint in place, we can ~easily extend it to also support pr/X "versions", which would look up the latest successful CI run for the given PR and download that.

Implemented as https://github.com/nextstrain/nextstrain.org/pull/645.