nf-core / setup-nextflow

A GitHub action to install Nextflow
MIT License
12 stars 5 forks source link

Hitting API Rate limit fails ungracefully #19

Open edmundmiller opened 1 year ago

edmundmiller commented 1 year ago
 Error: Could not retrieve Nextflow release matching latest-everything.
API rate limit exceeded for installation ID 1565271.
Error: Could not parse the download URL
Cannot read properties of undefined (reading 'filter')
Error: versionSpec parameter is required
Warning: Nextflow appears to have installed correctly, but an error was thrown while running it.

Example run logs_25031.zip

edmundmiller commented 1 year ago

Basically, I think we can see if it returns that error, and then pause and retry.

MillironX commented 1 year ago

I like that idea. The current limit is 1000 requests per hour (source), but I wonder if we can figure out the time to wait from the response headers as shown here.

edmundmiller commented 1 year ago

Oh no, does that consider hitting the cache API? I feel like Sarek could hit that really quickly... Or worse modules during the hackathon!

edmundmiller commented 1 year ago

@adamrtalbot Is hitting this in sarek with the tests. My question is why are we not hitting it in modules then? I haven't heard anyone complain about it.

@ewels had a good idea to set up a fallback on the website, either an API end-point on the nf-core site that hits the GitHub API, or during build pull the versions.

edmundmiller commented 1 year ago

I think I'm getting to the bottom of it.

Sarek is trying to cache it in the workflow.

But the setup action has a cache inside it.

edmundmiller commented 1 year ago

My question is why are we not hitting it in modules then?

Because it's not using setup-nextflow 🫠

edmundmiller commented 1 year ago

Might get some ideas from https://github.com/DeterminateSystems/magic-nix-cache-action.

Since, we're talking caches, I wonder if we can cache work/ between workflows...

MillironX commented 1 year ago

As of v1.3, the workflow runs it's own cache. Sarek is pinned at v1.2 for some reason. Updating that should relieve about 2/3 of the API hits. The caching present in Sarek's workflow does nothing b/c the Nextflow binary doesn't live at /usr/bin/nextflow when installed using setup-nextflow.

Personally, I would love to see all nf-core testing workflows updated to use one install per repo per version, and then all tests are run off of that. I'm really inspired by Julia's testing workflow template. Unfortunately, Nextflow doesn't have as nice of a testing framework as Julia, so that may be a pipe dream.

edmundmiller commented 1 year ago

Personally, I would love to see all nf-core testing workflows updated to use one install per repo per version, and then all tests are run off of that. I'm really inspired by Julia's testing workflow template. Unfortunately, Nextflow doesn't have as nice of a testing framework as Julia, so that may be a pipe dream.

That's nice. Do you not think that would work with nf-test?

That also makes me think we could move to a repo and pull in the GitHub actions like this repo https://github.com/magit/actions So it would just be:

steps:
    steps:
      - name: Install Nextflow
        uses: nf-core/actions/setup-nextflow@main
      - name: Install nf-core
        uses: nf-core/actions/setup-nf-core@main
      - name: Run nf-test
        uses: nf-core/actions/nf-test@main
edmundmiller commented 1 year ago

Not sure if this is over yet...

MatthiasZepper commented 1 year ago

How about checking the API rate limit first before attempting to query the endpoints?

Queries to the limit endpoint are free (no effects on the remaining quota) and one could probably reuse code from my action step.

MillironX commented 1 year ago

How about checking the API rate limit first before attempting to query the endpoints?

We have talked about this.

I wonder if we can figure out the time to wait from the response headers as shown here. https://github.com/nf-core/setup-nextflow/issues/19#issuecomment-1699649337

I'm working on a big update that should make this easier to integrate into the code...

edmundmiller commented 1 year ago

Conversation with @ewels:

  1. Hard-code version in modules again. We'd be on edge except for the month after stable releases
  2. Hard-code the response in this action
  3. Make a API endpoint on the nf-core website that serves a latest-everything Nextflow script(Basically this action in an API endpoint) that we can cache on the website build.

We're favoring option 3, this action could be extremeley simple then. (I think it should stay because it gives people the choice of the super simple uses: setup-nextflow

MillironX commented 1 year ago
3. Make a API endpoint on the nf-core website that serves a `latest-everything` Nextflow script(Basically this action in an API endpoint) that we can cache on the website build.

I've thought about something similar. Does anyone who works at Seqera (looking at you, @ewels) know how the Nextflow.io website catalogs versions? My digging through the script has led me to believe that Nextflow.io mirrors the Github release metadata in some way. Maybe they already have an API that they would let us ping instead of using the Github API?

edmundmiller commented 10 months ago

More ideas from the PRs and issues that are linked here https://github.com/actions/setup-go/issues/16

I think we need to add the GitHub token to the README.

Like this: https://github.com/DeLaGuardo/setup-clojure/blob/4b9b2f7702139e6be8ee376e8b2bef8f7f6850a4/action.yml#L36-L39

ewels commented 10 months ago

@MillironX - I would prefer not to rely on nextflow.io if possible. The current build / back end is somewhat opaque and I'm hoping that we will do a major revamp any day now (I've been hoping that for quite a while 😅 ).

Working with the nf-core website should be pretty trivial and we have full transparency and (community-) control over that site. Also given that https://github.com/nf-core/setup-nextflow is under the @nf-core GitHub org, it feels more natural to me anyway.

ewels commented 10 months ago

Another thing - Both the Nextflow and nf-core websites are static sites. But the nf-core website gets much more regular builds. So that's another good reason to choose nf-core, as we shouldn't need to wait more than a few hours to get the latest version of Nextflow after a release. In contrast, nextflow.io builds are often weeks apart.

MatthiasZepper commented 10 months ago

Since the GraphQL quota is separate, you could also just use GraphQL to query the version instead of REST. There is already a published step for that.

Alternatively, we could determine the version via an unlimited HTTP request, like the Suprabase devs do.

Lastly, one could include a fallback step into the workflow, that will only run if the API request failed. See the documentation which I wrote here how to achieve that.

ewels commented 10 months ago
  1. Yup, that could also work. Though we would still be running against a quota, so may be back here again in a few years when we have 10x users 😉
  2. As far as I can see, the Suprabase example is getting the download URL with a supplied version number, not fetching the latest version number?
  3. Like it, but downside there is that behaviour may change between CI runs which could be confusing.

@MatthiasZepper any reason not to host a basic file on the nf-core website?

MatthiasZepper commented 10 months ago

Yup, that could also work. Though we would still be running against a quota, so may be back here again in a few years when we have 10x users 😉

I can see that you have ambitious goals at Seqera. 😉

Also reversed roles, because I'm usually the overengineer and perfectionist 😆. In that case (also due to your plans with Scatter) implementing a Token rotation solution for all actions would likely be a good option. While the action itself is proprietary, this article essentially describes it so closely, that it should be possible to reverse engineer it.

Also tapping into the 5000 requests per hour for GraphQL would help a lot already and is quite simple with the linked step.

As far as I can see, the Suprabase example is getting the download URL with a supplied version number, not fetching the latest version number?

No, they fetch a non-existing page. They purposefully get a 404 error, but extract the version number due to the automatic redirect implemented by GitHub. Try https://github.com/nextflow-io/nextflow/releases/latest/download/, and you will get the latest stable release of Nextflow delivered for free together with a 404.

Like it, but downside there is that behaviour may change between CI runs which could be confusing.

No idea what you are referring to.

@MatthiasZepper any reason not to host a basic file on the nf-core website?

Would of course work as well, here is the OctoKit fixtures repo as template. Should then be part of the build json files and md cache action?

edmundmiller commented 10 months ago

Hot take:

  1. We write a cache script like in the nf-core website
  2. Run an action to query the Nextflow versions twice a day(cause we'll be swimming in tokens)
  3. We use the hard-coded cache in the action.

We'd probably need to update a gist or some other file location to bump the nextflow versions, otherwise people would have to use setup-nextflow@main

MillironX commented 10 months ago

I think we need to add the GitHub token to the README.

Unfortunately, that won't change anything. The action is already hard-wired to pull in GITHUB_TOKEN.

https://github.com/nf-core/setup-nextflow/blob/d736e9561dd72d5676c6ee2f00b4b03164b99c20/action.yml#L13-L16

MillironX commented 10 months ago

I think putting a versions cache on the nf-core website is the way to go. My main inspiration for this action was https://github.com/julia-actions/setup-julia, and the Julia team has been using an S3 JSON mirror of all Julia version metadata for 4 years now.

https://github.com/julia-actions/setup-julia/blame/f3d4142aa8f1c505424fd41eb89428b43e27c794/src/installer.ts

Implementing that is beyond me. If Astro is half as powerful as Hugo, then my octokit wrapper should provide a solid starting point for the website updating its cache file, but I don't have the time or mental energy to learn yet another JavaScript web language.


Rant mode on.

None of this should be necessary. I don't want to run 1000 Nextflow installs per hour: I would much rather do one install every time CI is triggered, then take a snapshot of that install and use that snapshot as the starting place for every single test system. Earthly does exactly that using a clearly defined input/output model, plus versions are easily pinned using Docker image tags. I would say switch to Earthly, but its remote (think, GitHub Actions runners) support is severely lacking. The whole spirit of Nextflow is to minimize and optimize the amount of work needed based on a directed acyclic graph of required tasks. GitHub Actions takes that spirit, chucks it off a cliff and embraces the brute-force approach, such that they would rather spin up entirely new virtual machines and download many new multi-megabyte files over and over again, rather than allow us to ping an API for a few kilobytes of data.

Rant mode off.

mashehu commented 10 months ago

Okay, let's get it started on nf-co.re then: https://deploy-preview-2173--nf-core.netlify.app/nf_version_latest What exactly should the page return?

ewels commented 10 months ago

Great 👍🏻 Also, the website is already building a minimum of once per day. So no need to get clever with caches or anything like that, just having a static endpoint that grabs the latest version from the GitHub API at build time should be perfectly sufficient.

@mashehu - we need to return:

Suggestion would be to use keys that match the GitHub action terminology:

eg:

{
  "latest-stable": "v23.10.0",
  "latest-edge": "v23.12.0-edge",
  "latest-everything": "v23.12.0-edge"
}

Could bung in extra data if you like, but those are the three that will be used here I think.

Could also put in a top-level key to future-proof, so it's easier to add more stuff in the future to use the endpoint for other things.

ewels commented 10 months ago

No, they fetch a non-existing page. They purposefully get a 404 error, but extract the version number due to the automatic redirect implemented by GitHub.

Ooh, clever!

MillironX commented 10 months ago

we need to return:

  • most recent edge release
  • most recent stable release
  • either: dates for each and / or the most recent release out of the two above

Why not simply have all releases as a JSON file? That's what the Julia folks do. If we hard-code the "latest" versions into the nf-core website, then this action will just hit API limits when trying to resolve a numbered version. My idea of the workflow was

  1. Have the nf-core website crawl all releases on GitHub once per day. From the boatload of info that Octokit provides, the nf-core website will create a static JSON file that contains (for each Nextflow release)
    1. The version number
    2. Whether that version is an edge release or not
    3. The download URL for the regular script file
    4. The download URL for the "all" variant script file
    5. SHA hashes to verify the integrity of scripts?
  2. Point this action to the JSON file on the nf-core website rather than Octokit

Our source of truth remains the same that way, but we're effectively caching that info on the nf-core website. The logic of what is the "most recent" remains in this action, as well as all the version resolving code for numbered versions.

edmundmiller commented 10 months ago

We could have two seperate end-points

{
  "latest-stable": "v23.10.0",
  "latest-edge": "v23.12.0-edge",
  "latest-everything": "v23.12.0-edge"
}

and

{ 
  "v23.10.0",
  "v23.11.0",
  "v23.12.0-edge",
  ...
}

Or combine them

{ 
  "latest-stable": "v23.10.0",
  "latest-edge": "v23.12.0-edge",
  "latest-everything": "v23.12.0-edge",
  versions: [
  "v23.10.0",
  "v23.11.0",
  "v23.12.0-edge",
  ...
  ]
}
mashehu commented 10 months ago

here you go: https://deploy-preview-2173--nf-core.netlify.app/nf_version (somehow the conten-type is ignored, but it's json)

ewels commented 10 months ago

If we hard-code the "latest" versions into the nf-core website, then this action will just hit API limits when trying to resolve a numbered version.

I don't really understand this. The nf-core website would only be doing 2 queries a day, so the number of API requests should go right down, so we should be fine on the API limit?

MillironX commented 10 months ago

The nf-core website would only be doing 2 queries a day, so the number of API requests should go right down, so we should be fine on the API limit?

The website will only do two API requests per day, but let's say we implement a a design that only exposes the "latest" versions on the website. Every time setup-nextflow is run using a "latest" version, setup-nextflow will make zero API calls to GitHub. Excellent. That's what we want. But now if I run setup-nextflow with a numbered version, say "23.10", the website file wouldn't contain any information on what v23.10 looks like, so now setup-nextflow would need to crawl the GitHub API again, and we'll hit our rate limits and be back where we started.

In fact, as setup-nextflow is currently written, "latest-stable" results in the fewest number of API calls, because it can leverage GitHub's "latest release" API to immediately get the download information rather than making two calls to see if a release is latest, and then getting the download information. In other words, the rate limiting is most important for numbered versions, and not "latest" versions.

MillironX commented 10 months ago

I think @edmundmiller is on the right track about combining "latest" and numbered releases into a single JSON output. My ideal schema would look something like this:

{
  "latest": {
    "stable": {
      "version": "v23.10.0",
      "isEdge": false,
      "downloadUrl": "https://github.com/nextflow-io/nextflow/releases/download/v23.10.0/nextflow",
      "downloadUrlAll": "https://github.com/nextflow-io/nextflow/releases/download/v23.10.0/nextflow-23.10.0-all"
    },
    "edge": {
      "version": "v23.12.0-edge",
      "isEdge": true,
      "downloadUrl": "https://github.com/nextflow-io/nextflow/releases/download/v23.12.0-edge/nextflow",
      "downloadUrlAll": "https://github.com/nextflow-io/nextflow/releases/download/v23.12.0-edge/nextflow-23.12.0-edge-all"
    },
    "everything": {
      "version": "v23.12.0-edge",
      "isEdge": true,
      "downloadUrl": "https://github.com/nextflow-io/nextflow/releases/download/v23.12.0-edge/nextflow",
      "downloadUrlAll": "https://github.com/nextflow-io/nextflow/releases/download/v23.12.0-edge/nextflow-23.12.0-edge-all"
    }
  },
  "versions": [
    {
      "version": "v23.12.0-edge",
      "isEdge": true,
      "downloadUrl": "https://github.com/nextflow-io/nextflow/releases/download/v23.12.0-edge/nextflow",
      "downloadUrlAll": "https://github.com/nextflow-io/nextflow/releases/download/v23.12.0-edge/nextflow-23.12.0-edge-all"
    },
   { "etc, etc, etc": "" }
  ]
}

Yes, I know it's slightly redundant, but I think this maximizes compatibility with current systems while opening up new optimization opportunities.

ewels commented 10 months ago

But if a specific version is known then the URL can just be guessed, no? No API requests needed:

https://github.com/nextflow-io/nextflow/releases/download/{{ version }}/nextflow

That being said, I also have nothing against listing all the other versions - we're fetching that anyway and it's no extra effort. Very similar to what we're doing with https://nf-co.re/pipelines.json really.

MillironX commented 10 months ago

But if a specific version is known then the URL can just be guessed, no? No API requests needed:

If, and only if a user specifies the explicit version number, e.g. 23.10.0 instead of 23.10. Maybe I'm thinking of this from too much of a semver perspective, but I like to specify year and month versions, then let the version resolve to the most bugfixed version that corresponds to that combo, e.g. let 23.04 resolve to getting 23.04.4. Plus, it's nice to try and resolve versions against an explicit list, removing a possible meaning of 404 errors.

mashehu commented 10 months ago

Okay, added the most recent suggestion: https://deploy-preview-2173--nf-core.netlify.app/nf_version

ewels commented 10 months ago

Looks good! I was thrown for a moment as I thought that the edge release should be the most recent, but I see that there was a patch on 23.10.1 a few days ago.. 😅

mashehu commented 10 months ago

The above pr is merged, so you can use https://nf-co.re/nf_version

ewels commented 10 months ago

Only just noticed the URL. What do you think about making it https://nf-co.re/nextflow_version so that it's absolutely clear?

There are too many nfs around for my liking generally 😆

MillironX commented 10 months ago

@mashehu, I notice the versions only go back to 22.11-edge. I'm guessing that these are only the first page of results? https://docs.github.com/en/rest/using-the-rest-api/using-pagination-in-the-rest-api?apiVersion=2022-11-28

mashehu commented 10 months ago

Okay, https://nf-co.re/nextflow_version returns now ALL versions

MillironX commented 10 months ago

Beautiful. Now for one final nitpick. @ewels noticed that the "latest-everything" tag returns a stable patch release currently. I'm not convinced that's the desired behavior. Here's my reasoning:

The "edge" releases contain (nearly) all commits approved on the master branch, including new features and hotfix patches. A stable release may contain new features if it is a major or minor (year or month) version bump, but will only contain hotfixes if it is a patch version bump. The goal of testing against "latest-everything" is to test against the most new features and hotfixes. So, the latest "edge" release would contain all the hotfixes of a patch release, plus new features, while the patch release will remove features that had been tested against on the previous "edge" release, only adding hotfixes. In other words, basing "latest-everything" on dates might exclude the latest features that might be breaking, going against the point of using the "latest-everything."

This is why I wrote setup-nextflow to use version comparison to find the "latest" versions.

If there is something wrong with my reasoning, or if you all think it is not worth the distinction, then we will just keep the new API as is.

MillironX commented 10 months ago

Vote

👍 for making the "latest-everything" version resolve using version numbers (e.g. v23.12.0-edge > v23.10.1)

👎 for making the "latest-everything" version resolve using dates (e.g. v23.10.1 > v23.12.0-edge)

ewels commented 10 months ago

Good point 👍🏻 As long as it returns the new stable release when it's ahead it the most recent edge release, I'm happy (which it should do, based on version number matching).

mashehu commented 10 months ago

okay, new sorting is live: https://github.com/nf-core/website/pull/2216 thanks for pointing this out!

MillironX commented 10 months ago

Thanks so much, @mashehu. Now to see about swapping out the API calls here 🤔

adamrtalbot commented 10 months ago

So do we (pipeline developers) need to do anything? do we 'just' need to update this action when the new feature is ready?

mashehu commented 10 months ago

if you have it set as uses: nf-core/setup-nextflow@v1you are good

mashehu commented 10 months ago

still happening here: https://github.com/nf-core/tools/actions/runs/7656119613/job/20863705436#step:8:18 or am I missing something?

MillironX commented 10 months ago

I'm still working on the API swap out. ETA on that is a few months based on how much stuff needs to change. Interesting that #31 didn't take care of that, tho. Would you mind rerunning with runner logs and step logs turned on and sending me the logs?

mashehu commented 10 months ago

Sure, next time I run into it.