Open edmundmiller opened 1 year ago
Basically, I think we can see if it returns that error, and then pause and retry.
Oh no, does that consider hitting the cache API? I feel like Sarek could hit that really quickly... Or worse modules during the hackathon!
@adamrtalbot Is hitting this in sarek with the tests. My question is why are we not hitting it in modules then? I haven't heard anyone complain about it.
@ewels had a good idea to set up a fallback on the website, either an API end-point on the nf-core site that hits the GitHub API, or during build pull the versions.
I think I'm getting to the bottom of it.
Sarek is trying to cache it in the workflow.
But the setup action has a cache inside it.
My question is why are we not hitting it in modules then?
Because it's not using setup-nextflow 🫠
Might get some ideas from https://github.com/DeterminateSystems/magic-nix-cache-action.
Since, we're talking caches, I wonder if we can cache work/
between workflows...
As of v1.3, the workflow runs it's own cache. Sarek is pinned at v1.2 for some reason. Updating that should relieve about 2/3 of the API hits. The caching present in Sarek's workflow does nothing b/c the Nextflow binary doesn't live at /usr/bin/nextflow
when installed using setup-nextflow
.
Personally, I would love to see all nf-core testing workflows updated to use one install per repo per version, and then all tests are run off of that. I'm really inspired by Julia's testing workflow template. Unfortunately, Nextflow doesn't have as nice of a testing framework as Julia, so that may be a pipe dream.
Personally, I would love to see all nf-core testing workflows updated to use one install per repo per version, and then all tests are run off of that. I'm really inspired by Julia's testing workflow template. Unfortunately, Nextflow doesn't have as nice of a testing framework as Julia, so that may be a pipe dream.
That's nice. Do you not think that would work with nf-test?
That also makes me think we could move to a repo and pull in the GitHub actions like this repo https://github.com/magit/actions So it would just be:
steps:
steps:
- name: Install Nextflow
uses: nf-core/actions/setup-nextflow@main
- name: Install nf-core
uses: nf-core/actions/setup-nf-core@main
- name: Run nf-test
uses: nf-core/actions/nf-test@main
Not sure if this is over yet...
How about checking the API rate limit first before attempting to query the endpoints?
Queries to the limit endpoint are free (no effects on the remaining quota) and one could probably reuse code from my action step.
How about checking the API rate limit first before attempting to query the endpoints?
We have talked about this.
I wonder if we can figure out the time to wait from the response headers as shown here. https://github.com/nf-core/setup-nextflow/issues/19#issuecomment-1699649337
I'm working on a big update that should make this easier to integrate into the code...
Conversation with @ewels:
latest-everything
Nextflow script(Basically this action in an API endpoint) that we can cache on the website build.We're favoring option 3, this action could be extremeley simple then. (I think it should stay because it gives people the choice of the super simple uses: setup-nextflow
3. Make a API endpoint on the nf-core website that serves a `latest-everything` Nextflow script(Basically this action in an API endpoint) that we can cache on the website build.
I've thought about something similar. Does anyone who works at Seqera (looking at you, @ewels) know how the Nextflow.io website catalogs versions? My digging through the script has led me to believe that Nextflow.io mirrors the Github release metadata in some way. Maybe they already have an API that they would let us ping instead of using the Github API?
More ideas from the PRs and issues that are linked here https://github.com/actions/setup-go/issues/16
I think we need to add the GitHub token to the README.
@MillironX - I would prefer not to rely on nextflow.io if possible. The current build / back end is somewhat opaque and I'm hoping that we will do a major revamp any day now (I've been hoping that for quite a while 😅 ).
Working with the nf-core website should be pretty trivial and we have full transparency and (community-) control over that site. Also given that https://github.com/nf-core/setup-nextflow is under the @nf-core GitHub org, it feels more natural to me anyway.
Another thing - Both the Nextflow and nf-core websites are static sites. But the nf-core website gets much more regular builds. So that's another good reason to choose nf-core, as we shouldn't need to wait more than a few hours to get the latest version of Nextflow after a release. In contrast, nextflow.io builds are often weeks apart.
Since the GraphQL quota is separate, you could also just use GraphQL to query the version instead of REST. There is already a published step for that.
Alternatively, we could determine the version via an unlimited HTTP request, like the Suprabase devs do.
Lastly, one could include a fallback step into the workflow, that will only run if the API request failed. See the documentation which I wrote here how to achieve that.
@MatthiasZepper any reason not to host a basic file on the nf-core website?
Yup, that could also work. Though we would still be running against a quota, so may be back here again in a few years when we have 10x users 😉
I can see that you have ambitious goals at Seqera. 😉
Also reversed roles, because I'm usually the overengineer and perfectionist 😆. In that case (also due to your plans with Scatter) implementing a Token rotation solution for all actions would likely be a good option. While the action itself is proprietary, this article essentially describes it so closely, that it should be possible to reverse engineer it.
Also tapping into the 5000 requests per hour for GraphQL would help a lot already and is quite simple with the linked step.
As far as I can see, the Suprabase example is getting the download URL with a supplied version number, not fetching the latest version number?
No, they fetch a non-existing page. They purposefully get a 404 error, but extract the version number due to the automatic redirect implemented by GitHub. Try https://github.com/nextflow-io/nextflow/releases/latest/download/, and you will get the latest stable release of Nextflow delivered for free together with a 404.
Like it, but downside there is that behaviour may change between CI runs which could be confusing.
No idea what you are referring to.
@MatthiasZepper any reason not to host a basic file on the nf-core website?
Would of course work as well, here is the OctoKit fixtures repo as template. Should then be part of the build json files and md cache action?
Hot take:
We'd probably need to update a gist or some other file location to bump the nextflow versions, otherwise people would have to use setup-nextflow@main
I think we need to add the GitHub token to the README.
Unfortunately, that won't change anything. The action is already hard-wired to pull in GITHUB_TOKEN.
I think putting a versions cache on the nf-core website is the way to go. My main inspiration for this action was https://github.com/julia-actions/setup-julia, and the Julia team has been using an S3 JSON mirror of all Julia version metadata for 4 years now.
Implementing that is beyond me. If Astro is half as powerful as Hugo, then my octokit wrapper should provide a solid starting point for the website updating its cache file, but I don't have the time or mental energy to learn yet another JavaScript web language.
Rant mode on.
None of this should be necessary. I don't want to run 1000 Nextflow installs per hour: I would much rather do one install every time CI is triggered, then take a snapshot of that install and use that snapshot as the starting place for every single test system. Earthly does exactly that using a clearly defined input/output model, plus versions are easily pinned using Docker image tags. I would say switch to Earthly, but its remote (think, GitHub Actions runners) support is severely lacking. The whole spirit of Nextflow is to minimize and optimize the amount of work needed based on a directed acyclic graph of required tasks. GitHub Actions takes that spirit, chucks it off a cliff and embraces the brute-force approach, such that they would rather spin up entirely new virtual machines and download many new multi-megabyte files over and over again, rather than allow us to ping an API for a few kilobytes of data.
Rant mode off.
Okay, let's get it started on nf-co.re then: https://deploy-preview-2173--nf-core.netlify.app/nf_version_latest What exactly should the page return?
Great 👍🏻 Also, the website is already building a minimum of once per day. So no need to get clever with caches or anything like that, just having a static endpoint that grabs the latest version from the GitHub API at build time should be perfectly sufficient.
@mashehu - we need to return:
Suggestion would be to use keys that match the GitHub action terminology:
latest-stable
latest-edge
latest-everything
eg:
{
"latest-stable": "v23.10.0",
"latest-edge": "v23.12.0-edge",
"latest-everything": "v23.12.0-edge"
}
Could bung in extra data if you like, but those are the three that will be used here I think.
Could also put in a top-level key to future-proof, so it's easier to add more stuff in the future to use the endpoint for other things.
No, they fetch a non-existing page. They purposefully get a 404 error, but extract the version number due to the automatic redirect implemented by GitHub.
Ooh, clever!
we need to return:
- most recent edge release
- most recent stable release
- either: dates for each and / or the most recent release out of the two above
Why not simply have all releases as a JSON file? That's what the Julia folks do. If we hard-code the "latest" versions into the nf-core website, then this action will just hit API limits when trying to resolve a numbered version. My idea of the workflow was
Our source of truth remains the same that way, but we're effectively caching that info on the nf-core website. The logic of what is the "most recent" remains in this action, as well as all the version resolving code for numbered versions.
We could have two seperate end-points
{
"latest-stable": "v23.10.0",
"latest-edge": "v23.12.0-edge",
"latest-everything": "v23.12.0-edge"
}
and
{
"v23.10.0",
"v23.11.0",
"v23.12.0-edge",
...
}
Or combine them
{
"latest-stable": "v23.10.0",
"latest-edge": "v23.12.0-edge",
"latest-everything": "v23.12.0-edge",
versions: [
"v23.10.0",
"v23.11.0",
"v23.12.0-edge",
...
]
}
here you go: https://deploy-preview-2173--nf-core.netlify.app/nf_version (somehow the conten-type is ignored, but it's json)
If we hard-code the "latest" versions into the nf-core website, then this action will just hit API limits when trying to resolve a numbered version.
I don't really understand this. The nf-core website would only be doing 2 queries a day, so the number of API requests should go right down, so we should be fine on the API limit?
The nf-core website would only be doing 2 queries a day, so the number of API requests should go right down, so we should be fine on the API limit?
The website will only do two API requests per day, but let's say we implement a a design that only exposes the "latest" versions on the website. Every time setup-nextflow
is run using a "latest" version, setup-nextflow
will make zero API calls to GitHub. Excellent. That's what we want. But now if I run setup-nextflow
with a numbered version, say "23.10", the website file wouldn't contain any information on what v23.10 looks like, so now setup-nextflow
would need to crawl the GitHub API again, and we'll hit our rate limits and be back where we started.
In fact, as setup-nextflow
is currently written, "latest-stable" results in the fewest number of API calls, because it can leverage GitHub's "latest release" API to immediately get the download information rather than making two calls to see if a release is latest, and then getting the download information. In other words, the rate limiting is most important for numbered versions, and not "latest" versions.
I think @edmundmiller is on the right track about combining "latest" and numbered releases into a single JSON output. My ideal schema would look something like this:
{
"latest": {
"stable": {
"version": "v23.10.0",
"isEdge": false,
"downloadUrl": "https://github.com/nextflow-io/nextflow/releases/download/v23.10.0/nextflow",
"downloadUrlAll": "https://github.com/nextflow-io/nextflow/releases/download/v23.10.0/nextflow-23.10.0-all"
},
"edge": {
"version": "v23.12.0-edge",
"isEdge": true,
"downloadUrl": "https://github.com/nextflow-io/nextflow/releases/download/v23.12.0-edge/nextflow",
"downloadUrlAll": "https://github.com/nextflow-io/nextflow/releases/download/v23.12.0-edge/nextflow-23.12.0-edge-all"
},
"everything": {
"version": "v23.12.0-edge",
"isEdge": true,
"downloadUrl": "https://github.com/nextflow-io/nextflow/releases/download/v23.12.0-edge/nextflow",
"downloadUrlAll": "https://github.com/nextflow-io/nextflow/releases/download/v23.12.0-edge/nextflow-23.12.0-edge-all"
}
},
"versions": [
{
"version": "v23.12.0-edge",
"isEdge": true,
"downloadUrl": "https://github.com/nextflow-io/nextflow/releases/download/v23.12.0-edge/nextflow",
"downloadUrlAll": "https://github.com/nextflow-io/nextflow/releases/download/v23.12.0-edge/nextflow-23.12.0-edge-all"
},
{ "etc, etc, etc": "" }
]
}
Yes, I know it's slightly redundant, but I think this maximizes compatibility with current systems while opening up new optimization opportunities.
But if a specific version is known then the URL can just be guessed, no? No API requests needed:
https://github.com/nextflow-io/nextflow/releases/download/{{ version }}/nextflow
That being said, I also have nothing against listing all the other versions - we're fetching that anyway and it's no extra effort. Very similar to what we're doing with https://nf-co.re/pipelines.json really.
But if a specific version is known then the URL can just be guessed, no? No API requests needed:
If, and only if a user specifies the explicit version number, e.g. 23.10.0 instead of 23.10. Maybe I'm thinking of this from too much of a semver perspective, but I like to specify year and month versions, then let the version resolve to the most bugfixed version that corresponds to that combo, e.g. let 23.04 resolve to getting 23.04.4. Plus, it's nice to try and resolve versions against an explicit list, removing a possible meaning of 404 errors.
Okay, added the most recent suggestion: https://deploy-preview-2173--nf-core.netlify.app/nf_version
Looks good! I was thrown for a moment as I thought that the edge release should be the most recent, but I see that there was a patch on 23.10.1 a few days ago.. 😅
The above pr is merged, so you can use https://nf-co.re/nf_version
Only just noticed the URL. What do you think about making it https://nf-co.re/nextflow_version so that it's absolutely clear?
There are too many nf
s around for my liking generally 😆
@mashehu, I notice the versions only go back to 22.11-edge. I'm guessing that these are only the first page of results? https://docs.github.com/en/rest/using-the-rest-api/using-pagination-in-the-rest-api?apiVersion=2022-11-28
Okay, https://nf-co.re/nextflow_version returns now ALL versions
Beautiful. Now for one final nitpick. @ewels noticed that the "latest-everything" tag returns a stable patch release currently. I'm not convinced that's the desired behavior. Here's my reasoning:
The "edge" releases contain (nearly) all commits approved on the master branch, including new features and hotfix patches. A stable release may contain new features if it is a major or minor (year or month) version bump, but will only contain hotfixes if it is a patch version bump. The goal of testing against "latest-everything" is to test against the most new features and hotfixes. So, the latest "edge" release would contain all the hotfixes of a patch release, plus new features, while the patch release will remove features that had been tested against on the previous "edge" release, only adding hotfixes. In other words, basing "latest-everything" on dates might exclude the latest features that might be breaking, going against the point of using the "latest-everything."
This is why I wrote setup-nextflow
to use version comparison to find the "latest" versions.
If there is something wrong with my reasoning, or if you all think it is not worth the distinction, then we will just keep the new API as is.
👍 for making the "latest-everything" version resolve using version numbers (e.g. v23.12.0-edge > v23.10.1)
👎 for making the "latest-everything" version resolve using dates (e.g. v23.10.1 > v23.12.0-edge)
Good point 👍🏻 As long as it returns the new stable release when it's ahead it the most recent edge release, I'm happy (which it should do, based on version number matching).
okay, new sorting is live: https://github.com/nf-core/website/pull/2216 thanks for pointing this out!
Thanks so much, @mashehu. Now to see about swapping out the API calls here 🤔
So do we (pipeline developers) need to do anything? do we 'just' need to update this action when the new feature is ready?
if you have it set as uses: nf-core/setup-nextflow@v1
you are good
still happening here: https://github.com/nf-core/tools/actions/runs/7656119613/job/20863705436#step:8:18 or am I missing something?
I'm still working on the API swap out. ETA on that is a few months based on how much stuff needs to change. Interesting that #31 didn't take care of that, tho. Would you mind rerunning with runner logs and step logs turned on and sending me the logs?
Sure, next time I run into it.
Example run logs_25031.zip