snakemake / snakemake-workflow-catalog

A statically generated catalog of available Snakemake workflows
https://snakemake.github.io/snakemake-workflow-catalog
MIT License

Workflow disappeared from catalog #3

Closed supermaxiste closed 3 years ago

supermaxiste commented 3 years ago

Hi @johanneskoester,

this catalog is a great idea, and since its inception I've worked a bit on adjusting my workflow (ARPEGGIO) to follow all the formatting and linting standards.

Unfortunately the last action fix caused my workflow to disappear from the catalog and I'm not sure why. I tried to look into this, but I wasn't able to find any clue.

Best, Stefan

nikostr commented 3 years ago

My repo https://github.com/nikostr/dna-seq-deepvariant-glnexus-variant-calling is not showing up, even though it seems to fulfill the criteria specified. I tried cloning this repo and running the generate_catalog.py script locally, limiting the search to each of our repos, as well as to the repos matching the pattern snakemake-deepvariant. The latter two do show up in the actual catalog while neither of ours does, yet when I run the script they all give the same logfile output. Very strange.

nikostr commented 3 years ago

Okay, I've looked closer at this, and it led me here:

https://github.com/snakemake/snakemake-workflow-catalog/runs/2506448252?check_suite_focus=true

It seems as if the action handles about 1000 repos before failing quietly. This might be related to the 1000 API requests/hour limit mentioned here: https://docs.github.com/en/actions/reference/usage-limits-billing-and-administration

Not sure how to work around this.
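
For what it's worth, one way to at least make the failure loud instead of quiet would be to look at the rate-limit headers GitHub sends back with each API response. A minimal sketch, assuming the response headers are available as a mapping that contains `X-RateLimit-Remaining` and `X-RateLimit-Reset` (the function name `seconds_until_reset` is made up here and is not part of generate_catalog.py):

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how many seconds to wait before the next request.

    Hypothetical helper: `headers` is assumed to be a mapping holding the
    rate-limit headers of the last API response, looked up with these
    exact keys (or through a case-insensitive mapping).
    """
    now = time.time() if now is None else now
    # Remaining requests in the current window; assume quota is left
    # if the header is missing.
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    # Unix timestamp at which the window resets.
    reset_at = int(headers.get("X-RateLimit-Reset", 0))
    if remaining > 0:
        return 0
    # Never return a negative wait if the reset time has already passed.
    return max(0, reset_at - now)
```

The script could then log the wait time, sleep, or stop with a clear error once the returned value is greater than zero.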

supermaxiste commented 3 years ago

Good catch @nikostr!

I looked for ways to confirm what you found, and it seems this issue has been present for a while. This is the action run from before my workflow got removed, and you can see that my repository is the last one processed before the quiet failure:

https://github.com/snakemake/snakemake-workflow-catalog/runs/1876964779?check_suite_focus=true

A possible workaround, in theory, would be to split the >2000 API requests into blocks of 1000. So basically there would be 3 actions (or more), each processing a different chunk of repos, spaced one hour apart.
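
As a rough sketch of the chunking idea, assuming the full list of repos is already known and each run is told which block to handle (both `chunk_repos` and `chunk_for_run` are hypothetical names, not functions from the catalog code):

```python
def chunk_repos(repos, size=1000):
    """Split the repo list into consecutive blocks of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [repos[i:i + size] for i in range(0, len(repos), size)]


def chunk_for_run(repos, run_index, size=1000):
    """Pick the block a given run should process, wrapping around at the end."""
    chunks = chunk_repos(repos, size)
    if not chunks:
        return []
    return chunks[run_index % len(chunks)]


# Example: 2500 repos split into blocks of 1000, 1000 and 500.
repos = [f"repo-{i}" for i in range(2500)]
sizes = [len(chunk) for chunk in chunk_repos(repos)]
```

With this layout, run 0 would handle the first 1000 repos, run 1 the next 1000, and so on, with the runs scheduled about an hour apart so each one gets a fresh quota.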

I don't have a lot of time at the moment, but I'll test some solutions at some point.

nikostr commented 3 years ago

I've started looking into this here:

https://github.com/nikostr/snakemake-workflow-catalog/tree/update-latest-workflows

I think there might be a smart workaround: simply sorting the repo search results by update time. I haven't quite got it to work yet, but I think it should have multiple upsides.
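
Roughly, the idea is something like the sketch below. It assumes each search result is a dict with an `updated_at` timestamp in the `2021-05-04T12:30:00Z` format the GitHub API returns, and the name `most_recently_updated` and the `budget` parameter are made up for illustration:

```python
from datetime import datetime


def most_recently_updated(repos, budget=1000):
    """Return at most `budget` repos, most recently updated first."""

    def updated(repo):
        # Parse the ISO 8601 UTC timestamp, e.g. "2021-05-04T12:30:00Z".
        return datetime.strptime(repo["updated_at"], "%Y-%m-%dT%H:%M:%SZ")

    return sorted(repos, key=updated, reverse=True)[:budget]


# Example with made-up repos and a budget of two requests.
repos = [
    {"full_name": "a/old", "updated_at": "2020-01-01T00:00:00Z"},
    {"full_name": "b/new", "updated_at": "2021-05-04T12:30:00Z"},
    {"full_name": "c/mid", "updated_at": "2020-12-31T23:59:59Z"},
]
picked = [repo["full_name"] for repo in most_recently_updated(repos, budget=2)]
```

That way the limited number of requests would be spent on the repos that changed most recently. Results for older repos would presumably have to be carried over from earlier runs, which is an assumption on my part and not something the current script does as far as I can tell.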