snakemake-workflows / cyrcular-calling

A Snakemake workflow for ecDNA detection in Nanopore or Illumina sequencing reads derived from DNA samples enriched for circular DNA.
https://snakemake.github.io/snakemake-workflow-catalog/?usage=snakemake-workflows/cyrcular-calling
MIT License

Regulatory annotation download link is hardcoded #13

Open julia-luz opened 5 months ago

julia-luz commented 5 months ago

Hello!

https://github.com/snakemake-workflows/cyrcular-calling/blob/ccd556ac3e007e1f5ffc039a9c2550b067c27dc3/workflow/rules/ref.smk#L74

The number between regulatory_features and .gff.gz in the URL varies between releases. If the release version is set to 110 in the config file, running wget on https://ftp.ensembl.org/pub/release-{params.release}/regulation/homo_sapiens/homo_sapiens.GRCh38.Regulatory_Build.regulatory_features.20220201.gff.gz fails, because the correct URL for release 110 is https://ftp.ensembl.org/pub/release-110/regulation/homo_sapiens/homo_sapiens.GRCh38.Regulatory_Build.regulatory_features.20221007.gff.gz

tedil commented 4 months ago

Do you happen to know of a programmatic way to get that date right? Everything else can easily be derived or specified, but the date cannot be known in advance unless you either check the FTP directory or there is an overview file with release <-> freeze date information.

julia-luz commented 4 months ago

I'm pretty new to programming in general, which is why my first instinct is probably to regex hammer it - wget does have an option for it, I could play around with that.
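
For illustration, here is a rough Python sketch of the "check the directory and regex it" idea. It is not the workflow's actual code. It assumes that the HTTPS mirror returns a plain directory index for the release folder, and that only the 8-digit date in the filename changes between releases, as in the two URLs quoted above. The names find_regulatory_gff and regulatory_gff_url are hypothetical helpers made up for this example.

```python
import re
import urllib.request

# Filename pattern taken from the URLs quoted in this issue.
# Assumption: only the 8-digit date part differs between releases.
REG_FEATURES_RE = re.compile(
    r"homo_sapiens\.GRCh38\.Regulatory_Build\.regulatory_features\.(\d{8})\.gff\.gz"
)


def find_regulatory_gff(listing: str) -> str:
    """Return the regulatory features filename found in a directory listing.

    If several dated files are listed, the one with the latest date wins
    (YYYYMMDD strings sort chronologically).
    """
    # Map each full filename to its date string; duplicates collapse naturally.
    dated = {m.group(0): m.group(1) for m in REG_FEATURES_RE.finditer(listing)}
    if not dated:
        raise ValueError("no regulatory features file found in listing")
    return max(dated, key=dated.get)


def regulatory_gff_url(release: int) -> str:
    """Build the full download URL for a release (hypothetical helper).

    Assumption: the server answers this URL with a directory index
    that contains the filename as text.
    """
    base = (
        f"https://ftp.ensembl.org/pub/release-{release}"
        "/regulation/homo_sapiens/"
    )
    with urllib.request.urlopen(base) as resp:
        listing = resp.read().decode("utf-8", errors="replace")
    return base + find_regulatory_gff(listing)
```

Something like regulatory_gff_url(110) could then feed the download rule instead of the hardcoded date, at the cost of one extra request to the server per lookup.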