nf-core / tools

Python package with helper tools for the nf-core community.
https://nf-co.re
MIT License

Use Biocontainers API for creating modules #875

Closed ewels closed 3 years ago

ewels commented 3 years ago

Putting down an idea into an issue for nf-core create so I don't forget (but probably too much work to get into PR #869).

Biocontainers itself has quite a nice API that we can use. It's documented here: https://api.biocontainers.pro/ga4gh/trs/v2/ui/#/GA4GH/tools_id_get

For example, we can query MultiQC:

curl -X GET "https://api.biocontainers.pro/ga4gh/trs/v2/tools/multiqc" -H  "accept: application/json"
JSON Response

```json
{
  "contains": [],
  "description": "Multiqc aggregates results from multiple bioinformatics analyses across many samples into a single report. it searches a given directory for analysis logs and compiles a html report. i is a general use tool, perfect for summarising the output from numerous bioinformatics tools.",
  "id": "multiqc",
  "identifiers": [
    "biotools:multiqc",
    "PMID:27312411"
  ],
  "license": "GPL-3.0",
  "name": "multiqc",
  "organization": "biocontainers",
  "pulls": 3602004,
  "tool_tags": [
    "High-Throughput Nucleotide Sequencing",
    "Quality Control",
    "Computational Biology",
    "Sequencing",
    "Bioinformatics",
    "RNA-Seq",
    "Transcriptomics"
  ],
  "tool_url": "https://github.com/ewels/MultiQC",
  "toolclass": {
    "description": "CommandLineTool",
    "id": "0",
    "name": "CommandLineTool"
  },
  "url": "http://api.biocontainers.pro/ga4gh/trs/v2/tools/multiqc",
  "versions": [
    { "id": "multiqc-1.0", "meta_version": "1.0", "name": "multiqc", "url": "http://api.biocontainers.pro/ga4gh/trs/v2/tools/multiqc/versions/multiqc-1.0" },
    { "id": "multiqc-1.5", "meta_version": "1.5", "name": "multiqc", "url": "http://api.biocontainers.pro/ga4gh/trs/v2/tools/multiqc/versions/multiqc-1.5" },
    { "id": "multiqc-1.4", "meta_version": "1.4", "name": "multiqc", "url": "http://api.biocontainers.pro/ga4gh/trs/v2/tools/multiqc/versions/multiqc-1.4" },
    { "id": "multiqc-0.9.1a0", "meta_version": "0.9.1a0", "name": "multiqc", "url": "http://api.biocontainers.pro/ga4gh/trs/v2/tools/multiqc/versions/multiqc-0.9.1a0" },
    { "id": "multiqc-1.3", "meta_version": "1.3", "name": "multiqc", "url": "http://api.biocontainers.pro/ga4gh/trs/v2/tools/multiqc/versions/multiqc-1.3" },
    { "id": "multiqc-1.6a0", "meta_version": "1.6a0", "name": "multiqc", "url": "http://api.biocontainers.pro/ga4gh/trs/v2/tools/multiqc/versions/multiqc-1.6a0" },
    { "id": "multiqc-1.5a", "meta_version": "1.5a", "name": "multiqc", "url": "http://api.biocontainers.pro/ga4gh/trs/v2/tools/multiqc/versions/multiqc-1.5a" },
    { "id": "multiqc-1.2", "meta_version": "1.2", "name": "multiqc", "url": "http://api.biocontainers.pro/ga4gh/trs/v2/tools/multiqc/versions/multiqc-1.2" },
    { "id": "multiqc-1.1", "meta_version": "1.1", "name": "multiqc", "url": "http://api.biocontainers.pro/ga4gh/trs/v2/tools/multiqc/versions/multiqc-1.1" },
    { "id": "multiqc-1.7", "meta_version": "1.7", "name": "multiqc", "url": "http://api.biocontainers.pro/ga4gh/trs/v2/tools/multiqc/versions/multiqc-1.7" },
    { "id": "multiqc-1.6", "meta_version": "1.6", "name": "multiqc", "url": "http://api.biocontainers.pro/ga4gh/trs/v2/tools/multiqc/versions/multiqc-1.6" },
    { "id": "multiqc-1.8", "meta_version": "1.8", "name": "multiqc", "url": "http://api.biocontainers.pro/ga4gh/trs/v2/tools/multiqc/versions/multiqc-1.8" },
    { "id": "multiqc-1.9", "meta_version": "1.9", "name": "multiqc", "url": "http://api.biocontainers.pro/ga4gh/trs/v2/tools/multiqc/versions/multiqc-1.9" }
  ]
}
```

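Using this API call gives us several things in a single shot: the tool's description, license, identifiers, tags and homepage URL, plus the list of available versions.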

It also gives a URL for each version, which we can query in turn (NOTE: the response lists these as http, but that doesn't work; they need to be requested over https).
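As a rough sketch of how the tool-level response might be consumed from Python: the helper names (`force_https`, `summarise_tool`, `fetch_json`) and the mapping of API fields onto keys like `homepage` and `keywords` are assumptions made for illustration, not existing nf-core/tools code. Only the endpoint and field names come from the response shown above.

```python
import json
import urllib.request

API_BASE = "https://api.biocontainers.pro/ga4gh/trs/v2"


def force_https(url: str) -> str:
    """Rewrite an http:// URL to https://, leaving any other URL untouched."""
    if url.startswith("http://"):
        return "https://" + url[len("http://"):]
    return url


def summarise_tool(tool: dict) -> dict:
    """Pick out fields of a /tools/<name> response that look useful for meta.yml.

    The output key names are hypothetical; the input keys are those in the API response.
    """
    return {
        "name": tool.get("name"),
        "description": tool.get("description"),
        "license": tool.get("license"),
        "homepage": tool.get("tool_url"),
        "keywords": tool.get("tool_tags", []),
        "identifiers": tool.get("identifiers", []),
        # Map each version number to a queryable (https) URL
        "versions": {
            v["meta_version"]: force_https(v["url"]) for v in tool.get("versions", [])
        },
    }


def fetch_json(url: str) -> dict:
    """GET a URL and decode the JSON body (needs network access)."""
    request = urllib.request.Request(
        force_https(url), headers={"accept": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# Example (not run here, as it hits the live API):
# summary = summarise_tool(fetch_json(f"{API_BASE}/tools/multiqc"))
```

Keeping the parsing separate from the HTTP call means the field mapping can be unit-tested against a saved response without touching the network.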

For example, MultiQC 1.9:

curl -X GET "https://api.biocontainers.pro/ga4gh/trs/v2/tools/multiqc/versions/multiqc-1.9" -H  "accept: application/json"
JSON Response

```json
{
  "id": "multiqc-1.9",
  "images": [
    {
      "downloads": 48596,
      "image_name": "multiqc==1.9--pyh9f0ad1d_0",
      "image_type": "Conda",
      "registry_host": "http://anaconda.org/",
      "size": 862231,
      "updated": "2020-05-30T00:00:00Z"
    },
    {
      "downloads": 0,
      "image_name": "quay.io/biocontainers/multiqc:1.9--pyh9f0ad1d_0",
      "image_type": "Docker",
      "registry_host": "quay.io/",
      "size": 194294593,
      "updated": "2020-05-30T00:00:00Z"
    },
    {
      "image_name": "https://depot.galaxyproject.org/singularity/multiqc:1.9--pyh9f0ad1d_0",
      "image_type": "Singularity",
      "registry_host": "depot.galaxyproject.org/singularity/",
      "size": 189788160,
      "updated": "2020-05-31T04:44:00Z"
    },
    {
      "downloads": 48596,
      "image_name": "multiqc==1.9--py_1",
      "image_type": "Conda",
      "registry_host": "http://anaconda.org/",
      "size": 862231,
      "updated": "2020-05-30T00:00:00Z"
    },
    {
      "downloads": 0,
      "image_name": "quay.io/biocontainers/multiqc:1.9--py_1",
      "image_type": "Docker",
      "registry_host": "quay.io/",
      "size": 179981913,
      "updated": "2020-07-28T00:00:00Z"
    },
    {
      "image_name": "https://depot.galaxyproject.org/singularity/multiqc:1.9--py_1",
      "image_type": "Singularity",
      "registry_host": "depot.galaxyproject.org/singularity/",
      "size": 176119808,
      "updated": "2020-07-29T06:19:00Z"
    }
  ],
  "meta_version": "1.9",
  "name": "multiqc",
  "url": "http://api.biocontainers.pro/ga4gh/trs/v2/tools/multiqc/versions/multiqc-1.9"
}
```

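This gives us the Conda, Docker and Singularity image names for each build of that version, along with their sizes and last-updated dates.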
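One way the version-level response could be reshaped is to group its images by build, so that each build carries its Conda, Docker and Singularity names together. This is only a sketch: `images_by_build` is a hypothetical helper, and it assumes the build string is whatever follows the last `--` in `image_name`, which holds for the MultiQC 1.9 response above but is not something the API documents here.

```python
from collections import defaultdict


def images_by_build(version: dict) -> dict:
    """Group the images of a /versions/<id> response by build string.

    Returns {build: {image_type: image_name}}.
    """
    builds = defaultdict(dict)
    for image in version.get("images", []):
        name = image["image_name"]
        # Assumption: the build string follows the last "--" in the image name.
        # If there is no "--", the whole name is used as the key.
        build = name.rsplit("--", 1)[-1]
        builds[build][image["image_type"]] = name
    return dict(builds)
```

For the response above this would yield two builds, `pyh9f0ad1d_0` and `py_1`, each with three image types.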

My thought is that we could query this when running nf-core modules create instead of bioconda / quay.io. I think that this would be more accurate as well as giving us a bunch of additional information to put into meta.yml about the tool.

Ideally, we could either use an exact build tag provided on the command line (failing if it is not found) or use a questionary select list as done in nf-core launch. This would be very precise (select first from the versions, then from the builds that are available) and also super user-friendly.
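A minimal sketch of that selection logic, assuming the available builds have already been collected into a dict keyed by build tag (a hypothetical shape, as are the function names). The interactive path uses `questionary.select(...).ask()`, imported lazily so the exact-tag path works without it.

```python
from typing import Callable, Optional


def prompt_with_questionary(message: str, choices: list) -> str:
    """Interactive chooser; note that .ask() returns None if the user cancels."""
    import questionary  # imported lazily: only needed for interactive use

    return questionary.select(message, choices=choices).ask()


def pick_build(
    builds: dict,
    build_tag: Optional[str] = None,
    choose: Callable[[str, list], str] = prompt_with_questionary,
) -> str:
    """Return the requested build tag, or one chosen from those available."""
    if not builds:
        raise LookupError("No builds available for this version")
    if build_tag is not None:
        # Exact tag given on the command line: fail loudly if it does not exist
        if build_tag not in builds:
            available = ", ".join(sorted(builds))
            raise LookupError(f"Build '{build_tag}' not found; available: {available}")
        return build_tag
    # No tag given: let the user pick from the builds that actually exist
    return choose("Select a build:", sorted(builds))
```

Passing the chooser as a parameter keeps the function testable without a terminal; the same pattern could be applied one level up to pick a version before picking a build.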

Phil

ewels commented 3 years ago

Major issue here is that the BioContainers website / API seems to lag well behind reality and not contain many packages. Need to investigate why this is before we can use it.

KevinMenden commented 3 years ago

Closing this now - should be re-opened though if we want to have a more fancy way of selecting containers/versions in nf-core modules create. Or maybe a new issue then.