workflowhub-eu / workflowhub-graph

Knowledge Graph generator for WorkflowHub
BSD 2-Clause "Simplified" License

Extract toolshed identifiers from Galaxy workflows #32

Open: stain opened this issue 3 months ago

stain commented 3 months ago

There are toolshed identifiers inside Galaxy workflows, but these are not carried forward into the RO-Crate or into the knowledge graph.

For example, https://workflowhub.eu/workflows/7 has Genomics-4-PE_Variation.ga, which contains:

            "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/snpeff/snpEff_build_gb/4.3+T.galaxy4",
            "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/snpeff/snpEff_build_gb/4.3+T.galaxy4",
            "tool_shed_repository": {
                "changeset_revision": "74aebe30fb52",
                "name": "snpeff",
                "owner": "iuc",
                "tool_shed": "toolshed.g2.bx.psu.edu"
            },
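As a rough sketch of what "extracting" could mean here, the Python below walks a .ga file (which is plain JSON) and collects the `tool_shed_repository` fields shown above. It assumes the layout seen in this example: a top-level `steps` mapping whose values may carry `tool_id` and `tool_shed_repository`, with nested workflows under a step's `subworkflow` key. The function names `load_ga` and `iter_toolshed_refs` are hypothetical and are not part of workflowhub-graph.

```python
import json
from typing import Any, Dict, Iterator


def load_ga(path: str) -> Dict[str, Any]:
    """Read a Galaxy .ga workflow file (it is plain JSON)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def iter_toolshed_refs(workflow: Dict[str, Any]) -> Iterator[Dict[str, str]]:
    """Yield the tool_id and tool_shed_repository fields of every tool step."""
    # Assumption: steps live in a top-level "steps" mapping, and nested
    # workflows sit under a step's "subworkflow" key.
    for step in workflow.get("steps", {}).values():
        repo = step.get("tool_shed_repository")
        if repo:
            yield {
                "tool_id": step.get("tool_id", ""),
                "tool_shed": repo.get("tool_shed", ""),
                "owner": repo.get("owner", ""),
                "name": repo.get("name", ""),
                "changeset_revision": repo.get("changeset_revision", ""),
            }
        sub = step.get("subworkflow")
        if sub:
            # Recurse so tools inside subworkflows are reported too.
            yield from iter_toolshed_refs(sub)


# Trimmed stand-in for the snpEff step of Genomics-4-PE_Variation.ga.
EXAMPLE = {
    "steps": {
        "0": {
            "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/snpeff/snpEff_build_gb/4.3+T.galaxy4",
            "tool_shed_repository": {
                "changeset_revision": "74aebe30fb52",
                "name": "snpeff",
                "owner": "iuc",
                "tool_shed": "toolshed.g2.bx.psu.edu",
            },
        }
    }
}

if __name__ == "__main__":
    for ref in iter_toolshed_refs(EXAMPLE):
        print(ref)
```

With a real file you would call `iter_toolshed_refs(load_ga("Genomics-4-PE_Variation.ga"))` instead of using the inline `EXAMPLE`.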

The identifiers exist in a mangled state in the Abstract CWL:

    run:
      class: Operation
      id: toolshed_g2_bx_psu_edu_repos_iuc_snpeff_snpEff_build_gb_4_3+T_galaxy4

...but they do not appear in the RO-Crate metadata.
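To illustrate why the Abstract CWL form is not a good source, the snippet below reproduces the mangling visible in this one example: "." and "/" become "_", while "+" is kept. This rule is inferred from a single identifier and is only an assumption; the actual Galaxy-to-CWL converter may apply other substitutions. `mangle_like_abstract_cwl` is a hypothetical helper.

```python
import re


def mangle_like_abstract_cwl(tool_id: str) -> str:
    # Hypothetical rule inferred from this single example: "." and "/" become
    # "_" while other characters (including "+") are kept. The actual
    # Galaxy-to-CWL converter may apply more substitutions.
    return re.sub(r"[./]", "_", tool_id)


TOOL_ID = "toolshed.g2.bx.psu.edu/repos/iuc/snpeff/snpEff_build_gb/4.3+T.galaxy4"
print(mangle_like_abstract_cwl(TOOL_ID))
# toolshed_g2_bx_psu_edu_repos_iuc_snpeff_snpEff_build_gb_4_3+T_galaxy4

# The mapping is not reversible: the tool name already contains "_"
# (snpEff_build_gb), and distinct inputs can collide.
print(mangle_like_abstract_cwl("a.b") == mangle_like_abstract_cwl("a_b"))  # True
```

Because "_" is both an original character and a replacement, the mangled ID cannot be turned back into the toolshed identifier reliably, which is a good argument for reading the fields from the .ga file instead.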

Note that these identifiers are NOT global URIs, but they are close! They refer to Mercurial repositories, but again they are not Mercurial URIs (hg+http://).
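Since the identifiers are "almost" URIs, one option is to prefix a scheme and point at the Tool Shed repository. The sketch below assumes the repository is reachable at `https://<tool_shed>/repos/<owner>/<name>`, the same path layout that is already embedded in `tool_id`, and builds a hypothetical RO-Crate-style JSON-LD entity using the schema.org `SoftwareApplication` type. This is not what WorkflowHub or workflowhub-graph currently emits. `changeset_revision` is left out because it is not obvious which property it should map to.

```python
import json
from typing import Dict


def repository_url(repo: Dict[str, str]) -> str:
    """Build an HTTPS URL for a Tool Shed repository.

    Assumption: the Tool Shed serves repositories at
    https://<tool_shed>/repos/<owner>/<name>, matching the path in tool_id.
    """
    return f"https://{repo['tool_shed']}/repos/{repo['owner']}/{repo['name']}"


def tool_entity(tool_id: str, repo: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical JSON-LD entity for the repository a Galaxy step uses."""
    return {
        "@id": repository_url(repo),
        "@type": "SoftwareApplication",
        "name": repo["name"],
        # Keep the original (non-URI) tool_id verbatim as a plain identifier.
        "identifier": tool_id,
    }


REPO = {
    "changeset_revision": "74aebe30fb52",
    "name": "snpeff",
    "owner": "iuc",
    "tool_shed": "toolshed.g2.bx.psu.edu",
}
TOOL_ID = "toolshed.g2.bx.psu.edu/repos/iuc/snpeff/snpEff_build_gb/4.3+T.galaxy4"

print(json.dumps(tool_entity(TOOL_ID, REPO), indent=2))
```

Using HTTPS here sidesteps the Mercurial URI question; whether the knowledge graph should use the repository URL, the tool-level `tool_id`, or both as identifiers is still a design decision for this issue.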

Why do we want these? Well, on a good day you could combine them with Toolshed information to find the bio.tools identifiers. At the moment, however, Galaxy does not seem to expose this tool information well, and it would be overkill for this work to try climbing into Mercurial...

supernord commented 3 months ago

Hi @stain, I've linked here the BioHackathon 2022 mapping between WorkflowHub, Galaxy and bio.tools: https://github.com/bio-tools/biohackathon2022/blob/main/scripts/workflowhub_galaxy_biotools.py

Maybe this will be useful for the graph?

I think some elements of this are incorporated into the WorkflowHub registration process for Galaxy workflows, but, as you pointed out, this doesn't necessarily mean the metadata ends up in the RO-Crate.