workflowhub-eu / workflowhub-graph

Knowledge Graph generator for WorkflowHub
BSD 2-Clause "Simplified" License

Ensure https links back to WorkflowHub per RO-Crate #27

Open stain opened 5 months ago

stain commented 5 months ago

In https://github.com/workflowhub-eu/workflowhub-graph/blob/f8a87dbe85a25715d55fb0b757769b109c3f52ad/merged.ttl we find, for instance:

<arcp://uuid,52c833a3-1414-5a42-8e84-27f6f46bcf74/> a schema1:Dataset ;
    dct:conformsTo <https://w3id.org/ro/crate/1.1>,
        <https://w3id.org/workflowhub/workflow-ro-crate/1.0> ;
    schema1:author <arcp://uuid,52c833a3-1414-5a42-8e84-27f6f46bcf74/#creator-1>,
        <arcp://uuid,52c833a3-1414-5a42-8e84-27f6f46bcf74/#creator-10>,
        <arcp://uuid,52c833a3-1414-5a42-8e84-27f6f46bcf74/#creator-11>,
        <arcp://uuid,52c833a3-1414-5a42-8e84-27f6f46bcf74/#creator-12>,
        <arcp://uuid,52c833a3-1414-5a42-8e84-27f6f46bcf74/#creator-9> ;
    schema1:description "Analysis of variation within individual COVID-19 samples using Illumina Paired End data. More info can be found at https://covid19.galaxyproject.org/genomics/" ;
    schema1:hasPart <arcp://uuid,52c833a3-1414-5a42-8e84-27f6f46bcf74/Genomics-4-PE_Variation.cwl>,
        <arcp://uuid,52c833a3-1414-5a42-8e84-27f6f46bcf74/Genomics-4-PE_Variation.ga>,
        <arcp://uuid,52c833a3-1414-5a42-8e84-27f6f46bcf74/Genomics-4-PE_Variation.svg> ;
    schema1:identifier "https://workflowhub.eu/workflows/7?version=1" ;
    schema1:license "MIT" ;
    schema1:mainEntity <arcp://uuid,52c833a3-1414-5a42-8e84-27f6f46bcf74/Genomics-4-PE_Variation.ga> ;
    schema1:name "Research Object Crate for Genomics - PE Variation" ;
    schema1:sdDatePublished "2024-06-17 10:59:52 +0100" ;
    schema1:url "https://workflowhub.eu/workflows/7/ro_crate?version=1" .

Here we have two links back from the UUID-based RO-Crate to WorkflowHub: schema1:url "https://workflowhub.eu/workflows/7/ro_crate?version=1" and schema1:identifier "https://workflowhub.eu/workflows/7?version=1".

These links are crucial for connecting back to real-world URIs: the arcp URIs are UUID-based and do not resolve, while the workflowhub.eu links do. However, we need to verify that these links exist for each RO-Crate as crawled, and that they correspond to the RO-Crate we actually retrieved. This may depend on whether the crate is loaded from the RO-Crate endpoint or from a ZIP file, as pre-existing crates uploaded to WorkflowHub could not have known which identifier they would get.
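A rough sketch of such a check, done on the crate's JSON-LD metadata before it is converted to RDF. The function names (`expected_links`, `find_root`, `check_links`) are hypothetical and not part of workflowhub-graph, and the URL patterns are simply copied from the example above and assumed to hold for other workflows.

```python
import json

WORKFLOWHUB = "https://workflowhub.eu"


def expected_links(workflow_id, version):
    # URL patterns taken from the workflows/7 example; assumed to generalise.
    base = f"{WORKFLOWHUB}/workflows/{workflow_id}"
    return {
        "identifier": f"{base}?version={version}",
        "url": f"{base}/ro_crate?version={version}",
    }


def find_root(metadata):
    # The metadata descriptor's `about` points at the root data entity.
    entities = {e["@id"]: e for e in metadata["@graph"]}
    for entity in metadata["@graph"]:
        name = entity.get("@id", "").split("/")[-1]
        if name in ("ro-crate-metadata.json", "ro-crate-metadata.jsonld") and "about" in entity:
            return entities[entity["about"]["@id"]]
    raise ValueError("no RO-Crate metadata descriptor found")


def values_of(entity, prop):
    # Normalise a property to a flat list of strings (plain values or @id references).
    raw = entity.get(prop, [])
    if not isinstance(raw, list):
        raw = [raw]
    return [v.get("@id") if isinstance(v, dict) else v for v in raw]


def check_links(metadata, workflow_id, version):
    # Report, per property, whether the link back to WorkflowHub is missing or mismatched.
    root = find_root(metadata)
    problems = {}
    for prop, expected in expected_links(workflow_id, version).items():
        found = values_of(root, prop)
        if not found:
            problems[prop] = "missing"
        elif expected not in found:
            problems[prop] = "mismatch"
    return problems


def check_file(path, workflow_id, version):
    # Convenience wrapper for a ro-crate-metadata.json file on disk.
    with open(path, encoding="utf-8") as handle:
        return check_links(json.load(handle), workflow_id, version)
```

An empty result would mean both links are present and match the workflow ID and version the crawler asked for; anything else flags a crate that needs attention.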

In this case perhaps the workflow needs to inject these glue statements.
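As an illustration only, the glue statements could be generated as N-Triples lines and appended to the graph. The function name `glue_triples` is hypothetical, and `http://schema.org/` is an assumption about what the `schema1:` prefix in merged.ttl expands to. Objects are emitted as string literals to match the excerpt above; `as_iri=True` emits them as IRIs instead.

```python
SCHEMA = "http://schema.org/"  # assumption: the namespace behind the schema1: prefix


def glue_triples(crate_root, workflow_id, version, as_iri=False):
    """Return N-Triples lines linking an arcp crate root back to WorkflowHub."""
    # URL patterns copied from the workflows/7 example; assumed to generalise.
    base = f"https://workflowhub.eu/workflows/{workflow_id}"
    links = {
        "identifier": f"{base}?version={version}",
        "url": f"{base}/ro_crate?version={version}",
    }
    lines = []
    for prop, target in links.items():
        obj = f"<{target}>" if as_iri else f'"{target}"'
        lines.append(f"<{crate_root}> <{SCHEMA}{prop}> {obj} .")
    return lines


if __name__ == "__main__":
    root = "arcp://uuid,52c833a3-1414-5a42-8e84-27f6f46bcf74/"
    for line in glue_triples(root, 7, 1):
        print(line)
```

Generating the statements from the workflow ID and version used by the crawler, rather than trusting what the crate declares, would make them independent of how the crate was originally uploaded.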

It could also be beneficial to have reverse links from the workflowhub.eu record to the RO-Crate arcp URI, similar to https://github.com/ResearchObject/ro-crate/pull/296, and to include the DOI where one exists; at present doi.org is missing from the graph outputs.