uio-mana / CWR-Hackathon


Make ModGP Outputs into RO-Crates #1

Open · ErikKusch opened this issue 5 months ago

ErikKusch commented 5 months ago

Improve FAIRness by packaging ModGP outputs into RO-Crates

jgrieb commented 5 months ago

Preliminary thoughts

RO-Crates provide a lightweight technology stack for implementing the FAIR Digital Object concept on top of common web technologies. Structured (meta)data is expressed with schema.org and extensions such as Bioschemas, and typed relationships are expressed with FAIR Signposting.
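FAIR Signposting exposes these typed relationships as HTTP `Link` headers, for example `rel="cite-as"`, `rel="describedby"` or `rel="item"`. The sketch below shows one way a client could collect those relations from a landing page. The header value is a hypothetical example, not one actually served by ROHub. The parser is deliberately simple: it handles only the `<url>; param="value"` form and does not handle commas inside quoted parameters.

```python
import re

# Hypothetical Link header of a landing page (placeholder URLs).
EXAMPLE_LINK_HEADER = (
    '<https://w3id.org/ro-id/example>; rel="cite-as", '
    '<https://example.org/ro-crate-metadata.json>; rel="describedby"; type="application/ld+json", '
    '<https://example.org/data/output.nc>; rel="item"; type="application/x-netcdf"'
)

# One link-value: <target> followed by zero or more ";"-separated parameters.
LINK_PATTERN = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,]+)*)')


def parse_signposting(header: str) -> dict[str, list[str]]:
    """Map each relation type to the list of target URLs that carry it."""
    relations: dict[str, list[str]] = {}
    for target, params in LINK_PATTERN.findall(header):
        rel_match = re.search(r'rel="([^"]*)"', params)
        if rel_match is None:
            continue  # link without a rel parameter: nothing to record
        # A rel attribute may hold several space-separated relation types.
        for rel in rel_match.group(1).split():
            relations.setdefault(rel, []).append(target)
    return relations


print(parse_signposting(EXAMPLE_LINK_HEADER)["describedby"])
```

A client would follow `describedby` to obtain the `ro-crate-metadata.json` and `item` to reach the data files. `cite-as` gives the PID that should be used when citing the object.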

Data deposition

The generated outputs should be stored as RO-Crates and made available to clients via the web. Ideally they would sit in a sustainable data repository that ensures the long-term availability of the data. The data should also receive a persistent identifier (PID). Some options could be

ROHub (rohub.org) seems to be the best option for quickly drafting and publishing an RO-Crate for one of the ModGP outputs.

jgrieb commented 5 months ago

For example, publishing via ROHub could produce the following RO-Crate metadata description:

{
  "@context": [
    "https://w3id.org/ro/crate/1.1/context",
    "https://w3id.org/ro/terms/earth-science#",
    {
      "description": "http://purl.org/dc/terms/description",
      "title": "http://purl.org/dc/terms/title",
      "creation_mode": "http://w3id.org/ro-id/rohub/model#creation_mode"
    }
  ],
  "@graph": [
    {
      "@type": "CreativeWork",
      "@id": "ro-crate-metadata.json",
      "conformsTo": {
        "@id": "https://w3id.org/ro/crate/1.1"
      },
      "about": {
        "@id": "./"
      }
    },
    {
      "@id": "./",
      "identifier": "https://w3id.org/ro-id/db91bd2f-2886-4078-90a3-0e5d21003b7e",
      "hasPart": [
        {
          "@id": "data%2F"
        },
        {
          "@id": "https://w3id.org/ro-id/db91bd2f-2886-4078-90a3-0e5d21003b7e/resources/62f1e81b-cd81-46ef-b092-8240da5d9c09"
        }
      ],
      "@type": [
        "Dataset"
      ],
      "creator": [
        "https://orcid.org/0000-0002-4984-7646"
      ],
      "author": [
        "https://orcid.org/0000-0002-4984-7646"
      ],
      "studySubject": [
        "http://eurovoc.europa.eu/632"
      ],
      "citeAs": "Erik Kusch. \"Lathyrus aphaca distribution.\" ROHub. Jan 24 ,2024. https://w3id.org/ro-id/db91bd2f-2886-4078-90a3-0e5d21003b7e.",
      "datePublished": "2024-01-24 09:35:26.012186+00:00",
      "dateCreated": "2024-01-24 09:35:26.012186+00:00",
      "dateModified": "2024-01-24 10:31:46.204306+00:00",
      "contributors": [],
      "name": "Lathyrus aphaca distribution",
      "contentSize": 0,
      "encodingFormat": "application/ld+json",
      "contentUrl": "https://api.rohub.org/api/ros/db91bd2f-2886-4078-90a3-0e5d21003b7e/crate/download/",
      "mainEntity": "Dataset",
      "keywords": [
        "SDM",
        "ModGP"
      ],
      "description": "ModGP output for Lathyrus aphaca",
      "https://w3id.org/ro/terms/earth-science#template": "https://w3id.org/ro/terms/earth-science#DataCentricResearchObjectTemplate",
      "modifiedTime": "2024-01-24 10:31:46.204306+00:00",
      "http://w3id.org/ro-id/rohub/model#creation_mode": "MANUAL"
    },
    {
      "@id": "biblio%2F",
      "@type": [
        "Dataset",
        "http://purl.org/wf4ever/wf4ever#Folder"
      ],
      "name": "biblio"
    },
    {
      "@id": "data%2F",
      "@type": [
        "Dataset",
        "http://purl.org/wf4ever/wf4ever#Folder"
      ],
      "name": "data"
    },
    {
      "@id": "metadata%2F",
      "@type": [
        "Dataset",
        "http://purl.org/wf4ever/wf4ever#Folder"
      ],
      "name": "metadata"
    },
    {
      "@id": "raw%20data%2F",
      "@type": [
        "Dataset",
        "http://purl.org/wf4ever/wf4ever#Folder"
      ],
      "name": "raw data"
    },
    {
      "@id": "https://w3id.org/ro-id/db91bd2f-2886-4078-90a3-0e5d21003b7e/resources/62f1e81b-cd81-46ef-b092-8240da5d9c09",
      "@type": [
        "File",
        "Dataset"
      ],
      "name": "Lathyrus_aphaca-Outputs.nc",
      "sdDatePublished": "2024-01-24 10:31:43.534408+00:00",
      "dateCreated": "2024-01-24 10:31:43.534408+00:00",
      "dateModified": "2024-01-24 10:31:46.089805+00:00",
      "contentUrl": "https://api.rohub.org/api/resources/62f1e81b-cd81-46ef-b092-8240da5d9c09/download/",
      "@reverse": {
        "hasPart": [
          {
            "@id": "data%2F"
          }
        ]
      },
      "contentSize": 50020602,
      "encodingFormat": "application/x-netcdf"
    },
    {
      "@id": "https://w3id.org/ro-id/users/https%3A//orcid.org/0000-0002-4984-7646",
      "email": "erik.kusch@nhm.uio.no",
      "@type": "agent"
    }
  ]
}
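To see which files such a crate actually describes, a client can do three things. First, load the `@graph`. Second, find the root data entity through the metadata descriptor's `about` property. Third, resolve the root's `hasPart` references. The sketch below uses only the standard library. The embedded graph is a trimmed excerpt of the ROHub example above, and `list_parts` is a hypothetical helper name.

```python
import json

# Trimmed excerpt of the ROHub metadata above (only the entities needed here).
METADATA = json.loads("""
{
  "@context": ["https://w3id.org/ro/crate/1.1/context"],
  "@graph": [
    {"@id": "ro-crate-metadata.json", "@type": "CreativeWork",
     "about": {"@id": "./"}},
    {"@id": "./", "@type": ["Dataset"],
     "hasPart": [{"@id": "data%2F"},
                 {"@id": "https://w3id.org/ro-id/db91bd2f-2886-4078-90a3-0e5d21003b7e/resources/62f1e81b-cd81-46ef-b092-8240da5d9c09"}]},
    {"@id": "data%2F", "@type": ["Dataset", "http://purl.org/wf4ever/wf4ever#Folder"],
     "name": "data"},
    {"@id": "https://w3id.org/ro-id/db91bd2f-2886-4078-90a3-0e5d21003b7e/resources/62f1e81b-cd81-46ef-b092-8240da5d9c09",
     "@type": ["File", "Dataset"], "name": "Lathyrus_aphaca-Outputs.nc",
     "contentSize": 50020602, "encodingFormat": "application/x-netcdf"}
  ]
}
""")


def list_parts(metadata: dict) -> list[dict]:
    """Return the entities referenced by the root data entity's hasPart."""
    by_id = {entity["@id"]: entity for entity in metadata["@graph"]}
    descriptor = by_id["ro-crate-metadata.json"]
    root = by_id[descriptor["about"]["@id"]]
    # hasPart may be a single reference or a list of references.
    refs = root.get("hasPart", [])
    if isinstance(refs, dict):
        refs = [refs]
    # Skip references that have no entity of their own in the graph.
    return [by_id[ref["@id"]] for ref in refs if ref["@id"] in by_id]


for part in list_parts(METADATA):
    print(part["name"], part.get("encodingFormat", "(folder)"))
```

Resolving entities by `@id` this way also works on the attached variant below, because the lookup does not care whether the identifiers are relative paths or absolute URLs.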
jgrieb commented 5 months ago

An alternative to publishing the RO-Crate directly in a repository like ROHub would be for the ModGP script to produce the RO-Crate as a packaged .zip file. This way of using an RO-Crate is also called "attached mode" (see here). Afterwards, we would evaluate how well these RO-Crate packages can be uploaded, e.g. into ROHub.
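In attached mode the crate is simply a directory with `ro-crate-metadata.json` at its root. Packaging it as a `.zip` therefore mostly means archiving that directory with paths relative to the crate root. The Python sketch below assumes a hypothetical per-species output directory. The actual ModGP pipeline is an R script, so this only illustrates the packaging step.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_crate(crate_dir: Path, zip_path: Path) -> list[str]:
    """Archive an attached RO-Crate directory, keeping paths relative to its root.

    zip_path should lie outside crate_dir; otherwise the archive being
    written could be picked up by rglob and added to itself.
    """
    if not (crate_dir / "ro-crate-metadata.json").is_file():
        raise FileNotFoundError("not an RO-Crate: ro-crate-metadata.json missing")
    names = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(crate_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to the crate root, so the metadata file
                # sits at the top level of the archive.
                arcname = path.relative_to(crate_dir).as_posix()
                zf.write(path, arcname)
                names.append(arcname)
    return names


# Usage with a hypothetical per-species directory in a temporary location.
with tempfile.TemporaryDirectory() as tmp:
    crate = Path(tmp) / "Lathyrus_aphaca"
    crate.mkdir()
    (crate / "ro-crate-metadata.json").write_text("{}", encoding="utf-8")
    (crate / "Lathyrus_aphaca-Outputs.nc").write_bytes(b"placeholder")
    print(zip_crate(crate, Path(tmp) / "Lathyrus_aphaca.zip"))
```

Because the archive only mirrors the directory, unzipping it anywhere yields the same relative `@id` paths that the metadata file refers to.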

The difference between the attached RO-Crate and the uploaded one lies in the IRIs of the metadata file. In the attached ("packaged") mode they are relative paths within the local directory structure. After uploading to a repository, they should become web-resolvable URLs.
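One mechanical way to turn relative identifiers into web-resolvable URLs is to resolve every relative `@id` against the base URL at which the crate is published. The sketch below uses `urllib.parse.urljoin`. The base URL is a hypothetical placeholder, and whether a given repository rewrites identifiers exactly this way is an assumption. Absolute IRIs, such as the ORCID identifiers, pass through `urljoin` unchanged. The metadata descriptor's own `@id` is left as is.

```python
from urllib.parse import urljoin

DESCRIPTOR_ID = "ro-crate-metadata.json"


def absolutize(node, base: str):
    """Recursively resolve every "@id" in a JSON-LD node against base.

    base must end with "/" so that urljoin appends to the path instead of
    replacing its last segment.
    """
    if isinstance(node, list):
        return [absolutize(item, base) for item in node]
    if isinstance(node, dict):
        out = {}
        for key, value in node.items():
            if key == "@id" and isinstance(value, str) and value != DESCRIPTOR_ID:
                out[key] = urljoin(base, value)  # "./" resolves to base itself
            else:
                out[key] = absolutize(value, base)
        return out
    return node  # plain literals are returned untouched


crate = {"@graph": [
    {"@id": "ro-crate-metadata.json", "about": {"@id": "./"}},
    {"@id": "./", "hasPart": [{"@id": "Lathyrus_aphaca-Outputs.nc"}]},
]}
published = absolutize(crate, "https://example.org/crates/lathyrus-aphaca/")  # hypothetical base
print(published["@graph"][1]["hasPart"])
```

Percent-encoded identifiers such as `raw%20data%2F` are kept encoded by `urljoin`, which matches how they appear in the ROHub example above.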

A minimal representation of the ro-crate-metadata.json of the attached RO-Crate could be:

{
  "@context": [
    "https://w3id.org/ro/crate/1.1/context"
  ],
  "@graph": [
    {
      "@type": "CreativeWork",
      "@id": "ro-crate-metadata.json",
      "conformsTo": {
        "@id": "https://w3id.org/ro/crate/1.1"
      },
      "about": {
        "@id": "./"
      }
    },
    {
      "@id": "./",
      "hasPart": [
        {
          "@id": "Lathyrus_aphaca-Outputs.nc"
        }
      ],
      "@type": [
        "Dataset"
      ],
      "creator": [
        {"@id": "https://orcid.org/0000-0002-4984-7646"}
      ],
      "author": [
        {"@id": "https://orcid.org/0000-0002-4984-7646"}
      ],
      "studySubject": [
        "http://eurovoc.europa.eu/632"
      ],
      "datePublished": "2024-01-24 09:35:26.012186+00:00",
      "name": "Lathyrus aphaca distribution",
      "encodingFormat": "application/ld+json",
      "contentUrl": "https://api.rohub.org/api/ros/db91bd2f-2886-4078-90a3-0e5d21003b7e/crate/download/",
      "mainEntity": "Dataset",
      "keywords": [
        "SDM",
        "ModGP"
      ],
      "description": "ModGP output for Lathyrus aphaca"
    },
    {
      "@id": "Lathyrus_aphaca-Outputs.nc",
      "@type": [
        "Dataset",
        "File"
      ],
      "name": "Lathyrus_aphaca-Outputs.nc",
      "contentSize": 50020602,
      "encodingFormat": "application/x-netcdf"
    },
    {
      "@id": "https://orcid.org/0000-0002-4984-7646",
      "@type": "Person",
      "name": "Erik Kusch"
    }
  ]
}
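Only a few fields of this minimal metadata change from species to species, so it lends itself to being generated programmatically. The Python sketch below builds a subset of the structure above from a species name, an output file, an author ORCID and a name. The production version would live in the R script. The function name and parameters are hypothetical. The ROHub-specific fields (`contentUrl`, the root `encodingFormat`, `mainEntity`) and `studySubject` are deliberately left out.

```python
import json
import os
import tempfile
from datetime import datetime, timezone


def build_minimal_crate(species: str, data_file: str, orcid: str, author_name: str) -> dict:
    """Build a minimal ro-crate-metadata.json structure for one species' output."""
    file_id = os.path.basename(data_file)  # relative @id inside the crate directory
    return {
        "@context": ["https://w3id.org/ro/crate/1.1/context"],
        "@graph": [
            {
                "@type": "CreativeWork",
                "@id": "ro-crate-metadata.json",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            {
                "@id": "./",
                "@type": "Dataset",
                "name": f"{species} distribution",
                "description": f"ModGP output for {species}",
                # ISO 8601 timestamp, e.g. 2024-01-24T09:35:26.012186+00:00
                "datePublished": datetime.now(timezone.utc).isoformat(),
                "keywords": ["SDM", "ModGP"],
                "author": [{"@id": orcid}],
                "hasPart": [{"@id": file_id}],
            },
            {
                "@id": file_id,
                "@type": "File",
                "name": file_id,
                "contentSize": os.path.getsize(data_file),
                "encodingFormat": "application/x-netcdf",
            },
            {"@id": orcid, "@type": "Person", "name": author_name},
        ],
    }


# Usage with a placeholder output file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    nc_path = os.path.join(tmp, "Lathyrus_aphaca-Outputs.nc")
    with open(nc_path, "wb") as fh:
        fh.write(b"placeholder")
    crate = build_minimal_crate("Lathyrus aphaca", nc_path,
                                "https://orcid.org/0000-0002-4984-7646", "Erik Kusch")
    with open(os.path.join(tmp, "ro-crate-metadata.json"), "w", encoding="utf-8") as fh:
        json.dump(crate, fh, indent=2)
```

Reading `contentSize` from disk, rather than hard-coding it, keeps the metadata consistent with the actual file each time the script runs.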
jgrieb commented 5 months ago

Update and provisional hackathon result

We have manually modeled an example of how the ModGP output should be stored as an RO-Crate: by simply adding an ro-crate-metadata.json file to the output directory of each species. The example ro-crate-metadata.json file can be found here: https://github.com/jgrieb/CWR-Hackathon/blob/ro-crate-manual-example/ModGP/example-output/Lathyrus_aphaca/ro-crate-metadata.json
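Here the crate is just the species' output directory plus its metadata file. A cheap consistency check is therefore to confirm that every local `hasPart` reference of the root entity points to something that actually exists in that directory. The sketch below does this. `missing_parts` is a hypothetical helper. Absolute URL references are skipped rather than fetched.

```python
import json
from pathlib import Path
from urllib.parse import unquote, urlparse


def missing_parts(crate_dir: Path) -> list[str]:
    """List local hasPart targets of the root entity that are absent on disk."""
    text = (crate_dir / "ro-crate-metadata.json").read_text(encoding="utf-8")
    by_id = {entity["@id"]: entity for entity in json.loads(text)["@graph"]}
    root_id = by_id["ro-crate-metadata.json"]["about"]["@id"]
    refs = by_id[root_id].get("hasPart", [])
    if isinstance(refs, dict):
        refs = [refs]
    missing = []
    for ref in refs:
        target = ref["@id"]
        if urlparse(target).scheme:
            continue  # absolute IRI: not a file inside this directory
        # @ids are URI references, so decode e.g. "raw%20data%2F" before checking disk
        if not (crate_dir / unquote(target)).exists():
            missing.append(target)
    return missing
```

An empty list means the metadata and the directory agree. Any returned `@id` names a file the metadata promises but the output directory lacks.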

Note that we have also modeled a simplified RO-Crate that represents the ModGP tool itself, so it can be referenced from the provenance section of the output RO-Crate (the CreateAction entity). The tool is modeled as a ComputationalWorkflow in line with the Bioschemas ComputationalWorkflow profile 1.0. This example can be found here: https://github.com/jgrieb/CWR-Hackathon/blob/ro-crate-manual-example/ModGP/tool-ro-crate-metadata.json
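In RO-Crate, such provenance is commonly expressed as a `CreateAction` entity. Its `instrument` points to the tool, `object` to the inputs, `result` to the generated files and `agent` to the person who ran it. The sketch below builds that entity as a Python dict. The run identifier, the tool IRI and the input file name are hypothetical placeholders, not the values used in the linked example files.

```python
import json


def create_action(run_id: str, workflow_id: str, inputs: list[str],
                  outputs: list[str], agent_orcid: str, end_time: str) -> dict:
    """Describe one ModGP run as an RO-Crate CreateAction entity."""
    return {
        "@id": f"#{run_id}",  # local identifier within the output crate
        "@type": "CreateAction",
        "name": f"ModGP run {run_id}",
        "instrument": {"@id": workflow_id},  # the tool, i.e. the ComputationalWorkflow
        "object": [{"@id": i} for i in inputs],  # inputs consumed by the run
        "result": [{"@id": o} for o in outputs],  # files produced by the run
        "agent": {"@id": agent_orcid},
        "endTime": end_time,
    }


action = create_action(
    "modgp-run-lathyrus-aphaca",            # hypothetical run id
    "https://example.org/workflows/modgp",  # hypothetical tool IRI
    ["occurrences.csv"],                    # hypothetical input file
    ["Lathyrus_aphaca-Outputs.nc"],
    "https://orcid.org/0000-0002-4984-7646",
    "2024-01-24T10:31:43+00:00",
)
print(json.dumps(action, indent=2))
```

The returned dict would be appended to the output crate's `@graph`. In the RO-Crate 1.1 examples, the root dataset points to such actions via `mentions`.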

Further below we provide some more documentation on the two example files.

Outlook

In order to publish the ModGP model and output data in a FAIR way, two steps are required:

  1. The R script that generates the output files for a given species must be modified to dynamically generate the ro-crate-metadata.json file, based on the manually created example. Afterwards, the complete RO-Crate (including the metadata and the data files themselves) should automatically be uploaded to and published in the ROHub repository.

  2. Once the work on ModGP itself is finished, the tool should also be published in a FAIR way. In this case, that means uploading the model code as an RO-Crate to WorkflowHub. For this, the second ro-crate-metadata.json example mentioned above must be finalized (some metadata fields are still incomplete).
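For step 2, WorkflowHub works with crates that follow the Workflow RO-Crate profile. That profile requires the root's `mainEntity` to reference the main workflow file, typed as `File`, `SoftwareSourceCode` and `ComputationalWorkflow`. The sketch below checks only that typing before an upload and is not a substitute for a full profile validator. The file name `ModGP.R` is a hypothetical placeholder.

```python
REQUIRED_TYPES = {"File", "SoftwareSourceCode", "ComputationalWorkflow"}


def check_workflow_crate(metadata: dict) -> list[str]:
    """Return problems with the main-entity typing expected by Workflow RO-Crate."""
    by_id = {entity["@id"]: entity for entity in metadata["@graph"]}
    root = by_id[by_id["ro-crate-metadata.json"]["about"]["@id"]]
    main_ref = root.get("mainEntity")
    # mainEntity must be a reference ({"@id": ...}) to an entity in the graph.
    if not isinstance(main_ref, dict) or main_ref.get("@id") not in by_id:
        return ["root has no resolvable mainEntity reference"]
    types = by_id[main_ref["@id"]].get("@type", [])
    if isinstance(types, str):
        types = [types]  # @type may be a single string or a list
    missing = REQUIRED_TYPES - set(types)
    return [f"mainEntity lacks @type {t}" for t in sorted(missing)]


tool_crate = {"@graph": [
    {"@id": "ro-crate-metadata.json", "@type": "CreativeWork", "about": {"@id": "./"}},
    {"@id": "./", "@type": "Dataset", "mainEntity": {"@id": "ModGP.R"}},  # hypothetical file
    {"@id": "ModGP.R", "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
     "name": "ModGP"},
]}
print(check_workflow_crate(tool_crate))  # []
```

The same check would flag the ROHub output above, where `mainEntity` is the plain string `"Dataset"` rather than a reference to an entity.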

Documentation on the two examples

Output dataset RO-Crate

ModGP ComputationalWorkflow