Open ErikKusch opened 5 months ago
RO-Crates provide a lightweight technology stack to implement the FAIR Digital Object concept based on common web technologies involving provision of structured (meta)data with schema.org extensions such as Bioschemas and typed relationships with FAIR Signposting.
The generated outputs should be stored as RO-Crates (ideally within a sustainable data repository that ensures the long-term availability of the data) and made available to clients via the web. The data should receive a persistent identifier (PID). Some options could be:
rohub.com seems to be the best option to quickly draft and publish an RO-Crate for one of the generated outputs of ModGP.
For example, publishing via ROHub could result in the following RO-Crate metadata description:
{
"@context": [
"https://w3id.org/ro/crate/1.1/context",
"https://w3id.org/ro/terms/earth-science#",
{
"description": "http://purl.org/dc/terms/description",
"title": "http://purl.org/dc/terms/title",
"creation_mode": "http://w3id.org/ro-id/rohub/model#creation_mode"
}
],
"@graph": [
{
"@type": "CreativeWork",
"@id": "ro-crate-metadata.json",
"conformsTo": {
"@id": "https://w3id.org/ro/crate/1.1"
},
"about": {
"@id": "./"
}
},
{
"@id": "./",
"identifier": "https://w3id.org/ro-id/db91bd2f-2886-4078-90a3-0e5d21003b7e",
"hasPart": [
{
"@id": "data%2F"
},
{
"@id": "https://w3id.org/ro-id/db91bd2f-2886-4078-90a3-0e5d21003b7e/resources/62f1e81b-cd81-46ef-b092-8240da5d9c09"
}
],
"@type": [
"Dataset"
],
"creator": [
"https://orcid.org/0000-0002-4984-7646"
],
"author": [
"https://orcid.org/0000-0002-4984-7646"
],
"studySubject": [
"http://eurovoc.europa.eu/632"
],
"citeAs": "Erik Kusch. \"Lathyrus aphaca distribution.\" ROHub. Jan 24 ,2024. https://w3id.org/ro-id/db91bd2f-2886-4078-90a3-0e5d21003b7e.",
"datePublished": "2024-01-24 09:35:26.012186+00:00",
"dateCreated": "2024-01-24 09:35:26.012186+00:00",
"dateModified": "2024-01-24 10:31:46.204306+00:00",
"contributors": [],
"name": "Lathyrus aphaca distribution",
"contentSize": 0,
"encodingFormat": "application/ld+json",
"contentUrl": "https://api.rohub.org/api/ros/db91bd2f-2886-4078-90a3-0e5d21003b7e/crate/download/",
"mainEntity": "Dataset",
"keywords": [
"SDM",
"ModGP"
],
"description": "ModGP output for Lathyrus aphaca",
"https://w3id.org/ro/terms/earth-science#template": "https://w3id.org/ro/terms/earth-science#DataCentricResearchObjectTemplate",
"modifiedTime": "2024-01-24 10:31:46.204306+00:00",
"http://w3id.org/ro-id/rohub/model#creation_mode": "MANUAL"
},
{
"@id": "biblio%2F",
"@type": [
"Dataset",
"http://purl.org/wf4ever/wf4ever#Folder"
],
"name": "biblio"
},
{
"@id": "data%2F",
"@type": [
"Dataset",
"http://purl.org/wf4ever/wf4ever#Folder"
],
"name": "data"
},
{
"@id": "metadata%2F",
"@type": [
"Dataset",
"http://purl.org/wf4ever/wf4ever#Folder"
],
"name": "metadata"
},
{
"@id": "raw%20data%2F",
"@type": [
"Dataset",
"http://purl.org/wf4ever/wf4ever#Folder"
],
"name": "raw data"
},
{
"@id": "https://w3id.org/ro-id/db91bd2f-2886-4078-90a3-0e5d21003b7e/resources/62f1e81b-cd81-46ef-b092-8240da5d9c09",
"@type": [
"File",
"Dataset"
],
"name": "Lathyrus_aphaca-Outputs.nc",
"sdDatePublished": "2024-01-24 10:31:43.534408+00:00",
"dateCreated": "2024-01-24 10:31:43.534408+00:00",
"dateModified": "2024-01-24 10:31:46.089805+00:00",
"contentUrl": "https://api.rohub.org/api/resources/62f1e81b-cd81-46ef-b092-8240da5d9c09/download/",
"@reverse": {
"hasPart": [
{
"@id": "data%2F"
}
]
},
"contentSize": 50020602,
"encodingFormat": "application/x-netcdf"
},
{
"@id": "https://w3id.org/ro-id/users/https%3A//orcid.org/0000-0002-4984-7646",
"email": "erik.kusch@nhm.uio.no",
"@type": "agent"
}
]
}
An alternative to publishing the RO-Crate directly in a repository like ROHub would be to create RO-Crates as packaged .zip files as an output of the ModGP script. This mode of using RO-Crate is also called "attached mode" (see here). Afterwards, we can evaluate how well these RO-Crate packages can be uploaded, e.g. into ROHub.
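To make the packaging idea concrete, here is a minimal sketch of how an output directory could be zipped into an attached RO-Crate. It is written in Python purely for illustration (the ModGP script itself is R), the function name `zip_crate` is hypothetical, and it assumes that `ro-crate-metadata.json` already sits in the root of the directory.

```python
import zipfile
from pathlib import Path


def zip_crate(crate_dir: str, zip_path: str) -> list[str]:
    """Pack every file below crate_dir into zip_path (hypothetical helper).

    Archive member names are relative to crate_dir, so that
    ro-crate-metadata.json ends up in the root of the archive.
    """
    root = Path(crate_dir)
    if not (root / "ro-crate-metadata.json").is_file():
        raise FileNotFoundError("ro-crate-metadata.json must sit in the crate root")

    target = Path(zip_path).resolve()
    names = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(root.rglob("*")):
            # Skip directories and the archive itself, in case it is
            # written inside the crate directory.
            if not path.is_file() or path.resolve() == target:
                continue
            arcname = path.relative_to(root).as_posix()
            archive.write(path, arcname)
            names.append(arcname)
    return names
```

Keeping the member names relative to the crate root matters because the relative IRIs in the metadata file are resolved against that root.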
The difference between the attached RO-Crate and the uploaded one is that, in "packaged" mode, the IRIs in the metadata file are relative paths within the local directory structure, whereas after uploading to a repository the IRIs should become web-resolvable URLs.
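The relative-versus-resolvable distinction can be illustrated with a small sketch. This is not what ROHub does internally (the repository assigns its own identifiers); it only shows the idea under the assumption of a hypothetical base URL. Like the ROHub example, it leaves `./` and `ro-crate-metadata.json` untouched. The function name `absolutize_ids` is made up for this illustration.

```python
from urllib.parse import urljoin, urlparse

# Identifiers kept relative, mirroring the ROHub example metadata.
KEEP_RELATIVE = {"./", "ro-crate-metadata.json"}


def absolutize_ids(metadata: dict, base_url: str) -> dict:
    """Return a copy of the metadata with relative @id values resolved.

    base_url is an assumed landing URL for the uploaded crate.
    """
    if not base_url.endswith("/"):
        base_url += "/"

    def resolve(node):
        if isinstance(node, dict):
            result = {}
            for key, value in node.items():
                is_relative_id = (
                    key == "@id"
                    and isinstance(value, str)
                    and value not in KEEP_RELATIVE
                    and not urlparse(value).scheme
                )
                result[key] = urljoin(base_url, value) if is_relative_id else resolve(value)
            return result
        if isinstance(node, list):
            return [resolve(item) for item in node]
        return node

    return resolve(metadata)
```

Identifiers that already carry a scheme, such as the ORCID URL, pass through unchanged.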
A minimal representation of the ro-crate-metadata.json of the attached RO-Crate could be:
{
"@context": [
"https://w3id.org/ro/crate/1.1/context"
],
"@graph": [
{
"@type": "CreativeWork",
"@id": "ro-crate-metadata.json",
"conformsTo": {
"@id": "https://w3id.org/ro/crate/1.1"
},
"about": {
"@id": "./"
}
},
{
"@id": "./",
"hasPart": [
{
"@id": "Lathyrus_aphaca-Outputs.nc"
}
],
"@type": [
"Dataset"
],
"creator": [
"https://orcid.org/0000-0002-4984-7646"
],
"author": [
"https://orcid.org/0000-0002-4984-7646"
],
"studySubject": [
"http://eurovoc.europa.eu/632"
],
"datePublished": "2024-01-24 09:35:26.012186+00:00",
"name": "Lathyrus aphaca distribution",
"encodingFormat": "application/ld+json",
"contentUrl": "https://api.rohub.org/api/ros/db91bd2f-2886-4078-90a3-0e5d21003b7e/crate/download/",
"mainEntity": "Dataset",
"keywords": [
"SDM",
"ModGP"
],
"description": "ModGP output for Lathyrus aphaca",
},
{
"@id": "Lathyrus_aphaca-Outputs.nc",
"@type": [
"Dataset",
"File"
],
"name": "Lathyrus_aphaca-Outputs.nc",
"contentSize": 50020602,
"encodingFormat": "application/x-netcdf"
},
{
"@id": "https://orcid.org/0000-0002-4984-7646",
"name": "Erik Kusch"
}
]
}
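Since a metadata file like this is easy to break when edited by hand, a quick sanity check might be useful before packaging. The sketch below parses the JSON (which also catches syntax errors) and verifies that the metadata descriptor, the root dataset, and every `hasPart` target exist in the `@graph`. It covers only a small subset of what RO-Crate 1.1 expects and is not a validator; `check_crate` is a hypothetical name.

```python
import json


def check_crate(text: str) -> list[str]:
    """Return a list of problems found in ro-crate-metadata.json content."""
    metadata = json.loads(text)  # raises json.JSONDecodeError on invalid JSON
    entities = {entity.get("@id"): entity for entity in metadata.get("@graph", [])}

    descriptor = entities.get("ro-crate-metadata.json")
    if descriptor is None:
        return ["missing metadata descriptor 'ro-crate-metadata.json'"]

    # The descriptor's "about" property points at the root dataset.
    root_id = descriptor.get("about", {}).get("@id")
    root = entities.get(root_id)
    if root is None:
        return [f"root dataset {root_id!r} is not described in @graph"]

    problems = []
    for part in root.get("hasPart", []):
        if part.get("@id") not in entities:
            problems.append(f"hasPart target {part.get('@id')!r} has no entity in @graph")
    return problems
```

An empty list means these three checks passed, nothing more.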
We have manually modeled an example of how the ModGP output should be stored as an RO-Crate: by simply adding an ro-crate-metadata.json file to the directory of outputs per species. The example metadata file can be found here: https://github.com/jgrieb/CWR-Hackathon/blob/ro-crate-manual-example/ModGP/example-output/Lathyrus_aphaca/ro-crate-metadata.json
Note that we have additionally modeled a simplified RO-Crate which represents the ModGP tool itself and can thus be referenced from within the provenance section of the output RO-Crate (section CreateAction). The tool is modeled as a ComputationalWorkflow in line with the Bioschemas ComputationalWorkflow profile 1.0. This example can be found here: https://github.com/jgrieb/CWR-Hackathon/blob/ro-crate-manual-example/ModGP/tool-ro-crate-metadata.json
Further below we provide some more documentation on the two example files.
In order to publish the ModGP model and output data in a FAIR way, two steps are required:
1. The R script that generates the output files for a given species must be modified to dynamically generate the ro-crate-metadata.json file, based on the manually created example. Afterwards, the complete RO-Crate (the metadata and the data files themselves) should be automatically uploaded and published in the ROHub repository.
2. When the work on ModGP itself is finished, the tool should be published in a FAIR way. In this case, this means uploading the model code as an RO-Crate to WorkflowHub. For this, the second example ro-crate-metadata.json mentioned above must be finalized (some metadata fields are still incomplete).
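As a rough illustration of the first step, the logic for generating the metadata dynamically could look like the sketch below. The real implementation would live in the ModGP R script; Python is used here only to show the idea. The function name `build_metadata`, its parameters, and the choice to map `.nc` files to `application/x-netcdf` are assumptions made for this example, and the resulting entities are deliberately minimal.

```python
import mimetypes
from pathlib import Path
from urllib.parse import quote

METADATA_FILE = "ro-crate-metadata.json"


def build_metadata(output_dir: str, species: str, creator_orcid: str, date_published: str) -> dict:
    """Describe every file in a per-species output directory (hypothetical helper)."""
    root = Path(output_dir)
    files = sorted(p for p in root.rglob("*") if p.is_file() and p.name != METADATA_FILE)

    file_entities = []
    for path in files:
        # Relative, percent-encoded path as the identifier (attached mode).
        entity = {
            "@id": quote(path.relative_to(root).as_posix()),
            "@type": "File",
            "name": path.name,
            "contentSize": path.stat().st_size,
        }
        # Assumption: NetCDF outputs use the media type from the example above.
        media_type = "application/x-netcdf" if path.suffix == ".nc" else mimetypes.guess_type(path.name)[0]
        if media_type:
            entity["encodingFormat"] = media_type
        file_entities.append(entity)

    descriptor = {
        "@id": METADATA_FILE,
        "@type": "CreativeWork",
        "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
        "about": {"@id": "./"},
    }
    root_entity = {
        "@id": "./",
        "@type": "Dataset",
        "name": f"{species} distribution",
        "description": f"ModGP output for {species}",
        "datePublished": date_published,
        "keywords": ["SDM", "ModGP"],
        "creator": [{"@id": creator_orcid}],
        "hasPart": [{"@id": entity["@id"]} for entity in file_entities],
    }
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [descriptor, root_entity, *file_entities, {"@id": creator_orcid, "@type": "Person"}],
    }
```

The returned dictionary can be written to `ro-crate-metadata.json` in the output directory with `json.dump`. Scanning the directory rather than listing files by hand means every generated output file ends up in `hasPart`.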
The hasPart section in this example only covers one file (Lathyrus_aphaca-Outputs.nc); however, all generated output files must be added here in production.
An about statement which points to a bioschemas:Taxon entity. This will presumably be how Bioschemas recommends linking a dataset to a taxon.
sdPublisher, version
Improve fairness by packaging ModGP outputs into RO-Crates