Open jsheunis opened 10 months ago
Exactly. An RO-crate should be an export of a datalad data model for a single version of a dataset.
I am starting with an effort to model an RO-crate with LinkML. It seems the first step would be to decide on a good input representation.
Initially, I thought it would be good to take an RO-crate and frame it with something like
{
  "@context": "https://w3id.org/ro/crate/1.1/context",
  "@type": "http://schema.org/Dataset"
}
to get a hierarchical representation. However, this ruins the deduplicating nature of an RO-crate (an array of elementary object definitions, i.e. an author person appears only once in a record). Moreover, LinkML IO tooling will strip anything that starts with @, including @id, which is essential in an RO-crate because it represents the "filename/location" in a dataset.
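To illustrate the deduplication concern, here is a minimal sketch that mimics the embedding step of framing in plain Python. It is not the JSON-LD framing algorithm itself; the `embed` helper and the example graph are hypothetical and only show what happens when references are replaced by the entities they point to.

```python
import json

# Hypothetical flat graph: two files share a single author entity
graph = [
    {"@id": "./", "@type": "Dataset",
     "hasPart": [{"@id": "data1.txt"}, {"@id": "data2.txt"}]},
    {"@id": "data1.txt", "@type": "File", "author": {"@id": "#alice"}},
    {"@id": "data2.txt", "@type": "File", "author": {"@id": "#alice"}},
    {"@id": "#alice", "@type": "Person", "name": "Alice"},
]


def embed(node, index, seen=frozenset()):
    """Recursively replace bare {"@id": ...} references with full entities."""
    if isinstance(node, list):
        return [embed(item, index, seen) for item in node]
    if not isinstance(node, dict):
        return node
    node_id = node.get("@id")
    if set(node) == {"@id"}:
        # A bare reference: embed its target unless that would create a cycle
        if node_id in index and node_id not in seen:
            return embed(index[node_id], index, seen | {node_id})
        return node
    if node_id is not None:
        seen = seen | {node_id}
    return {key: embed(value, index, seen) for key, value in node.items()}


index = {entity["@id"]: entity for entity in graph}
tree = embed(index["./"], index)

# The author is defined once in the flat graph but twice in the tree
flat_count = json.dumps(graph).count('"name": "Alice"')
tree_count = json.dumps(tree).count('"name": "Alice"')
```

In the flat graph the author definition occurs once; in the embedded tree it is repeated under every file that references it.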
Maybe it would be better to use something like this
{
  "@context": "http://schema.org/",
  "@graph": [
    {
      "id": "ro-crate-metadata.json",
      "type": "CreativeWork",
      "dct:conformsTo": {
        "id": "https://w3id.org/ro/crate/1.1"
      },
      "about": {
        "id": "./"
      },
      "description": "RO-Crate Metadata File Descriptor (this file)"
    },
    {
      "id": "./",
      "type": "Dataset",
      "description": "The RO-Crate Root Data Entity",
      "hasPart": [
        {
          "id": "data1.txt"
        },
        {
          "id": "data2.txt"
        }
      ],
      "name": "Example RO-Crate"
    },
    {
      "id": "data1.txt",
      "type": "MediaObject",
      "author": {
        "id": "#alice"
      },
      "contentLocation": {
        "id": "http://sws.geonames.org/8152662/"
      },
      "description": "One of hopefully many Data Entities"
    },
    {
      "id": "data2.txt",
      "type": "MediaObject"
    },
    {
      "id": "#alice",
      "type": "Person",
      "description": "One of hopefully many Contextual Entities",
      "name": "Alice"
    },
    {
      "id": "http://sws.geonames.org/8152662/",
      "type": "Place",
      "name": "Catalina Park"
    }
  ]
}
This is a plain RO-crate passed through JSON-LD compaction with the context
{
  "@context": "http://schema.org/"
}
We could now process only @graph. However, with a complex RO-crate this may not work, because mixing context sources yields something like this:
{
  "@context": "https://w3id.org/ro/crate/1.1/context",
  "@graph": [
    {
      "@id": "ro-crate-metadata.json",
      "@type": "CreativeWork",
      "conformsTo": {
        "@id": "https://w3id.org/ro/crate/1.1"
      },
      "about": {
        "@id": "./"
      },
      "description": "RO-Crate Metadata File Descriptor (this file)"
    },
...
with context
{
  "@context": "https://w3id.org/ro/crate/1.1/context"
}
Maybe we need a custom pre-processor...
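A rough sketch of what such a pre-processor might look like, using only the Python standard library. It drops the context, keeps only the @graph array, and renames @-prefixed keys so they would survive tooling that strips them. The function names and the choice of "id"/"type" as target keys are assumptions for illustration, not part of LinkML or any RO-crate library.

```python
def rename_keywords(value):
    """Recursively rename "@"-prefixed keys, e.g. "@id" becomes "id"."""
    if isinstance(value, list):
        return [rename_keywords(item) for item in value]
    if isinstance(value, dict):
        renamed = {}
        for key, item in value.items():
            new_key = key[1:] if key.startswith("@") else key
            # Refuse to silently overwrite, e.g. when "id" and "@id" coexist
            if new_key in renamed:
                raise ValueError(f"key collision on {new_key!r}")
            renamed[new_key] = rename_keywords(item)
        return renamed
    return value


def preprocess(crate):
    """Return the entities of a crate's @graph with keywords renamed."""
    return [rename_keywords(entity) for entity in crate.get("@graph", [])]


# Hypothetical input, shaped like the RO-crate excerpt in this thread
crate = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {"@id": "./", "@type": "Dataset", "hasPart": [{"@id": "data1.txt"}]},
    ],
}

records = preprocess(crate)
```

This keeps the flat, deduplicated list of entities and leaves references as small `{"id": ...}` objects. It does not normalize prefixed terms such as `dct:conformsTo` versus `conformsTo`; that would need an additional mapping step.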
Read the following for context:
The question for me is what role the RO-crate specification should play in defining the dataset schema in LinkML. From the issue that's linked above:
To add to (2): one could build a model that is completely separate from RO-crate and purely follows what we see as the ideal for a datalad dataset metadata structure, and then bring in compatibility with RO-crate as a separate tool, i.e. exporting to the RO-crate specification would be one of many supported "translation" options.
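To make the "export as one of many translations" idea more concrete, here is a minimal sketch. The `DatasetVersion` and `FileRecord` classes are hypothetical stand-ins for an independent datalad dataset model, and `EXPORTERS` is an assumed registry name; none of this reflects an existing datalad or LinkML API.

```python
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    """Hypothetical model of one file in a dataset version."""
    path: str


@dataclass
class DatasetVersion:
    """Hypothetical model of a single version of a dataset."""
    name: str
    files: list = field(default_factory=list)


def to_ro_crate(dataset):
    """Translate the internal model into an RO-crate style JSON-LD dict."""
    descriptor = {
        "@id": "ro-crate-metadata.json",
        "@type": "CreativeWork",
        "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
        "about": {"@id": "./"},
    }
    root = {
        "@id": "./",
        "@type": "Dataset",
        "name": dataset.name,
        "hasPart": [{"@id": f.path} for f in dataset.files],
    }
    parts = [{"@id": f.path, "@type": "File"} for f in dataset.files]
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [descriptor, root, *parts],
    }


# Further translations could be registered next to the RO-crate one
EXPORTERS = {"ro-crate": to_ro_crate}

dataset = DatasetVersion(name="Example RO-Crate", files=[FileRecord("data1.txt")])
exported = EXPORTERS["ro-crate"](dataset)
```

With this split, the internal model never has to mirror the RO-crate structure; only the exporter needs to know about it.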
@mih @mslw @christian-monch curious to hear your thoughts on this