Package slip for archival extension

o2r-project / erc-spec

Executable Research Compendium specification and guides

https://o2r.info/erc-spec/

Creative Commons Zero v1.0 Universal


Package slip for archival extension #10

Open nuest opened 7 years ago

nuest commented 7 years ago

Add metadata that records which metadata file uses which schema, and which file contained in the ERC is the actual schema.

ghost commented 7 years ago

Add representation information on the standards applied in each ERC. Include third-party schemas both as files and by reference to their persistent identifiers.

[x] documented in the archival extension part of the ERC spec

possible design:

{
    "standards_used": [{
        "name": "DataCite Metadata Schema 4.0",
        "name-short": "datacite40",
        "description": "The DataCite Metadata Schema is a list of core metadata properties chosen for an accurate and consistent identification of a resource for citation and retrieval purposes, along with recommended use instructions.",
        "schema-version": "4.0",
        "schema-path-local": "erc/schema/datacite40.json",
        "schema-url": "https://schema.datacite.org/meta/kernel-4.0/metadata.xsd",
        "schema-identifier": "doi:10.5438/0013"
    }, {
        "name": "Zenodo Metadata Schema",
        "name-short": "zenodo",
        "description": null,
        "schema-version": null,
        "schema-path-local": "erc/schema/zenodo.json",
        "schema-url": null,
        "schema-identifier": null
    }]
}

edit: plus, we might want to include descriptive text blocks to document the standards used.
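To make the draft design concrete, here is a minimal Python sketch that parses a slip in the proposed shape and reports which fields of each entry are still null or blank. The field names come from the draft; treating all seven as required is an assumption (the thread leaves that open), the helper names `missing_fields` and `check_slip` are hypothetical, and the DataCite description is shortened for readability.

```python
import json

# Field names taken from the draft design; treating every one of them as
# required is an assumption, not a decision made in this thread.
REQUIRED_FIELDS = (
    "name", "name-short", "description", "schema-version",
    "schema-path-local", "schema-url", "schema-identifier",
)


def missing_fields(entry):
    """Return the draft fields that are absent, null, or blank in one entry."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = entry.get(field)
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(field)
    return missing


def check_slip(text):
    """Map each standard's name to the list of fields it is missing."""
    slip = json.loads(text)
    return {e.get("name"): missing_fields(e) for e in slip["standards_used"]}


# Same two entries as the draft, with the long description abbreviated.
example = """
{"standards_used": [
  {"name": "DataCite Metadata Schema 4.0", "name-short": "datacite40",
   "description": "Core metadata properties for citation and retrieval.",
   "schema-version": "4.0",
   "schema-path-local": "erc/schema/datacite40.json",
   "schema-url": "https://schema.datacite.org/meta/kernel-4.0/metadata.xsd",
   "schema-identifier": "doi:10.5438/0013"},
  {"name": "Zenodo Metadata Schema", "name-short": "zenodo",
   "description": null, "schema-version": null,
   "schema-path-local": "erc/schema/zenodo.json",
   "schema-url": null, "schema-identifier": null}
]}
"""

report = check_slip(example)
for name, missing in report.items():
    print(name, "->", missing or "complete")
```

With the draft's values, the DataCite entry is complete while the Zenodo entry lacks a description, version, URL, and identifier, which shows how much a curator would still have to fill in by hand.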

nuest commented 7 years ago

name-short - how do you imagine that to be used? Isn't it more like an (internal) identifier? I wonder if we can replace that with the schema path/url somehow, because that is already a unique identifier
schema-version - do you expect we would have to list different versions of the same standard? Imho that can be solved by either being able to put a version in name-short, or by the schema URL being versioned?
schema-identifier seems like something that should either be mandatory or, if it is an external identifier, not be required for us (what does it give that schema-url does not provide?). In the example, schema-url could also be http://dx.doi.org/10.5438/0013
What are the required elements here?

We also should make a JSON vs. YAML decision soon...
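The idea of using an existing field as the unique key instead of name-short could be sketched as follows. This is only an illustration: the preference order (persistent identifier, then URL, then local path) is an assumption, and `entry_key` and `index_by_key` are hypothetical helper names, not part of any o2r tool.

```python
def entry_key(entry):
    """Pick the most durable identifier available for an entry.

    The order below is an assumed preference, not something the spec defines.
    """
    for field in ("schema-identifier", "schema-url", "schema-path-local"):
        value = entry.get(field)
        if value:
            return value.strip()
    raise ValueError("entry has no usable identifier")


def index_by_key(entries):
    """Index entries by their key and reject collisions."""
    index = {}
    for entry in entries:
        key = entry_key(entry)
        if key in index:
            raise ValueError(f"duplicate key: {key}")
        index[key] = entry
    return index


# Values copied from the draft design; only the key-relevant fields are kept.
entries = [
    {"name": "DataCite Metadata Schema 4.0",
     "schema-identifier": "doi:10.5438/0013",
     "schema-url": "https://schema.datacite.org/meta/kernel-4.0/metadata.xsd",
     "schema-path-local": "erc/schema/datacite40.json"},
    {"name": "Zenodo Metadata Schema",
     "schema-identifier": None,
     "schema-url": None,
     "schema-path-local": "erc/schema/zenodo.json"},
]

index = index_by_key(entries)
print(sorted(index))
```

An entry without a persistent identifier or URL, like the Zenodo one, falls back to its local path, so every entry still gets a key without a separate name-short field.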

ghost commented 7 years ago

More like an internal identifier. Maybe we will have name and description. I was trying to anticipate a future preservationist looking back at such a package slip file and finding a "speaking name" (or a description). The short name would then be something we use to reference the corresponding schema.
I think schema versions are essential, and we do encounter different schema versions in practice, e.g. DataCite 3.0 while harvesting and DataCite 4.0 if we want to include the newest features. Imagine DataCite 8.0 being the standard in three years and our archival information saying "it's DataCite" when it is in fact DataCite 3.0. The version might also be noted in the URL.
Some schemas are published with a persistent identifier. A safe strategy for identification would be to use this ID and additionally include the schema file in the archive.
It's hard to decide. This is already a minimal draft; every element is a candidate for a mandatory field.

I'd prefer JSON because some of the schemas are JSON, but since we also have the erc.yml, YAML could be more consistent.

ghost commented 6 years ago

With o2r meta commit 53d7a31ec5efea5e31131fbc714010e0db484fb6, the package slip is created automatically on every brokering run: the map currently in use is added to package_slip.json (the filename is hardcoded in the broker). The broker stores the Settings object of each broker map it uses as the value, under the map's name as the key. This way XML schema information is included, and everything can be curated and validated with the translation maps.
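The key/value layout described here might look roughly like the sketch below. It is not the actual o2r meta code: `add_map_settings` is a hypothetical helper, and the map names and settings contents are placeholders chosen for illustration.

```python
import json
import os
import tempfile


def add_map_settings(slip_path, map_name, settings):
    """Store one map's settings under the map's name in the package slip."""
    slip = {}
    if os.path.exists(slip_path):
        with open(slip_path, encoding="utf-8") as fh:
            slip = json.load(fh)
    slip[map_name] = settings  # map name is the key, its settings the value
    with open(slip_path, "w", encoding="utf-8") as fh:
        json.dump(slip, fh, indent=4)
    return slip


# Two brokering runs against the same slip file; names and settings are
# placeholders, not real o2r broker maps.
with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "package_slip.json")
    add_map_settings(path, "zenodo", {"name": "Zenodo Metadata Schema"})
    result = add_map_settings(path, "datacite40", {"schema-version": "4.0"})
    print(sorted(result))
```

Because each run only adds or replaces its own key, the slip accumulates one entry per map used, and a repeated run with the same map overwrites that map's settings rather than duplicating them.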