Open martinjoconnor opened 8 years ago
Look at FAIR principles:
From "The FAIR Guiding Principles for scientific data management and stewardship": http://www.nature.com/articles/sdata201618
Assigned myself for now, since Martin and I agreed I'd create a specification from the Product Owner perspective.
Have reviewed the referenced P&V document Martin created, added comments. I will create an actual list of suggested fields 'soon'.
metadatacenter/cedar-template-editor#66
metadatacenter/cedar-template-editor#84
Tentative terminology, need to review against Martin's document: I'm using provenance to mean information about the resource and where it came from. I'm using versioning to mean just the specific bit of provenance that identifies and relates updates of the same resource. (So in this model, a template, element, or instance resource has multiple versions.)
Example: If I create a template and later go back and edit it, I have created a new version of that template. The version information needs to uniquely identify each version, record when it was created, and relate it to the previous version.
On the other hand, if I make a copy of the template, the provenance of the new template needs to reflect that it was derived from the original template (and which version of that template was used for the copy). The new template starts a new version sequence of its own, unrelated to the version sequence of the original.
This distinction, which is very un-git-like, is necessary to give users a model for tracking changes within and across templates. (Git's equivalent is to adopt a Workflow or similar model. We are not going to put that abstraction in front of users.)
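To make the edit-versus-copy distinction concrete, here is a minimal sketch in Python. All names (`ResourceVersion`, `create_resource`, `edit`, `copy`) are hypothetical and not part of the CEDAR model; carrying `derived_from` forward on every edit is an assumption made for illustration, not a settled decision.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional
import uuid


@dataclass(frozen=True)
class ResourceVersion:
    """One version of a template, element, or instance (hypothetical model)."""
    resource_id: str                         # identifies the version sequence
    version_id: str                          # uniquely identifies this version
    created_on: datetime                     # when this version was created
    previous_version: Optional[str] = None   # version_id of the prior version
    derived_from: Optional[str] = None       # version_id the resource was copied from


def _new_id() -> str:
    return uuid.uuid4().hex


def create_resource() -> ResourceVersion:
    """Start a brand-new resource with its first version."""
    return ResourceVersion(_new_id(), _new_id(), datetime.now(timezone.utc))


def edit(current: ResourceVersion) -> ResourceVersion:
    """Editing yields a new version in the same version sequence."""
    return ResourceVersion(
        resource_id=current.resource_id,
        version_id=_new_id(),
        created_on=datetime.now(timezone.utc),
        previous_version=current.version_id,
        derived_from=current.derived_from,  # assumption: origin is kept on every version
    )


def copy(source: ResourceVersion) -> ResourceVersion:
    """Copying starts a new, unrelated version sequence that records its origin."""
    return ResourceVersion(
        resource_id=_new_id(),
        version_id=_new_id(),
        created_on=datetime.now(timezone.utc),
        previous_version=None,
        derived_from=source.version_id,
    )
```

With this shape, `edit(v1)` shares `v1.resource_id` and points back at `v1.version_id`, while `copy(v2)` gets a fresh `resource_id`, no `previous_version`, and a `derived_from` naming the exact version that was copied.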
I've added a healthy version and derivation section to the end of https://docs.google.com/document/d/1a2lvGJxD-UbLcdhs7zFR935y0lqIUubR4dfaev5lnSk/, which makes the argument for minimum version and derivation tracking in the model for 1.0, but no UI or other features based on it (except using the right template version to present a corresponding instance).
All that's left is to define the actual attributes. So close now.
More advanced and subtle aspects of versioning, like the possibility that multiple versions can have the same parent (A -> B and A -> C) and more detailed version relationships, are covered in the OOI CI concepts WP. Only I should read that document, though; it needs to be trimmed way down to be useful.
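As a rough illustration of the shared-parent case (A -> B and A -> C), the sketch below stores only a "previous version" pointer per version and derives everything else from it. The version ids and function names are made up for the example.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# Hypothetical version ids; each maps to the version it was edited from.
previous_version: Dict[str, Optional[str]] = {
    "A": None,
    "B": "A",
    "C": "A",  # A has two children, so the history is a tree, not a line
    "D": "B",
}


def children_of(parents: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Invert the parent pointers to find every version's direct successors."""
    children: Dict[str, List[str]] = defaultdict(list)
    for version, parent in parents.items():
        if parent is not None:
            children[parent].append(version)
    return dict(children)


def lineage(version: str, parents: Dict[str, Optional[str]]) -> List[str]:
    """Walk back from a version to the root of its sequence."""
    chain = [version]
    while parents[chain[-1]] is not None:
        chain.append(parents[chain[-1]])
    return chain
```

The point of the sketch is that a single parent pointer per version is enough to recover both branching (`children_of`) and history (`lineage`) later, without storing either explicitly.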
As discussed in the versioning task (metadatacenter/cedar-project#57), we will henceforth disentangle provenance from versioning.
The use of a graph node here is interesting:
http://www.bbc.co.uk/ontologies/provenance
Instead of throwing an ever-expanding set of provenance fields into the model, we can reify the provenance and attach the provenance fields to wrapper entities.
This general approach would allow us to have different sets of provenance properties for different entities driven by user preferences. We could even store these provenance objects as CEDAR elements. These would be meta-metadata elements!
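A small sketch of the wrapper-entity idea, using plain Python tuples as triples rather than any RDF library. The `ex:` identifiers and the `ex:hasProvenance` link are hypothetical placeholders (no standard property is assumed here); `pav:createdBy` and `pav:createdOn` are used only as example provenance fields.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

HAS_PROVENANCE = "ex:hasProvenance"  # hypothetical linking property

graph: List[Triple] = [
    # The resource itself carries no provenance fields...
    ("ex:template1", "rdf:type", "ex:Template"),
    ("ex:template1", HAS_PROVENANCE, "ex:prov1"),
    # ...they all live on the wrapper entity instead.
    ("ex:prov1", "rdf:type", "prov:Entity"),
    ("ex:prov1", "pav:createdBy", "ex:alice"),
    ("ex:prov1", "pav:createdOn", "2016-09-21"),
]


def provenance_of(resource: str, triples: List[Triple]) -> Dict[str, str]:
    """Follow the hasProvenance link and collect the wrapper's properties."""
    wrappers = {o for s, p, o in triples if s == resource and p == HAS_PROVENANCE}
    return {p: o for s, p, o in triples if s in wrappers and p != "rdf:type"}
```

Because the fields sit on the wrapper, a different resource could point at a wrapper with a different set of properties without any change to the resource model itself.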
The "The health care and life sciences community profile for dataset descriptions" paper is of relevance here: https://peerj.com/articles/2331/
Interesting email exchange initiated by a conversation with Clement re finding a generic hasProvenance property to link cedar resources to provenance objects.
From: Clement Jonquet [mailto:jonquet@lirmm.fr]
Sent: Wednesday, 21 September 2016 14:05
To: 'Paolo Ciccarese' <paolo.ciccarese@gmail.com>
Subject: Question about reifying provenance object
Hi Paolo,
I hope you’re doing well.
I have a question for you about provenance.
Some of my CEDAR project colleagues developing an application for metadata management, want to describe provenance about an object by putting all the provenance information into a class (certainly a prov:Entity or prov:Activity) “attached” to the object.
We could say they want to reify the provenance information into another class rather than in the object itself.
Let’s say they want to say something about O.
Option 1 would be:
O isoftype Something
O isoftype Entity
O prov:wasInformedBy X
Option 2 (the one they prefer) will be:
O isoftype Something
P isoftype Entity
P prov:wasInformedBy X
O hasProvenance P
Then the question is: do you know a standard vocabulary/property for hasProvenance ??
I hope this is clear ;)
Thanks
Clement
Dr. Clement JONQUET - PhD in Informatics - Assistant Professor
University of Montpellier
Coordinator of the SIFR and AgroPortal projects
Visiting scholar, Stanford University (EU Marie Curie fellow)
jonquet@lirmm.fr
http://www.lirmm.fr/~jonquet
@Montpellier : +33/4 67 14 97 43
@Stanford : +1 650 723 6725
--
Dr. Paolo Ciccarese
Principal Knowledge and Software Engineer at PerkinElmer Innovation Lab
Assistant Professor of Neurology at Harvard Medical School
Assistant in Neuroscience at Mass General Hospital
ORCID: http://orcid.org/0000-0002-5156-2703
Also of relevance is the BBC provenance ontology, in particular the provenance:Graph class:
Initial list from John here:
Key curated information about metadata/provenance, compiled by Clement's team for ontologies (not so very different). See in particular the "minimal" list.
Of these, the following are definitely not relevant for us, based on a superficial review (see next comment for most valuable among the remaining):
My list of most valuable concepts from the above, some of which could be/should be/are expressed using different concepts. These are things one might bring forward in a metadata UI, to prompt users to capture the corresponding information.
We need to add basic versioning and provenance information to the template model. By just tracking a few basic things in a robust way (e.g., PROV-O, PAV compatible fields/properties), we can maintain the basic information needed for more advanced version access later.
This should be treated as the master task, with subtasks to implement UIs to see and possibly manage the information.
See: https://docs.google.com/document/d/1a2lvGJxD-UbLcdhs7zFR935y0lqIUubR4dfaev5lnSk/
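As a rough sketch of what "a few basic things" could look like when keyed by PAV properties: the choice of fields below is an assumption for illustration, not the agreed attribute list, and `minimal_record` is a hypothetical helper. The property names themselves (`version`, `createdOn`, `createdBy`, `previousVersion`, `derivedFrom`) are terms from the PAV vocabulary.

```python
from typing import Dict, Optional

PAV = "http://purl.org/pav/"


def minimal_record(
    version: str,
    created_on: str,
    created_by: str,
    previous_version: Optional[str] = None,
    derived_from: Optional[str] = None,
) -> Dict[str, str]:
    """Build a provenance/version record keyed by full PAV property IRIs."""
    record = {
        PAV + "version": version,
        PAV + "createdOn": created_on,
        PAV + "createdBy": created_by,
    }
    # Optional links are only written when they apply to this resource.
    if previous_version is not None:
        record[PAV + "previousVersion"] = previous_version
    if derived_from is not None:
        record[PAV + "derivedFrom"] = derived_from
    return record
```

A first version of a copied template would set only `derived_from`; a later edit of that template would also set `previous_version`.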