metadatacenter / cedar-artifact-server

Backend services to support operations on Metadata Templates

Provenance requirements #6

Open martinjoconnor opened 8 years ago

martinjoconnor commented 8 years ago

We need to add basic versioning and provenance information to the template model. By just tracking a few basic things in a robust way (e.g., PROV-O, PAV compatible fields/properties), we can maintain the basic information needed for more advanced version access later.

This should be treated as the master task, with subtasks to implement UIs to see and possibly manage the information.

See: https://docs.google.com/document/d/1a2lvGJxD-UbLcdhs7zFR935y0lqIUubR4dfaev5lnSk/

martinjoconnor commented 8 years ago

Look at FAIR principles:

[Screenshot: 2016-03-15, 9:42:15 AM]

From "The FAIR Guiding Principles for scientific data management and stewardship": http://www.nature.com/articles/sdata201618

graybeal commented 8 years ago

Assigned myself for now, since Martin and I agreed I'd create a specification from the Product Owner perspective.

Have reviewed the referenced P&V document Martin created, added comments. I will create an actual list of suggested fields 'soon'.

graybeal commented 8 years ago

metadatacenter/cedar-template-editor#66

graybeal commented 8 years ago

metadatacenter/cedar-template-editor#84

graybeal commented 8 years ago

Tentative terminology, need to review against Martin's document: I'm using provenance to mean information about the resource and where it came from. I'm using versioning to mean just the specific bit of provenance that identifies and relates updates of the same resource. (So in this model, a template, element, or instance resource has multiple versions.)

Example: If I create a template, then go back and edit it later, I have created a new version of that template. The version information needs to uniquely identify each version, describe when it was created, and relate it to the previous version.

On the other hand, if I make a copy of the template, the provenance of the new template needs to reflect that it was derived from the original template (and which version of that template was used for the copy). The new template is starting with a new version sequence of its own, unrelated to the version sequence in the original.

This distinction, which is very un-git-like, is necessary to give users a model for tracking changes within and across templates. (Git's equivalent is to adopt a Workflow or similar model. We are not going to put that abstraction in front of users.)
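
A minimal Python sketch of the distinction above, not the CEDAR model itself. The `Version` record, its field names (`previous_version`, `derived_from`), and the three helper functions are hypothetical and chosen only for illustration: editing extends a resource's own version sequence, while copying starts a new sequence that points back at the exact source version.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional
import uuid


@dataclass(frozen=True)
class Version:
    """One saved state of a resource (template, element, or instance)."""
    version_id: str                     # uniquely identifies this version
    resource_id: str                    # the resource whose sequence this belongs to
    created_at: datetime                # when this version was created
    previous_version: Optional[str]     # prior version of the SAME resource, if any
    derived_from: Optional[str] = None  # version of ANOTHER resource this was copied from


def _new_id() -> str:
    return uuid.uuid4().hex


def create_resource(resource_id: str) -> Version:
    """First version of a brand-new resource: no predecessor, no derivation."""
    return Version(_new_id(), resource_id, datetime.now(timezone.utc), None)


def edit_resource(current: Version) -> Version:
    """Editing yields a new version in the same sequence, linked to the one before."""
    return Version(_new_id(), current.resource_id, datetime.now(timezone.utc),
                   previous_version=current.version_id)


def copy_resource(source: Version, new_resource_id: str) -> Version:
    """Copying starts a fresh sequence and records which source version was used."""
    return Version(_new_id(), new_resource_id, datetime.now(timezone.utc),
                   previous_version=None, derived_from=source.version_id)


v1 = create_resource("template-A")
v2 = edit_resource(v1)                  # same resource, next version
c1 = copy_resource(v2, "template-B")    # new resource, derived from v2
```

In this sketch only the first version of a copy carries `derived_from`; later edits of the copy reach it by walking `previous_version`.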

graybeal commented 8 years ago

I've added a healthy version and derivation section to the end of https://docs.google.com/document/d/1a2lvGJxD-UbLcdhs7zFR935y0lqIUubR4dfaev5lnSk/, which makes the argument for minimum version and derivation tracking in the model for 1.0, but no UI or other features based on it (except using the right template version to present a corresponding instance).

All that's left is to define the actual attributes. So close now.

graybeal commented 8 years ago

More advanced/subtle aspects of versioning, like the possibility that multiple versions can have the same parent (A -> B and A -> C), and more detailed version relationships, are covered in the OOI CI concepts WP. Only I should read that document though, it needs to be trimmed way down to be useful.
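
To make the shared-parent case concrete, here is a rough Python sketch that assumes each version stores only its parent's id. The ids and the extra version `D` are made up for illustration; nothing here comes from the OOI CI document.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# Hypothetical version ids mapped to their parent version id (None for the root).
# A -> B and A -> C share a parent; D is an extra illustrative child of B.
PARENTS: Dict[str, Optional[str]] = {"A": None, "B": "A", "C": "A", "D": "B"}


def children_of(parents: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Invert the parent map so each version lists the versions made from it."""
    kids: Dict[str, List[str]] = defaultdict(list)
    for version, parent in parents.items():
        if parent is not None:
            kids[parent].append(version)
    return dict(kids)


def lineage(version: str, parents: Dict[str, Optional[str]]) -> List[str]:
    """Walk from a version back to the root, newest first."""
    chain = [version]
    while parents[chain[-1]] is not None:
        chain.append(parents[chain[-1]])
    return chain
```

With a single parent pointer per version the history is a tree rather than a line, so "the latest version" is no longer unique once two versions share a parent.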

martinjoconnor commented 8 years ago

As discussed in the versioning task (metadatacenter/cedar-project#57), we will henceforth disentangle provenance from versioning.

martinjoconnor commented 8 years ago

The use of a graph node here

http://www.bbc.co.uk/ontologies/provenance

is interesting. Instead of throwing an ever-expanding set of provenance fields into the model we can reify - and attach the provenance fields to wrapper entities.

This general approach would allow us to have different sets of provenance properties for different entities driven by user preferences. We could even store these provenance objects as CEDAR elements. These would be meta-metadata elements!
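
A rough Python sketch of the wrapper idea, under stated assumptions: `ProvenanceRecord`, `PROFILES`, and `make_record` are hypothetical names, and the choice of PAV properties per resource type is only an example of what a configurable profile might contain.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ProvenanceRecord:
    """Wrapper entity holding provenance for one resource (hypothetical shape)."""
    about: str                                            # id of the resource described
    properties: Dict[str, str] = field(default_factory=dict)


# Hypothetical per-type profiles: which provenance properties get captured.
# These could be driven by user preferences rather than fixed in the model.
PROFILES: Dict[str, List[str]] = {
    "template": ["pav:createdBy", "pav:createdOn", "pav:derivedFrom"],
    "instance": ["pav:createdBy", "pav:createdOn"],
}


def make_record(resource_id: str, resource_type: str,
                values: Dict[str, str]) -> ProvenanceRecord:
    """Build a provenance wrapper, keeping only properties the profile allows."""
    allowed = PROFILES[resource_type]
    kept = {key: value for key, value in values.items() if key in allowed}
    return ProvenanceRecord(about=resource_id, properties=kept)


record = make_record(
    "instance-1", "instance",
    {"pav:createdBy": "alice", "pav:derivedFrom": "template-1"},
)
```

The resource itself carries no provenance fields here; adding or dropping a property means changing a profile, not the resource model.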

martinjoconnor commented 8 years ago

The paper "The health care and life sciences community profile for dataset descriptions" is relevant here: https://peerj.com/articles/2331/

martinjoconnor commented 8 years ago

Interesting email exchange initiated by a conversation with Clement re finding a generic hasProvenance property to link cedar resources to provenance objects.

From: Clement Jonquet [mailto:jonquet@lirmm.fr]
Sent: Wednesday, September 21, 2016 14:05
To: 'Paolo Ciccarese' <paolo.ciccarese@gmail.com>
Subject: Question about reifying provenance object

Hi Paolo,
I hope you’re doing well.

I have a question for you about provenance.
Some of my CEDAR project colleagues developing an application for metadata management, want to describe provenance about an object by putting all the provenance information into a class (certainly a prov:Entity or prov:Activity) “attached” to the object.
We could say they want to reify the provenance information into another class rather than in the object itself.

Let’s say they want to say something about O.
Option 1 would be:
O isoftype Something
O isoftype Entity
O prov:wasInformedBy X

Option 2 (the one they prefer) will be:
O isoftype Something
P isoftype Entity
P prov:wasInformedBy X
O hasProvenance P

Then the question is: do you know a standard vocabulary/property for hasProvenance ??

I hope this is clear ;)

Thanks
Clement

Dr. Clement JONQUET  -  PhD in Informatics  -  Assistant Professor
University of Montpellier
Coordinator of the SIFR and AgroPortal projects
Visiting scholar, Stanford University (EU Marie Curie fellow)

jonquet@lirmm.fr
http://www.lirmm.fr/~jonquet

@Montpellier : +33/4 67 14 97 43
@Stanford       : +1 650 723 6725

--
Dr. Paolo Ciccarese                     
Principal Knowledge and Software Engineer at PerkinElmer Innovation Lab
Assistant Professor of Neurology at Harvard Medical School                      
Assistant in Neuroscience at Mass General Hospital 
ORCID: http://orcid.org/0000-0002-5156-2703

martinjoconnor commented 8 years ago

Also of relevance is the BBC provenance ontology, in particular the provenance:Graph class:

http://www.bbc.co.uk/ontologies/provenance

martinjoconnor commented 7 years ago

Initial list from John here:

https://docs.google.com/spreadsheets/d/1CJ26ZiWjJZJWnqLW4azwUhNk7_8bCuxxBAmepV10zSo/edit?ts=58b5feba#gid=1780001154

graybeal commented 4 years ago

Key curated information about metadata/provenance performed by Clement's team for ontologies (not so very different). See in particular the "minimal" list.

Of these, the following are definitely not relevant for us, based on a superficial review (see next comment for most valuable among the remaining):

graybeal commented 4 years ago

My list of most valuable concepts from the above, some of which could be/should be/are expressed using different concepts. These are things one might bring forward in a metadata UI, to prompt users to capture the corresponding information.