poikilotherm opened this issue 1 year ago
One question that follows from this is: curation = validation + sign-off, so essentially the same step?

After discussing this further with @sdruskat and @poikilotherm, we came to the conclusion that we urgently need to include @led02 here. We know what we want to do on a meta level with processing, validation, curation, humans, etc., but we are not yet clear enough on how to structure this so that we can turn it into executable code.
As a basis for discussion, I propose the following terminology (backed up by the naive mixed-type diagram below):
There are different perspectives:
Of those, we mainly need to distinguish between the high-level and the implementation perspective. Using the terminology proposed above can help us navigate between parts and steps.
Additionally, the graphic proposes a specific modularization of steps in the implementation perspective. Again, as a basis for discussion, I suggest the following steps (in simplified terms):
(`codemeta.authors.name: Stephan Druskat` and `cff.authors.name: Stephan Druskat`)
(`codemeta.version: 0.9.3-rc1` and `cff.version: 1.0.0`).
Note: The detection of semantic conflicts is not in the scope of this step, e.g., "are John Kennedy and John F. Kennedy the same person?".
Note: Conflict recognition can be configured for this step. This also includes configuration for semantic conflicts. Examples: disambiguation of people with aliases (e.g., JFK, John Kennedy, John F. Kennedy), mail mapping, and a source hierarchy for a specific field or in general (when both CodeMeta and CFF are present, always take all values/authors from CFF).

(One could map those steps straight into the implementation as extension points.)
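To make the conflict step a bit more concrete, here is a minimal Python sketch. It assumes that metadata from each source has already been flattened into shared keys of the unified model (so `codemeta.version` and `cff.version` both end up under `version`). All names (`ConflictConfig`, `detect_conflicts`, `resolve`) are hypothetical and not part of any existing API. Only syntactic conflicts (values that still differ after normalization) are detected; aliases and the source hierarchy come in purely through configuration, which is one possible way to realize them as extension points.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical shape: source name -> {unified-model key: value}.
# Real CodeMeta/CFF authors are lists of objects; flattened here for brevity.
Harvested = dict[str, dict[str, str]]


@dataclass
class ConflictConfig:
    # Extension point: alias spelling -> canonical value (e.g. for people).
    aliases: dict[str, str] = field(default_factory=dict)
    # Extension point: per-field source hierarchy, highest priority first.
    source_priority: dict[str, list[str]] = field(default_factory=dict)
    # General source hierarchy for fields without a specific rule.
    default_priority: list[str] = field(default_factory=list)


def normalize(value: str, config: ConflictConfig) -> str:
    """Apply the configured alias mapping; unknown values pass through unchanged."""
    return config.aliases.get(value, value)


def detect_conflicts(data: Harvested, config: ConflictConfig) -> dict[str, dict[str, str]]:
    """Return every key whose normalized values differ between sources."""
    keys = sorted({key for values in data.values() for key in values})
    conflicts = {}
    for key in keys:
        seen = {src: normalize(vals[key], config) for src, vals in data.items() if key in vals}
        if len(set(seen.values())) > 1:
            conflicts[key] = seen
    return conflicts


def resolve(key: str, candidates: dict[str, str], config: ConflictConfig) -> Optional[str]:
    """Pick a value by source hierarchy; None means a human has to decide."""
    for src in config.source_priority.get(key, config.default_priority):
        if src in candidates:
            return candidates[src]
    return None


data = {
    "codemeta": {"authors.name": "Stephan Druskat", "version": "0.9.3-rc1"},
    "cff": {"authors.name": "Stephan Druskat", "version": "1.0.0"},
}
config = ConflictConfig(default_priority=["cff", "codemeta"])  # "always take CFF"
conflicts = detect_conflicts(data, config)
print(conflicts)  # {'version': {'codemeta': '0.9.3-rc1', 'cff': '1.0.0'}}
print({key: resolve(key, cands, config) for key, cands in conflicts.items()})  # {'version': '1.0.0'}
```

With `aliases={"JFK": "John F. Kennedy", "John Kennedy": "John F. Kennedy"}`, differing spellings of the same person would no longer be reported, which corresponds to the configured semantic case from the note; detecting such aliases automatically stays out of scope.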
To make sure all of us and our users are on the same page about what happens where, let's properly document what is meant to happen in each step.
Example: the process/validate label for a (combined) step that processes data into a unified data model and then validates that state. In this case, "process" is not about validating semantics or syntax (which might need a human in the loop), but about the consistency of the metadata. "validate", however, is exactly about the semantic conflicts within the unified data model (and needs that human).
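One way to keep the two halves of process/validate apart in code is to give them different signatures: processing runs without interaction and only reports consistency problems, while validation receives a callback through which a human decides. The Python sketch below assumes this split; `process`, `validate`, `ask_human` and the `REQUIRED_FIELDS` consistency rule are hypothetical illustrations, not an agreed design.

```python
from typing import Callable

# Hypothetical consistency rule: fields the unified model must contain.
REQUIRED_FIELDS = ("name", "version")


def process(sources: list[dict[str, str]]) -> tuple[dict[str, list[str]], list[str]]:
    """Merge sources into a unified model (key -> distinct values); check consistency only."""
    unified: dict[str, list[str]] = {}
    for source in sources:
        for key, value in source.items():
            values = unified.setdefault(key, [])
            if value not in values:
                values.append(value)
    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in unified]
    return unified, errors


def validate(unified: dict[str, list[str]], ask_human: Callable[[str, list[str]], str]) -> dict[str, str]:
    """Settle semantic conflicts: any field with several candidate values goes to a human."""
    return {
        key: values[0] if len(values) == 1 else ask_human(key, values)
        for key, values in unified.items()
    }


unified, errors = process([
    {"name": "example-tool", "author": "John Kennedy"},
    {"name": "example-tool", "author": "John F. Kennedy"},
])
print(errors)  # ['missing field: version']
# In practice ask_human would prompt a person; here it just picks the last candidate.
print(validate(unified, ask_human=lambda key, values: values[-1]))
# {'name': 'example-tool', 'author': 'John F. Kennedy'}
```

Injecting `ask_human` keeps the step testable and makes the point where a human is required explicit in the code.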
Example: "curation" vs. "conflict resolution": we have talked about conflict resolution in the past, which actually is "validation" as in the example above. "curation" is the step of "signing off" on a potential deposit; it may or may not include part of the validation, as well as additional validation, e.g., as described in #68.
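If curation is read as "validation + sign-off", the curation step could run any additional checks and then require an explicit decision before a deposit happens. Below is a minimal Python sketch under that reading; `curate`, `CurationResult`, `sign_off` and the `has_license` check are hypothetical, and the checks from #68 are not modelled.

```python
from dataclasses import dataclass
from typing import Callable

# A check returns a list of problems; an empty list means "fine".
Check = Callable[[dict], list[str]]


@dataclass(frozen=True)
class CurationResult:
    approved: bool
    problems: tuple[str, ...]


def curate(metadata: dict, extra_checks: list[Check],
           sign_off: Callable[[dict, list[str]], bool]) -> CurationResult:
    """Curation = (additional) validation + sign-off on a potential deposit."""
    problems = [problem for check in extra_checks for problem in check(metadata)]
    # The sign-off sees the metadata and all problems; a human makes the decision.
    approved = sign_off(metadata, problems)
    return CurationResult(approved=approved, problems=tuple(problems))


def has_license(metadata: dict) -> list[str]:
    """Hypothetical additional validation rule."""
    return [] if metadata.get("license") else ["no license given"]


result = curate({"name": "example-tool", "version": "1.0.0"}, [has_license],
                sign_off=lambda metadata, problems: not problems)
print(result)  # CurationResult(approved=False, problems=('no license given',))
```

Here the lambda stands in for a person; in a real workflow `sign_off` would present the metadata and problems for review and only then return a decision.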
There might be other issues with our implicit definitions of steps that we should be more explicit about.