Add phyloreference state so we can see if they are incomplete, unvalidated or validated

gaurav commented 6 years ago

The phyloreference curation process should probably look like this:

Create new, blank phyloreference [state: Incomplete]
Add name and clade definition (see #18 for an attempt to parse the clade definition directly into specifiers).
Annotate the node where this phyloreference is expected to resolve.
Add specifiers.
Mark phyloreference as complete [state: Unvalidated]
Start reasoning to test resolution. State will either change to [state: Failed Validation] or [state: Validated].
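The steps above amount to a small state machine. Here is a minimal sketch of it, assuming only forward moves are allowed (reopening a phyloreference isn't discussed in this issue). The names `State`, `TRANSITIONS` and `advance` are made up for illustration and are not part of any existing Klados code.

```python
from enum import Enum


class State(Enum):
    INCOMPLETE = "Incomplete"
    UNVALIDATED = "Unvalidated"
    FAILED_VALIDATION = "Failed Validation"
    VALIDATED = "Validated"


# Allowed moves, read off the curation steps above. Backward moves (e.g.
# reopening a phyloreference for editing) are an open question, so none
# are listed here.
TRANSITIONS = {
    State.INCOMPLETE: {State.UNVALIDATED},
    State.UNVALIDATED: {State.FAILED_VALIDATION, State.VALIDATED},
    State.FAILED_VALIDATION: set(),
    State.VALIDATED: set(),
}


def advance(current: State, target: State) -> State:
    """Return target if the move is allowed, otherwise raise ValueError."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```

For example, `advance(State.INCOMPLETE, State.UNVALIDATED)` succeeds, while skipping straight from Incomplete to Validated raises a `ValueError`.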

This is the basis for colour-coding phyloreferences: incomplete (no colour?) and validated (green) phyloreferences are unambiguous, while unvalidated phyloreferences can be further subdivided into "unvalidated but all specifiers match" (yellow) and "unvalidated but some specifiers don't match" (red).
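A rough sketch of that colour mapping, assuming states are passed around as plain strings and that "no colour" is represented as `None`. The function name `colour_for` and the `all_specifiers_match` flag are hypothetical; only the colours themselves come from the description above.

```python
from typing import Optional


def colour_for(state: str, all_specifiers_match: bool = False) -> Optional[str]:
    """Map a phyloreference state to a display colour, or None for no colour."""
    if state == "Validated":
        return "green"
    if state == "Unvalidated":
        # Unvalidated is split by whether every specifier matched.
        return "yellow" if all_specifiers_match else "red"
    # "Incomplete" is tentatively uncoloured; no colour has been proposed
    # for any other state, so those also fall through to None.
    return None
```

Returning `None` for anything without an agreed colour keeps the open question (what to show for incomplete phyloreferences) visible rather than picking a colour prematurely.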

Furthermore:

[ ] Phyloreferences in the "draft" state that resolve as expected can automatically be marked as "final-draft"
[ ] Phyloreferences that are marked as "published" should include a citation to their publication

hlapp commented 6 years ago

Mark phyloreference as complete [state: Unvalidated]

Start reasoning to test resolution. State will either remain unvalidated or change to Validated.

Isn't there a difference between a phyloreference that hasn't been validated yet ("unvalidated"), and one that has been but failed to validate?

Or do you anticipate that Phyloreferences cannot be in "unvalidated" state because they haven't been validated yet?

gaurav commented 6 years ago

That is what I anticipated, but I think you're right: it'd be useful to distinguish phyloreferences that failed validation (which we might want to treat as a "to do" to fix eventually) from those that have never been validated at all, especially initially, when validation might be very slow. I've updated the issue!

gaurav commented 6 years ago

Remember that a phyloreference could remain in "Unvalidated" even after testing, for example if the authors did not annotate where they expected a clade definition to resolve on any phylogeny. That seems unlikely, but I'm sure it'll turn up.
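To make that edge case concrete, here is a minimal sketch of how the state after testing could be decided. It assumes the expected node is a single optional identifier and that "resolves as expected" means the reasoner returned exactly that node; both are simplifications for illustration, and `state_after_testing` is a hypothetical name.

```python
from typing import Optional, Set


def state_after_testing(expected_node: Optional[str], resolved_nodes: Set[str]) -> str:
    """Decide the state of a complete phyloreference after reasoning has run."""
    if expected_node is None:
        # Nothing to compare against, so testing cannot pass or fail.
        return "Unvalidated"
    # Assumption: success means resolving to exactly the expected node.
    if resolved_nodes == {expected_node}:
        return "Validated"
    return "Failed Validation"
```

With no expected node annotated, the result stays "Unvalidated" no matter what the reasoner returns.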

gaurav commented 6 years ago

Note that [state: Failed Validation] is necessary to the Clade Ontology test suite -- it allows us to mark phyloreferences that we don't expect to currently resolve, either because of technical limitations (such as phyloref/clade-ontology#27) or because of some ambiguity or error in the phyloreference statement itself. In the Clade Ontology, this is being tracked as phyloref/clade-ontology#31.

gaurav commented 6 years ago

We could use the Publication Status Ontology to give each phyloreference a publication status.

All phyloreferences start at pso:draft.
Once the curator considers a phyloreference complete, they hit a "Complete" button, which changes its status to pso:final-draft.
Once the phyloreference has been tested, its publication status changes to either pso:under-review (indicating that tests failed) or pso:submitted (indicating that the tests passed).
Once the phyloreferences have been published as an OWL ontology, their publication status will be changed to pso:published.
If a phyloreference is ever deprecated, we can mark it as pso:retracted-from-publication.
We will not track corrections of previous definitions using PSO ontology terms, i.e. if Clade Ontology 0.2 uses a particular definition of Mammalia but Clade Ontology 0.3 uses a different definition, both phyloreferences will have a status of pso:published, not pso:corrected.
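The status changes listed above could be sketched as a lookup table. The event names (`complete`, `tests_failed`, and so on) are invented for this example, and two points are my assumptions rather than anything decided here: that only pso:submitted phyloreferences get published, and that deprecation is allowed from any status.

```python
RETRACTED = "pso:retracted-from-publication"

# (current status, event) -> next status. Event names are hypothetical.
PSO_TRANSITIONS = {
    ("pso:draft", "complete"): "pso:final-draft",
    ("pso:final-draft", "tests_failed"): "pso:under-review",
    ("pso:final-draft", "tests_passed"): "pso:submitted",
    # Assumption: publishing starts from pso:submitted only.
    ("pso:submitted", "publish"): "pso:published",
}


def next_status(status: str, event: str) -> str:
    """Return the publication status that follows from an event."""
    if event == "deprecate":
        # Assumption: a phyloreference can be deprecated from any status.
        return RETRACTED
    try:
        return PSO_TRANSITIONS[(status, event)]
    except KeyError:
        raise ValueError(f"no transition from {status} on {event!r}") from None
```

There is deliberately no path to pso:corrected, matching the decision not to track corrections with PSO terms.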

Since this uses a time-indexed value with context, we could also document the full history of a particular phyloreference changing over time and which agent was responsible for each change in status! One potential downside is that this may be confused with the publication status of a single PHYX file or of the entire Clade Ontology, but given that it is the phyloreference that will have the status (via pso:withStatus), I think it'll be okay.
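As a rough illustration of what such a history might look like in memory, here is a sketch that records each status together with the responsible agent and a timestamp. This is not how PSO serialises anything; `StatusChange`, `record` and `current_status` are hypothetical names, and the agent is assumed to be a plain string.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class StatusChange:
    status: str   # e.g. "pso:draft"
    agent: str    # who or what made the change (assumed to be a string)
    at: datetime  # when the status took effect


def record(history: List[StatusChange], status: str, agent: str,
           at: Optional[datetime] = None) -> List[StatusChange]:
    """Return a new history with one more entry; earlier entries are untouched."""
    when = at if at is not None else datetime.now(timezone.utc)
    return history + [StatusChange(status, agent, when)]


def current_status(history: List[StatusChange]) -> Optional[str]:
    """Return the most recent status, or None if nothing was recorded."""
    if not history:
        return None
    return max(history, key=lambda change: change.at).status
```

Because entries are only ever appended, the list doubles as an audit trail of which agent made each change.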

We could instead use the Evaluation and Report Language to describe the result of testing these phyloreferences by various means (human curation, automated testing, etc.) as passing or failing. However, I think the Publication Status Ontology is just about perfect for our needs!

gaurav commented 1 year ago

Could be relevant for someone maintaining an ontology of clade definitions, but isn't unique to phylogenetic definitions. You could imagine having different source files in different directories representing different states. So, not a priority for Klados v1.0.

gaurav commented 1 year ago

For future reference, we could use the curation status specification that is part of the IAO rather than using the Publication Status Ontology.

phyloref / klados

Add phyloreference state so we can see if they are incomplete, unvalidated or validated #25