plazi / arcadia-project

Granularity levels, verification and Quality Control #195

Open flsimoes opened 2 years ago

flsimoes commented 2 years ago

We need to define what we understand by Granularity, Verification and Quality Control, and what levels to assign to each. EDIT: QC and Verification are dropped. We'll focus on general granularity plus a QCd/Not-QCd tag.

Granularity

The level of processing applied to a given document, i.e. the level that a batch (or an individual extraction) reaches. At the moment, batch processing does not allow any partial processing: it activates all of the treatment, treatmentCitation, and materialsCitation macros. To apply the levels below, we would need to implement elements that let the template creator signal at what point a batch process should stop.
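A minimal sketch of what such a stop signal could look like, assuming the three macros named above run in a fixed order. The `Stage` enum, the `run_batch` function, and the `stop_after` parameter are all hypothetical names for illustration; none of them exist in the current batch tooling, and the stage order is an assumption.

```python
from enum import IntEnum
from typing import Callable


class Stage(IntEnum):
    """Hypothetical ordering of the batch macros (assumed, not confirmed)."""
    TREATMENT = 1
    TREATMENT_CITATION = 2
    MATERIALS_CITATION = 3


def run_batch(document: dict,
              macros: dict[Stage, Callable[[dict], None]],
              stop_after: Stage) -> list[str]:
    """Run the macros in stage order and stop once `stop_after` has run."""
    applied = []
    for stage in sorted(macros):
        if stage > stop_after:
            break
        macros[stage](document)
        applied.append(stage.name)
    # Record the level reached plus the QCd/Not-QCd tag on the document.
    document["granularity"] = stop_after.name
    document.setdefault("qcd", False)
    return applied


def _mark(name: str) -> Callable[[dict], None]:
    """Stand-in for a real macro: it only records that it ran."""
    def macro(doc: dict) -> None:
        doc.setdefault("annotations", []).append(name)
    return macro


MACROS = {
    Stage.TREATMENT: _mark("treatment"),
    Stage.TREATMENT_CITATION: _mark("treatmentCitation"),
    Stage.MATERIALS_CITATION: _mark("materialsCitation"),
}

doc = {"id": "example"}
applied = run_batch(doc, MACROS, stop_after=Stage.TREATMENT_CITATION)
print(applied, doc["granularity"], doc["qcd"])
```

Here the template creator would set `stop_after`, so a batch could end before the materialsCitation macro runs while still recording which level the document reached.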

(DEPRECATED)

Quality Control

The amount of verification and parsing Plazi applies to a given document. This QC level system is a translation of what we currently dub "Granularity levels".

Verification

The level to which a document has been fully checked by a user. The inspiration for this model is iNaturalist.

flsimoes commented 2 years ago

I'm adding the specifications for each level and further descriptions

flsimoes commented 2 years ago

@myrmoteras please check it out

flsimoes commented 2 years ago

It would also be really useful to have these tags findable through the TB Stats.

gsautter commented 2 years ago

> It would also be really useful to have these tags findable through the TB Stats

Sure thing ... once they are defined and we start assigning them, that is ... beforehand, there's precious little sense in adding a bunch of empty stats fields.

flsimoes commented 2 years ago

> > It would also be really useful to have these tags findable through the TB Stats
>
> Sure thing ... once they are defined and we start assigning them, that is ... beforehand, there's precious little sense in adding a bunch of empty stats fields.

Completely agree, and that's why we are here discussing them :)

myrmoteras commented 1 year ago

@flsimoes let's define the granularity levels, and let's pick 3-5 "gold standard" publications where we can show them and which include the variation of treatments, ie

flsimoes commented 1 year ago

We're currently gathering examples

myrmoteras commented 1 year ago

example papers: article stats; treatmentStats; JSON

  1. Cipola, Nikolas G. & Katz, Aron D., 2021, Morphological and molecular analysis of Willowsia nigromaculata (Collembola, Entomobryidae, Entomobryinae) reveals a new cryptic species from the United States, European Journal of Taxonomy 739 (1), pp. 92-116 FFCEFFA71D0DD960FF886D158538FF88
  2. Silva, Ruan Felipe Da, Caron, Edilson & Carvalho-Filho, Fernando Da Silva, 2022, An update on Termitomorpha Wasman (Coleoptera: Staphylinidae) including a new species, species redescriptions and geographic extension, Zootaxa 5205 (1), pp. 1-25 https://tb.plazi.org/GgServer/summary/9F3EB341FFF2FF82B345D672FF811F58
  3. Subedi, Madan, 2022, A new genus and a new groundhopper species from Nepal (Orthoptera: Tetriginae Skejotettix netrajyoti gen. et sp. nov.), Zootaxa 5205 (1), pp. 35-54 https://tb.plazi.org/GgServer/summary/737C7525576CFFF6FFBEFF8F3E49FFC1
  4. Cutrim, Marcelo, Silva-Neto, Alberto Moreira da, Rafael, José Albertino & García Aldrete, Alfonso Nery, 2022, The genus Ptiloneura Enderlein, 1901 (Psocodea, ‘Psocoptera’, Ptiloneuridae) in the Brazilian Amazon Forest and Atlantic Forest: new species, variations in forewings and a key to the species, Zoosystema 44 (20), pp. 493-501 https://tb.plazi.org/GgServer/summary/244CDA14FFD3BB1B2820FF8C5E190A55
  5. Štepánek, Jan & Kirschner, Jan, 2013, A revision of mountain species of the genus Taraxacum F. H. Wigg. (Compositae) in Corsica, Candollea 68 (1), pp. 29-39 https://tb.plazi.org/GgServer/summary/FFD01C59FFE65F4EBF6AFFD7FFB5CA6C

flsimoes commented 1 year ago

We need to flesh out a few things, such as the difference between automatic and manual bibRef parsing and the level of detail of matCit parsing.

myrmoteras commented 1 year ago

@flsimoes here is the milestone "Initial scoping and assessment of optimal degree of automation for processing workflow" in BiCIKL, which also deals with granularity issues.

myrmoteras commented 1 year ago

Annotations for level 4, covering treatments, NOT tables yet.

See also Agosti et al., 2022 for further explanation of the annotations.

| annotation | Element name | comment |
| --- | --- | --- |
| treatment | Treatment | |
| subSubSection | Treatment sub-sections | |
| treatmentCitationGroup | A group of treatment citations for the same taxon concept | |
| treatmentCitation | A citation of a previous treatment | |
| taxonomicName | Scientific name | |
| taxonomicNameLabel | Designator for a new or changed scientific name | |
| materialsCitation | Citation of a physical specimen | |
| collectingCountry | Country where the specimen has been collected | |
| location | Location where the specimen has been collected | |
| date | Date of the collection of the specimen | |
| specimenCode | Code assigned to a specimen by an institution | |
| geoCoordinate | Geographic coordinates | |
| elevation | Elevation of the collection of the specimen | |
| collectorName | Person who collected the specimen | |
| collectionCode | Code of the institution hosting the specimen | |
| accessionCode | Code of the DNA sequence isolated from the specimen | |
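
As a rough illustration of how these annotations could be tallied per document (for example to feed stats fields later), here is a sketch that counts level-4 annotation elements in an XML string. The sample XML is a simplified, made-up structure and not real GoldenGate output; `LEVEL_4_ANNOTATIONS` and `count_annotations` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# The sixteen annotation names listed in the table above.
LEVEL_4_ANNOTATIONS = {
    "treatment", "subSubSection", "treatmentCitationGroup",
    "treatmentCitation", "taxonomicName", "taxonomicNameLabel",
    "materialsCitation", "collectingCountry", "location", "date",
    "specimenCode", "geoCoordinate", "elevation", "collectorName",
    "collectionCode", "accessionCode",
}


def count_annotations(xml_text: str) -> Counter:
    """Count every element whose tag is one of the level-4 annotations."""
    root = ET.fromstring(xml_text)
    return Counter(
        element.tag for element in root.iter()
        if element.tag in LEVEL_4_ANNOTATIONS
    )


# Simplified, invented sample; real documents carry many more attributes.
SAMPLE = """<document>
  <treatment>
    <subSubSection>
      <taxonomicName>Aus bus</taxonomicName>
      <taxonomicNameLabel>sp. nov.</taxonomicNameLabel>
    </subSubSection>
    <subSubSection>
      <materialsCitation>
        <collectingCountry>Nepal</collectingCountry>
        <collectorName>A. Collector</collectorName>
      </materialsCitation>
    </subSubSection>
  </treatment>
</document>"""

counts = count_annotations(SAMPLE)
for name, number in sorted(counts.items()):
    print(name, number)
```

Elements outside the vocabulary, such as the `document` wrapper in the sample, are ignored by the count.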

Nesting of the annotations

The tags have specific positions in a treatment.

| annotation | sub-annotation | subsub-annotation | subSubSub-annotation | comment |
| --- | --- | --- | --- | --- |
| taxonomicName | | | | Scientific name; can occur in all sections, including material citation. Can occur anywhere in a treatment or article |
| taxonomicNameLabel | | | | Designator for a new or changed scientific name |
| treatment | | | | Treatment |
| | subSubSection | | | Treatment sub-sections |
| | | treatmentCitationGroup | | A group of treatment citations for the same taxon concept |
| | | | treatmentCitation | A citation of a previous treatment; can be alone, or nested in TCG |
| | | materialsCitation | | Citation of a physical specimen |
| | | | collectingCountry | Country where the specimen has been collected |
| | | | location | Location where the specimen has been collected |
| | | | date | |
| | | | specimenCode | |
| | | | geoCoordinate | |
| | | | elevation | |
| | | | collectorName | |
| | | | collectionCode | |
| | | | accessionCode | |
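
A nesting rule of this kind could be checked mechanically. The sketch below assumes a simplified reading of the nesting: treatment sub-sections, treatment citations (grouped or alone), and materials citations sit somewhere inside a treatment, and the specimen details sit somewhere inside a materialsCitation. `REQUIRED_ANCESTOR` and `misplaced_annotations` are hypothetical names, the rule set is an assumption, and the XML samples are invented.

```python
import xml.etree.ElementTree as ET

# Specimen details that are assumed to belong inside a materialsCitation.
MATERIALS_DETAILS = (
    "collectingCountry", "location", "date", "specimenCode",
    "geoCoordinate", "elevation", "collectorName",
    "collectionCode", "accessionCode",
)

# Assumed rule set: annotation -> ancestor it must appear inside.
# taxonomicName is left out on purpose because it can occur anywhere.
REQUIRED_ANCESTOR = {
    "subSubSection": "treatment",
    "treatmentCitationGroup": "treatment",
    "treatmentCitation": "treatment",
    "materialsCitation": "treatment",
    **{name: "materialsCitation" for name in MATERIALS_DETAILS},
}


def misplaced_annotations(xml_text: str) -> list[str]:
    """Return the tags of annotations that lack their required ancestor."""
    misplaced = []

    def walk(element, ancestors):
        required = REQUIRED_ANCESTOR.get(element.tag)
        if required is not None and required not in ancestors:
            misplaced.append(element.tag)
        for child in element:
            walk(child, ancestors | {element.tag})

    walk(ET.fromstring(xml_text), frozenset())
    return misplaced


GOOD = ("<document><treatment><subSubSection><materialsCitation>"
        "<collectingCountry>Nepal</collectingCountry>"
        "</materialsCitation></subSubSection></treatment></document>")
BAD = ("<document><treatment>"
       "<collectingCountry>Nepal</collectingCountry>"
       "</treatment></document>")

good_result = misplaced_annotations(GOOD)
bad_result = misplaced_annotations(BAD)
print(good_result, bad_result)
```

The check looks at all ancestors rather than the direct parent, so intermediate elements such as paragraphs would not trigger false alarms.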

myrmoteras commented 1 year ago

> example papers: article stats; treatmentStats; JSON

@flsimoes can you please make sure that these examples are all high level and checked - so they work as examples.

flsimoes commented 1 year ago

We'll make sure of that

flsimoes commented 1 year ago

All high-level now

flsimoes commented 11 months ago

TaxPub - level-1 https://github.com/plazi/ggxml2taxpub/issues/21

flsimoes commented 11 months ago

From Patrick Ruch

"The document granularity is one of the subjects of BioHackathon n26 that we are organizing; therefore I am cc-ing Alexandre and Julien for, respectively, the Elixir BioHackathon, which is currently going on, and the BioC format used in SIBiLS and displayed in Pam's module.

Let's add it to the agenda of Thursday!"