cnrdh closed 5 years ago
Complete interpretation for review (N=471). The trailing digits are the number of unique records with this interpretation:
$ cat data/deposit/iopan/protist-biodiversity/total_database_npi2009-2013.tsv | ./bin/dwc-occurrence-csv-transform | ndjson-filter 'd.verbatimScientificName' | ndjson-map 're=new RegExp(d.scientificName), { verbatimScientificName: d.verbatimScientificName,scientificName: d.scientificName, scientificNameMatchesVerbatimName: re.test(d.verbatimScientificName) }' | sort | uniq | grep false | ./bin/ndjson-transform --transform --tsv
"verbatimScientificName" "scientificName" "scientificNameMatchesVerbatimName"
"Biflagellatae" "Eukaryota incertae sedis" false
"Cell" "Eukaryota incertae sedis" false
"Centriceae" "Bacillariophyceae" false
"Ceratium longipes" "Tripos longipes" false
"Cyst 10-20um" "Eukaryota" false
"Cyst" "Eukaryota" false
"Cyst sp. 2 pestka" "Eukaryota" false
"Cyst sp.2 pestka" "Eukaryota" false
"Cyst sp.4" "Eukaryota" false
"Cyst sp.6" "Eukaryota" false
"Dinoflagellatae" "Dinophyceae" false
"Flagellatae" "Eukaryota incertae sedis" false
"Flagellatae (sercowate z kilkoma witkami)" "Eukaryota incertae sedis" false
"Gymnodinium ovatum" "Gyrodinium ovatum" false
"Heterokontophyta" "Ochrophyta" false
"Indeterm spores" "Eukaryota" false
"Monoflagellatae" "Eukaryota incertae sedis" false
"Navicula gelida var. Radissonii" "Navicula gelida var. radissonii" false
"Neoceratium arcticum" "Ceratium arcticum" false
"Pennate" "Bacillariophyceae" false
"Pleurochrysis carterae" "Chrysotila carterae" false
"Prasinophyceae/Meringosphaera" "Eukaryota incertae sedis" false
"Preperidinium meunieri" "Preperidinium meunierii" false
"Rhizosolenia hebetata f. semispina" "Rhizosolenia semispina" false
"Scuticociliata" "Scuticociliatia" false
"Scuticociliatida/Oligohymenophorea" "Scuticociliatia" false
"Strombididae" "Strombidiidae" false
"Thalassiosira bioculata var.exigua" "Thalassiosira bioculata var. exigua" false
"Thecate dinoflagellatae" "Dinophyceae" false
"Thecate dinophyceae" "Dinophyceae" false
"Tintinnus inquilinum" "Eutintinnus apertus" false
"Unidentified cells 10-20um" "Eukaryota" false
"Unidentified cells 20-30um" "Eukaryota" false
"Unidentified cells 5-10um" "Eukaryota" false
"Unidentified cysts" "Eukaryota" false
Changes reviewed by AT
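The comparison in the pipeline above builds a regular expression directly from scientificName, so characters such as "." or "(" in a name are treated as regex syntax rather than literal text. A minimal sketch of the same check with the name escaped first is shown below; the helper escapeRegExp and the third example row are hypothetical and not part of the repository.

```javascript
// Escape regex metacharacters so the name is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// True when the interpreted name occurs literally inside the verbatim name.
function scientificNameMatchesVerbatimName(d) {
  const re = new RegExp(escapeRegExp(d.scientificName));
  return re.test(d.verbatimScientificName);
}

const rows = [
  // Two rows copied from the table above.
  { verbatimScientificName: "Ceratium longipes", scientificName: "Tripos longipes" },
  { verbatimScientificName: "Thalassiosira bioculata var.exigua", scientificName: "Thalassiosira bioculata var. exigua" },
  // Hypothetical row, added only to show a matching case.
  { verbatimScientificName: "Tripos longipes cf.", scientificName: "Tripos longipes" },
];

for (const d of rows) {
  console.log(JSON.stringify({
    ...d,
    scientificNameMatchesVerbatimName: scientificNameMatchesVerbatimName(d),
  }));
}
```

With the escape in place, a name like "var. exigua" only matches a literal dot followed by a space, which is likely what a reviewer expects when reading the true/false column.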
These are all the unknown names:
Unknown (37): [ 'Algiosphera robusta/ Phaeocystis?',
'Athecate Dinophyceae non det.',
'Cell non det.',
'Cell non det. 7',
'Centriceae non det.',
'Chaetoceros convolutus /concavicornis',
'Chaetoceros concavicornis/convolutus',
'Chlorophyceae non det.',
'Choanoflagellida non det.',
'Chrysophyceae non det.',
'Ciliophora non det',
'Co to',
'Cos',
'Cyst',
'Dinophyceae non det.',
'Flagellatae non det.',
'Fragilariopsis cylindus/ oceanica',
'Fragilariopsis oceanica/ cylindrus',
'Gymno/Gyrodinium',
'Gymnodinium/Gyrodinium',
'Lessardia elongata/Amphidoma acuminata',
'Pennate non det.',
'Prasinophyceae non det.',
'Rhizosolenia hebetata f.hebetata',
'Rhizosolenia hebetata f.semispina',
'Scuticociliata non det.',
'Scuticociliatida/ Oligohymenophorea',
'Spirotrichea non det.',
'Spora non det.',
'Strombididae non det.',
'Thalassiosira bioculata var.exigua',
'Thalassiosira gravida/pacifica',
'Thecate dinoflagellatae non det.',
'Tintinnidae non det.',
'Tintinnopsis aglutinated',
'Unidentified cells',
'Unidentified cysts' ]
$ cat data/deposit/iopan/protist-biodiversity/total_database_npi2009-2013.tsv | ./bin/dwc-occurrence-csv-transform | ndjson-filter 'd.errors' | ndjson-split 'd.errors' | sort | uniq -c
23 {"value":"20 chlla max","dataPath":".maximumDepthInMeters","message":"should be number,null"}
3 {"value":321.67,"dataPath":".maxFields","message":"should be equal to one of the allowed values"}
2 {"value":"#DIV/0!","dataPath":".individualCount","message":"should be number,null"}
2 {"value":"#DIV/0!","dataPath":".organismQuantity","message":"should be number,null"}
147 {"value":"#VERDI!","dataPath":".organismQuantity","message":"should be number,null"}
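Most of these errors are spreadsheet error strings or free text sitting in numeric columns ("#VERDI!" looks like the Norwegian-locale counterpart of Excel's "#VALUE!"). As a rough illustration of the kind of check that yields such messages, the sketch below flags any value that is neither a finite number nor null. The function name numberOrNullErrors is hypothetical, and the actual transform may validate differently; only the error shape is copied from the output above.

```javascript
// Return one error object per field whose value is neither a finite number nor null.
// Fields that are absent from the record are skipped.
function numberOrNullErrors(record, fields) {
  const errors = [];
  for (const field of fields) {
    const value = record[field];
    if (value === undefined) continue;
    const ok = value === null || (typeof value === "number" && Number.isFinite(value));
    if (!ok) {
      errors.push({ value, dataPath: "." + field, message: "should be number,null" });
    }
  }
  return errors;
}

// Example record using values seen in the error listing above.
const record = {
  maximumDepthInMeters: "20 chlla max",
  individualCount: "#DIV/0!",
  organismQuantity: 12.5,
};

const errors = numberOrNullErrors(record, [
  "maximumDepthInMeters",
  "individualCount",
  "organismQuantity",
]);

for (const e of errors) console.log(JSON.stringify(e));
```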
The 2009-2013 protist biodiversity dataset contains 14954 records and approximately 352 unique scientific names. The count is approximate because the names contain stray whitespace and because two fields are used; Taxon_full contains 619 unique strings. Of the original names, 308 were known and 37 were unknown.
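Since whitespace differences inflate the unique-name count, one way to get a steadier figure is to normalize names before counting. The sketch below trims each name, collapses runs of whitespace, and removes spaces around "/". The functions normalizeName and countUniqueNames are hypothetical, and the padded " Cyst " entry is made up for illustration; the other strings come from the lists above.

```javascript
// Trim, collapse internal whitespace, and drop spaces around "/".
function normalizeName(name) {
  return name
    .trim()
    .replace(/\s+/g, " ")
    .replace(/\s*\/\s*/g, "/");
}

// Count distinct names after normalization.
function countUniqueNames(names) {
  return new Set(names.map(normalizeName)).size;
}

const names = [
  "Scuticociliatida/ Oligohymenophorea",
  "Scuticociliatida/Oligohymenophorea",
  "Chaetoceros convolutus /concavicornis",
  "Cyst",
  " Cyst ", // hypothetical padded variant
];

console.log("raw unique:", new Set(names).size);
console.log("normalized unique:", countUniqueNames(names));
```

This only addresses whitespace; variants that differ in spelling or word order (for example the two Chaetoceros entries) would still need manual review.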