npolar / marine-db

https://doi.org/10.21334/marine-db

Protist taxonomy interpretation 2009-2013 #35

Closed cnrdh closed 5 years ago

cnrdh commented 5 years ago

The 2009-2013 protist biodiversity dataset contains 14954 records and approximately 352 unique scientific names. The count is approximate because some names differ only in whitespace and because two fields are used (Taxon_full alone contains 619 unique strings).
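Whitespace variants inflate the unique-name count. Below is a minimal sketch of counting names before and after trimming and collapsing internal whitespace. `normalizeWhitespace` and `countUniqueNames` are hypothetical helpers, and the sample strings are illustrative, modeled on names that appear later in this issue:

```javascript
// Hypothetical helper: trim and collapse runs of whitespace so that
// "Chaetoceros  convolutus" and "Chaetoceros convolutus" compare equal.
function normalizeWhitespace(name) {
  return name.trim().replace(/\s+/g, " ");
}

// Count distinct names, optionally after whitespace normalisation.
function countUniqueNames(names, normalize = true) {
  const keys = names.map((n) => (normalize ? normalizeWhitespace(n) : n));
  return new Set(keys).size;
}

// Illustrative sample: two pairs that differ only in whitespace.
const sample = [
  "Chaetoceros  convolutus /concavicornis",
  "Chaetoceros convolutus /concavicornis",
  "Cyst ",
  "Cyst",
];

console.log(countUniqueNames(sample, false)); // 4
console.log(countUniqueNames(sample));        // 2
```

This normalisation only merges strings that are otherwise identical. Case differences and abbreviations still have to be resolved during name interpretation.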

Of the original names, 308 were known and 37 were unknown.

cnrdh commented 5 years ago

Complete interpretation for review (N=471). The trailing digits are the number of unique records with this interpretation:

$ cat data/deposit/iopan/protist-biodiversity/total_database_npi2009-2013.tsv \
  | ./bin/dwc-occurrence-csv-transform \
  | ndjson-filter 'd.verbatimScientificName' \
  | ndjson-map 're=new RegExp(d.scientificName), { verbatimScientificName: d.verbatimScientificName, scientificName: d.scientificName, scientificNameMatchesVerbatimName: re.test(d.verbatimScientificName) }' \
  | sort | uniq | grep false \
  | ./bin/ndjson-transform --transform --tsv
"verbatimScientificName"    "scientificName"    "scientificNameMatchesVerbatimName"
"Biflagellatae" "Eukaryota incertae sedis"  false
"Cell"  "Eukaryota incertae sedis"  false
"Centriceae"    "Bacillariophyceae" false
"Ceratium longipes" "Tripos longipes"   false
"Cyst 10-20um"  "Eukaryota" false
"Cyst"  "Eukaryota" false
"Cyst sp. 2 pestka" "Eukaryota" false
"Cyst sp.2 pestka"  "Eukaryota" false
"Cyst sp.4" "Eukaryota" false
"Cyst sp.6" "Eukaryota" false
"Dinoflagellatae"   "Dinophyceae"   false
"Flagellatae"   "Eukaryota incertae sedis"  false
"Flagellatae (sercowate z kilkoma witkami)" "Eukaryota incertae sedis"  false
"Gymnodinium ovatum"    "Gyrodinium ovatum" false
"Heterokontophyta"  "Ochrophyta"    false
"Indeterm spores"   "Eukaryota" false
"Monoflagellatae"   "Eukaryota incertae sedis"  false
"Navicula gelida var. Radissonii"   "Navicula gelida var. radissonii"   false
"Neoceratium arcticum"  "Ceratium arcticum" false
"Pennate"   "Bacillariophyceae" false
"Pleurochrysis carterae"    "Chrysotila carterae"   false
"Prasinophyceae/Meringosphaera" "Eukaryota incertae sedis"  false
"Preperidinium meunieri"    "Preperidinium meunierii"   false
"Rhizosolenia hebetata f. semispina"    "Rhizosolenia semispina"    false
"Scuticociliata"    "Scuticociliatia"   false
"Scuticociliatida/Oligohymenophorea"    "Scuticociliatia"   false
"Strombididae"  "Strombidiidae" false
"Thalassiosira bioculata var.exigua"    "Thalassiosira bioculata var. exigua"   false
"Thecate dinoflagellatae"   "Dinophyceae"   false
"Thecate dinophyceae"   "Dinophyceae"   false
"Tintinnus inquilinum"  "Eutintinnus apertus"   false
"Unidentified cells 10-20um"    "Eukaryota" false
"Unidentified cells 20-30um"    "Eukaryota" false
"Unidentified cells 5-10um" "Eukaryota" false
"Unidentified cysts"    "Eukaryota" false
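
The `scientificNameMatchesVerbatimName` flag above is computed with `new RegExp(d.scientificName)`, so regex metacharacters in a name are interpreted rather than matched literally. For example, the `.` in `var.` matches any character, and a name with an unbalanced parenthesis would make the constructor throw. Below is a minimal sketch of a literal, whitespace-tolerant comparison. It assumes the same record fields as the pipeline (`verbatimScientificName`, `scientificName`), and `nameMatchesVerbatim` is a hypothetical helper:

```javascript
// Hypothetical helper: does the interpreted name occur literally in the
// verbatim name? Whitespace is collapsed first. No regex is built from the
// data, so "." and "(" in names are compared as ordinary characters.
function nameMatchesVerbatim({ verbatimScientificName, scientificName }) {
  const norm = (s) => s.trim().replace(/\s+/g, " ");
  return norm(verbatimScientificName).includes(norm(scientificName));
}

// Two records from the listing above.
const records = [
  { verbatimScientificName: "Ceratium longipes", scientificName: "Tripos longipes" },
  { verbatimScientificName: "Navicula gelida var. Radissonii", scientificName: "Navicula gelida var. radissonii" },
];

// Keep only interpretations that differ from the verbatim name,
// like `grep false` in the shell pipeline.
const mismatches = records.filter((r) => !nameMatchesVerbatim(r));
console.log(mismatches.length); // 2

// The regex version accepts a string where "." matched an arbitrary character.
const interpreted = "Navicula gelida var. radissonii";
const altered = "Navicula gelida varX radissonii";
console.log(new RegExp(interpreted).test(altered)); // true
console.log(nameMatchesVerbatim({ verbatimScientificName: altered, scientificName: interpreted })); // false
```

An alternative that keeps the regex is to escape the name before building the `RegExp`. The substring test avoids that step entirely.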
cnrdh commented 5 years ago

Changes reviewed by AT

cnrdh commented 5 years ago

These are all the unknown names:


Unknown (37): [ 'Algiosphera robusta/ Phaeocystis?',
  'Athecate Dinophyceae non det.',
  'Cell non det.',
  'Cell non det. 7',
  'Centriceae non det.',
  'Chaetoceros  convolutus /concavicornis',
  'Chaetoceros concavicornis/convolutus',
  'Chlorophyceae non det.',
  'Choanoflagellida non det.',
  'Chrysophyceae non det.',
  'Ciliophora non det',
  'Co to',
  'Cos',
  'Cyst',
  'Dinophyceae non det.',
  'Flagellatae non det.',
  'Fragilariopsis cylindus/ oceanica',
  'Fragilariopsis oceanica/ cylindrus',
  'Gymno/Gyrodinium',
  'Gymnodinium/Gyrodinium',
  'Lessardia elongata/Amphidoma acuminata',
  'Pennate non det.',
  'Prasinophyceae non det.',
  'Rhizosolenia hebetata f.hebetata',
  'Rhizosolenia hebetata f.semispina',
  'Scuticociliata non det.',
  'Scuticociliatida/  Oligohymenophorea',
  'Spirotrichea non det.',
  'Spora non det.',
  'Strombididae non det.',
  'Thalassiosira bioculata var.exigua',
  'Thalassiosira gravida/pacifica',
  'Thecate dinoflagellatae non det.',
  'Tintinnidae non det.',
  'Tintinnopsis aglutinated',
  'Unidentified cells',
  'Unidentified cysts' ]
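Most of these unknown names follow a few patterns. Some carry a `non det.` suffix on a higher taxon (sometimes followed by a morphotype number). Others list `/`-separated alternatives with irregular spacing, where the second alternative is often an epithet only. Below is a minimal sketch of splitting such strings into candidate names before re-matching them. `candidateNames` is a hypothetical helper. Two further points are assumptions: that `non det.` marks the rank the identification reached, and that a lowercase alternative shares the genus of the first one.

```javascript
// Hypothetical pre-processing of unresolved names.
function candidateNames(raw) {
  const cleaned = raw
    // Assumption: "non det." (with or without the dot, optionally followed by a
    // morphotype number such as "Cell non det. 7") marks the rank reached.
    .replace(/\s+non det\.?(\s*\d+)?\s*$/i, "")
    .replace(/\s+/g, " ")
    .trim();

  // Split "A/B" alternatives and drop empty fragments.
  const parts = cleaned.split("/").map((p) => p.trim()).filter(Boolean);

  // Assumption: a lowercase alternative ("oceanica/ cylindrus") is an epithet
  // that belongs to the genus of the first alternative.
  const genus = parts.length > 0 ? parts[0].split(" ")[0] : "";
  return parts.map((p) => (/^[a-z]/.test(p) ? `${genus} ${p}` : p));
}

for (const raw of [
  "Dinophyceae non det.",
  "Cell non det. 7",
  "Fragilariopsis oceanica/ cylindrus",
  "Chaetoceros  convolutus /concavicornis",
  "Scuticociliatida/  Oligohymenophorea",
]) {
  console.log(raw, "->", candidateNames(raw));
}
```

Abbreviated alternatives such as `Gymno/Gyrodinium` and uncertainty markers such as the `?` in `Algiosphera robusta/ Phaeocystis?` pass through unchanged and would still need manual review.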
cnrdh commented 5 years ago

$ cat data/deposit/iopan/protist-biodiversity/total_database_npi2009-2013.tsv \
  | ./bin/dwc-occurrence-csv-transform \
  | ndjson-filter 'd.errors' \
  | ndjson-split 'd.errors' \
  | sort | uniq -c
     23 {"value":"20 chlla max","dataPath":".maximumDepthInMeters","message":"should be number,null"}
      3 {"value":321.67,"dataPath":".maxFields","message":"should be equal to one of the allowed values"}
      2 {"value":"#DIV/0!","dataPath":".individualCount","message":"should be number,null"}
      2 {"value":"#DIV/0!","dataPath":".organismQuantity","message":"should be number,null"}
    147 {"value":"#VERDI!","dataPath":".organismQuantity","message":"should be number,null"}
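
Most of these errors come from numeric columns that hold text. Some are spreadsheet error codes: `#DIV/0!`, and `#VERDI!`, which is the Norwegian-locale Excel form of `#VALUE!`. Another is a free-text depth, `20 chlla max`. Below is a minimal sketch of coercing such values before schema validation. It maps spreadsheet error codes to `null` with a note and reports anything else that is not a number. `coerceNumber` and the error-code list are assumptions, not part of `dwc-occurrence-csv-transform`.

```javascript
// Assumed list of spreadsheet error literals (English and Norwegian Excel).
const SPREADSHEET_ERRORS = new Set(["#DIV/0!", "#VALUE!", "#VERDI!", "#N/A"]);

// Hypothetical coercion for a numeric Darwin Core field such as
// organismQuantity or maximumDepthInMeters.
// Returns { value, problem }: value is a finite number or null, and problem
// explains why a non-empty input could not be used.
function coerceNumber(input) {
  if (typeof input === "number") {
    return { value: Number.isFinite(input) ? input : null, problem: null };
  }
  const text = input === null || input === undefined ? "" : String(input).trim();
  if (text === "") {
    return { value: null, problem: null }; // empty cell: simply no value
  }
  if (SPREADSHEET_ERRORS.has(text.toUpperCase())) {
    // A spreadsheet formula failed, so there is no usable value.
    return { value: null, problem: `spreadsheet error ${text}` };
  }
  const number = Number(text);
  if (Number.isFinite(number)) {
    return { value: number, problem: null };
  }
  // For example "20 chlla max": keep the text for review instead of guessing.
  return { value: null, problem: `not a number: ${text}` };
}

for (const raw of ["#VERDI!", "#DIV/0!", "20 chlla max", "12.5", ""]) {
  console.log(JSON.stringify(raw), coerceNumber(raw));
}
```

Whether a failed formula should become `null` or be recalculated from the source spreadsheet is a curation decision. The sketch only makes the failures explicit. The `maxFields` error (321.67 not among the allowed values) is a different problem and is not addressed here.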