npolar / marine-db

https://doi.org/10.21334/marine-db

Convert IOPAN protist data from ICE2010 into Darwin Core #51

Open cnrdh opened 2 years ago

cnrdh commented 2 years ago

$ wc -l data/deposit/iopan/protist-biodiversity/*ICE10*
  3444 data/deposit/iopan/protist-biodiversity/konghau_database_completeICE10.csv
   153 data/deposit/iopan/protist-biodiversity/konghau_database_completeICE10-handnet.csv
  3597 total
cnrdh commented 2 years ago

The transforms need a little help, either with an explicit rename as below or by adding ["bottle_no","fieldNumber"] to iopanDwcOccurrenceTuples.

~/npolar/marine-db$ cat data/deposit/iopan/protist-biodiversity/konghau_database_completeICE10.csv \
  | ./bin/csv-transform --ndjson --transformers=csv/ascii-header,csv/number \
  | ndjson-map 'd.fieldNumber = d.bottle_no,delete d.bottle_no,delete d.Class_phylum, d' \
  | ./bin/ndjson-transform --tsv | ./bin/dwc-occurrence-csv-transform
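
For a quick check without the repo's transform scripts, the same rename can be approximated directly on the CSV header. This is only a sketch: `rename_header` is a hypothetical helper (not part of marine-db), it assumes a plain comma-separated header with no quoted commas, and the input below is toy data.

```shell
# Hypothetical helper: rename one column in the header line of a CSV stream.
# Assumes a simple header (no quoted fields containing commas).
rename_header() {
  awk -v from="$1" -v to="$2" '
    BEGIN { FS = OFS = "," }
    # Only the first line is rewritten; data rows pass through untouched.
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == from) $i = to }
    { print }'
}

# Toy input, not real data
printf 'bottle_no,takson\n1,example\n' | rename_header bottle_no fieldNumber
```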

log_no ?

   3348 ""
     14 159
     32 162
     16 163
     13 164
     20 439

size classes (µm):


    140 ""
      1 "10-20"
      1 20
      1 "20-30"
      1 "30-40"
      3 40
      1 60
      2 70
      1 "70;160"
      1 80
cnrdh commented 2 years ago

Regular: 138 unique fieldNumbers, 12 of them not found in the ICE2010 sampling events:

ndjson-join --left d.fieldNumber \
  <( cat data/deposit/iopan/protist-biodiversity/konghau_database_completeICE10.csv | ./bin/dwc-occurrence-csv-transform ) \
  <( cat data/deposit/2010/ICE2010/ice_2010_sampling-events.tsv | ./bin/dwc-sampling-event-csv-transform ) \
  | ndjson-filter 'd[1]===null' \
  | ndjson-map 'd=d[0],[d.expedition,d.locationID,d.maximumDepthInMeters,d.minimumDepthInMeters,d.fieldNumber]' \
  | sort | uniq -c
     27 ["ICE2010","ICE10-16",0,null,"ICE10-379"]
     15 ["ICE2010","ICE10-16",100,100,"ICE10-384"]
     38 ["ICE2010","ICE10-16",10,10,"ICE10-381"]
     46 ["ICE2010","ICE10-16",35,35,"ICE10-382"]
     36 ["ICE2010","ICE10-16",50,50,"ICE10-380"]
     21 ["ICE2010","ICE10-16",50,50,"ICE10-383"]
     34 ["ICE2010","R4",0,null,"ICE10-152"]
     13 ["ICE2010","R4",100,100,"ICE10-157"]
     32 ["ICE2010","R4",25,25,"ICE10-155"]
     13 ["ICE2010","R4",38,38,"ICE10-158"]
     16 ["ICE2010","R4",50,50,"ICE10-156"]
     18 ["ICE2010","R6b",5,5,"ICE10-253"]
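
The left join plus null filter above is an anti-join: it keeps occurrence rows whose fieldNumber has no matching sampling event. The same idea can be sketched with plain awk on two files of keys, one per line. `missing_keys` is a hypothetical helper and the keys are toy values, not field numbers from the data.

```shell
# Hypothetical helper: print each key from file $1 that is absent from file $2.
# Both files hold one key per line; duplicates in $1 are reported once.
missing_keys() {
  awk '
    # First argument to awk is the reference file: remember its keys.
    FILENAME == ARGV[1] { seen[$0] = 1; next }
    # Second file: print keys not seen in the reference, once each.
    !($0 in seen) && !printed[$0]++' "$2" "$1" | sort
}

# Toy keys, not real data
tmp=$(mktemp -d)
printf 'K1\nK2\nK1\nK3\n' > "$tmp/occurrences.txt"
printf 'K2\n' > "$tmp/events.txt"
missing_keys "$tmp/occurrences.txt" "$tmp/events.txt"
rm -r "$tmp"
```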

Investigate: why incalculable? => The bottle volume is missing from the input. There is a "cells in 250 ml" column, but it is only used for the 614 microplankton rows (32 L initial volume, filtered).

cat data/deposit/iopan/protist-biodiversity/konghau_database_completeICE10.csv | ./bin/csv-transform --ndjson --transformers=csv/ascii-header,csv/number | ndjson-map '[d.Vth_filtered_L,d.cells_in_250_ml]' | ndjson-filter 'd[0]!=1 && d[1]!=0' | ndjson-map 'd[0]' | sort | uniq -c

    614 32

[Only 570 have 32 L in the "total database", with 595 marked as micro :/]

cnrdh commented 2 years ago

The handnet file fails JSON schema validation for:


"scientificName":"cysta chrysophyta"
"scientificName":"cysta chaetoceros (simplex)"
{"name":"ICE10","depth":"20-0","station":"R7b","no":"321","data":"21.08.2010","Class/phylum":"Chrysophyceae","takson":"cysta chrysophyta","size class (μm)":""}
{"name":"ICE10","depth":"20-0","station":"R7b","no":"321","data":"21.08.2010","Class/phylum":"Diatomeae","takson":"cysta chaetoceros (simplex)","size class (μm)":""}
{"name":"ICE10","depth":"20-0","station":"R9b","no":"475","data":"22.08.2010","Class/phylum":"Chrysophyceae","takson":"cysta chrysophyta","size class (μm)":""}
cnrdh commented 2 years ago

The sample log contains 112 phytoplankton and 18 microplankton entries.

The data has no microplankton marker other than Vth_filtered_L: 42 unique bottle_no values have a filtered volume > 1 L (microplankton?) and 96 have Vth_filtered_L === 1.


~/npolar/marine-db$ cat data/deposit/iopan/protist-biodiversity/konghau_database_completeICE10.tsv   | ./bin/csv-transform --ndjson --transformers=csv/ascii-header,csv/number | ndjson-filter 'd.Vth_filtered_L>1' | ndjson-map d.bottle_no | sort | uniq -c | wc -l
42
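
The count above (distinct bottle_no values with a filtered volume above 1 L) can also be reproduced with awk alone. This is a sketch under assumptions: `count_distinct_above` is a hypothetical helper, the input is a headered tab-separated stream, and the rows shown are toy data.

```shell
# Hypothetical helper: count distinct values of column $1 among rows where
# numeric column $2 is greater than threshold $3. Reads headered TSV on stdin.
count_distinct_above() {
  awk -F '\t' -v key="$1" -v col="$2" -v min="$3" '
    NR == 1 {
      # Locate the two columns by header name.
      for (i = 1; i <= NF; i++) {
        if ($i == key) k = i
        if ($i == col) c = i
      }
      next
    }
    ($c + 0) > min && !seen[$k]++ { n++ }
    END { print n + 0 }'
}

# Toy rows, not real data: bottles 2 and 3 exceed 1 L, bottle 2 appears twice
printf 'bottle_no\tVth_filtered_L\n1\t1\n2\t32\n2\t32\n3\t5\n' \
  | count_distinct_above bottle_no Vth_filtered_L 1
```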
cnrdh commented 2 years ago

About the non-matches: the sample log has A/B/C-suffixed samples for these field numbers,


$ cat $samples | grep 2010 | grep -E 'ICE10-15[56789][A]' | ndjson-map '["eventID", "materialSampleID","recordedBy", "gear", "samplingProtocol", "sampletype"].map(k=>delete d[k]),d.fieldNumber=d.fieldNumber.replace(/[ABC]$/,""),d' | sort | uniq
$ cat $samples | grep 2010 | grep -E 'ICE10-38[234][A]' | ndjson-map '["eventID", "materialSampleID","recordedBy", "gear", "samplingProtocol", "sampletype"].map(k=>delete d[k]),d.fieldNumber=d.fieldNumber.replace(/[ABC]$/,""),d'
$ cat $samples | grep 2010 | grep -E 'ICE10-(379|38[01])[A]' | ndjson-map '["eventID", "materialSampleID","recordedBy", "gear", "samplingProtocol", "sampletype"].map(k=>delete d[k]),d.fieldNumber=d.fieldNumber.replace(/[ABC]$/,""),d'
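
The suffix handling in these commands (drop a trailing A, B or C from fieldNumber so the sample log key matches the occurrence key) can be sketched on a plain list of field numbers. `strip_abc_suffix` is a hypothetical helper, and the values are toy examples shaped like the patterns grepped for above.

```shell
# Hypothetical helper: drop a trailing A, B or C from ICE10 field numbers,
# then collapse the resulting duplicates. Reads one field number per line.
strip_abc_suffix() {
  sed -E 's/^(ICE10-[0-9]+)[ABC]$/\1/' | sort -u
}

# Toy values, not real data
printf 'ICE10-155A\nICE10-155B\nICE10-155C\nICE10-156\n' | strip_abc_suffix
```

Lines that do not match the pattern pass through unchanged, so unsuffixed field numbers are kept as they are.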