Open cnrdh opened 2 years ago
Transforms need a little helping hand, like below, or by adding ["bottle_no","fieldNumber"] to iopanDwcOccurrenceTuples.
~/npolar/marine-db$ cat data/deposit/iopan/protist-biodiversity/konghau_database_completeICE10.csv \
| ./bin/csv-transform --ndjson --transformers=csv/ascii-header,csv/number \
| ndjson-map 'd.fieldNumber = d.bottle_no,delete d.bottle_no,delete d.Class_phylum, d' \
| ./bin/ndjson-transform --tsv | ./bin/dwc-occurrence-csv-transform
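For reference, a minimal sketch of what the tuple-based alternative could look like. The names `renameTuples`, `dropKeys` and `renameKeys` are hypothetical and only illustrate the idea of a [source, target] list such as iopanDwcOccurrenceTuples; the actual transform code may be organised differently, and the example row is illustrative.

```javascript
// Hypothetical tuple list: [sourceKey, targetKey] pairs, mirroring the
// ["bottle_no","fieldNumber"] entry suggested for iopanDwcOccurrenceTuples.
const renameTuples = [["bottle_no", "fieldNumber"]];

// Keys to drop entirely, as the ndjson-map step does for Class_phylum.
const dropKeys = ["Class_phylum"];

// Return a copy of `row` with keys renamed per `tuples` and `drop` keys removed.
function renameKeys(row, tuples = renameTuples, drop = dropKeys) {
  const renames = new Map(tuples);
  const out = {};
  for (const [key, value] of Object.entries(row)) {
    if (drop.includes(key)) continue;
    out[renames.get(key) ?? key] = value;
  }
  return out;
}

// Illustrative row, not taken from the real file.
console.log(
  renameKeys({
    bottle_no: "ICE10-379",
    Class_phylum: "Chrysophyceae",
    takson: "cysta chrysophyta",
  })
);
```

Unlike the ndjson-map one-liner, this returns a new object instead of mutating `d`, which keeps the rename reusable for other columns.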
log_no ?
3348 ""
14 159
32 162
16 163
13 164
20 439
size classes (µm):
140 ""
1 "10-20"
1 20
1 "20-30"
1 "30-40"
3 40
1 60
2 70
1 "70;160"
1 80
Regular: 138 uniq fieldNumbers, 12 not found
ndjson-join --left d.fieldNumber \
  <( cat data/deposit/iopan/protist-biodiversity/konghau_database_completeICE10.csv | ./bin/dwc-occurrence-csv-transform ) \
  <( cat data/deposit/2010/ICE2010/ice_2010_sampling-events.tsv | ./bin/dwc-sampling-event-csv-transform ) \
| ndjson-filter 'd[1]===null' \
| ndjson-map 'd=d[0],[d.expedition,d.locationID,d.maximumDepthInMeters,d.minimumDepthInMeters,d.fieldNumber]' \
| sort | uniq -c
27 ["ICE2010","ICE10-16",0,null,"ICE10-379"]
15 ["ICE2010","ICE10-16",100,100,"ICE10-384"]
38 ["ICE2010","ICE10-16",10,10,"ICE10-381"]
46 ["ICE2010","ICE10-16",35,35,"ICE10-382"]
36 ["ICE2010","ICE10-16",50,50,"ICE10-380"]
21 ["ICE2010","ICE10-16",50,50,"ICE10-383"]
34 ["ICE2010","R4",0,null,"ICE10-152"]
13 ["ICE2010","R4",100,100,"ICE10-157"]
32 ["ICE2010","R4",25,25,"ICE10-155"]
13 ["ICE2010","R4",38,38,"ICE10-158"]
16 ["ICE2010","R4",50,50,"ICE10-156"]
18 ["ICE2010","R6b",5,5,"ICE10-253"]
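The join above boils down to "occurrences whose fieldNumber has no sampling event, counted per key". A rough sketch of that logic in plain JavaScript, assuming both inputs are arrays of objects with a `fieldNumber` property; `unmatched` and `countBy` are hypothetical helper names and the rows are made up for illustration.

```javascript
// Left-join occurrences to sampling events on fieldNumber and keep only the
// occurrences with no matching event (the 'd[1]===null' case).
function unmatched(occurrences, events) {
  const known = new Set(events.map((e) => e.fieldNumber));
  return occurrences.filter((o) => !known.has(o.fieldNumber));
}

// Count rows per key, similar to `sort | uniq -c`.
function countBy(rows, keyOf) {
  const counts = new Map();
  for (const row of rows) {
    const key = keyOf(row);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}

// Made-up rows for illustration only.
const occurrences = [
  { fieldNumber: "ICE10-379", locationID: "ICE10-16" },
  { fieldNumber: "ICE10-379", locationID: "ICE10-16" },
  { fieldNumber: "ICE10-100", locationID: "R1" },
];
const events = [{ fieldNumber: "ICE10-100" }, { fieldNumber: "ICE10-379A" }];

const counts = countBy(unmatched(occurrences, events), (o) => o.fieldNumber);
console.log([...counts]);
```

In this toy input the event carries a suffixed fieldNumber ("ICE10-379A"), so the unsuffixed occurrences fall out as unmatched.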
Investigate: why incalculable? => Bottle volume is missing in the input. There is a "cells in 250 ml" column, but it is only used for 614 microplankton rows (32 L initial volume, filtered).
cat data/deposit/iopan/protist-biodiversity/konghau_database_completeICE10.csv | ./bin/csv-transform --ndjson --transformers=csv/ascii-header,csv/number | ndjson-map '[d.Vth_filtered_L,d.cells_in_250_ml]' | ndjson-filter 'd[0]!=1 && d[1]!=0' | ndjson-map 'd[0]' | sort | uniq -c
614 32
[Only 570 have 32 L in the "total database", with 595 marked as micro :/]
Handnet, fails JSON schema validation for
"scientificName":"cysta chrysophyta"
"scientificName":"cysta chaetoceros (simplex)"
{"name":"ICE10","depth":"20-0","station":"R7b","no":"321","data":"21.08.2010","Class/phylum":"Chrysophyceae","takson":"cysta chrysophyta","size class (μm)":""}
{"name":"ICE10","depth":"20-0","station":"R7b","no":"321","data":"21.08.2010","Class/phylum":"Diatomeae","takson":"cysta chaetoceros (simplex)","size class (μm)":""}
{"name":"ICE10","depth":"20-0","station":"R9b","no":"475","data":"22.08.2010","Class/phylum":"Chrysophyceae","takson":"cysta chrysophyta","size class (μm)":""}
Samplelog contains 112 phytoplankton and 18 microplankton samples
Data has no microplankton marker, except use of Vth_filtered_L:
42 uniq bottle_no have filtered volume > 1L (microplankton?)
96 have Vth_filtered_L === 1
~/npolar/marine-db$ cat data/deposit/iopan/protist-biodiversity/konghau_database_completeICE10.tsv | ./bin/csv-transform --ndjson --transformers=csv/ascii-header,csv/number | ndjson-filter 'd.Vth_filtered_L>1' | ndjson-map d.bottle_no | sort | uniq -c | wc -l
42
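A small sketch of the split used above, under the unconfirmed assumption that Vth_filtered_L > 1 means microplankton and Vth_filtered_L === 1 means phytoplankton. `uniqueBottles` is a hypothetical helper and the rows are invented.

```javascript
// Collect the distinct bottle_no values among rows matching `predicate`.
function uniqueBottles(rows, predicate) {
  return new Set(rows.filter(predicate).map((r) => r.bottle_no));
}

// Invented rows; the real file has one row per taxon per bottle.
const rows = [
  { bottle_no: "ICE10-152", Vth_filtered_L: 32 },
  { bottle_no: "ICE10-152", Vth_filtered_L: 32 },
  { bottle_no: "ICE10-155", Vth_filtered_L: 1 },
  { bottle_no: "ICE10-156", Vth_filtered_L: 32 },
];

// Assumption: filtered volume > 1 L marks a microplankton sample.
const micro = uniqueBottles(rows, (r) => r.Vth_filtered_L > 1);
const phyto = uniqueBottles(rows, (r) => r.Vth_filtered_L === 1);

console.log(micro.size, phyto.size);
```

Comparing these two counts with the sample log totals is one way to check whether the volume column is a usable marker.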
About the non-matches: there are ABC samples (fieldNumbers ending in A, B or C):
$ cat $samples | grep 2010 | grep -E 'ICE10-15[56789][A]' | ndjson-map '["eventID", "materialSampleID","recordedBy", "gear", "samplingProtocol", "sampletype"].map(k=>delete d[k]),d.fieldNumber=d.fieldNumber.replace(/[ABC]$/,""),d' | sort | uniq
$ cat $samples | grep 2010 | grep -E "ICE10-38[234][A]" | ndjson-map '["eventID", "materialSampleID","recordedBy", "gear", "samplingProtocol", "sampletype"].map(k=>delete d[k]),d.fieldNumber=d.fieldNumber.replace(/[ABC]$/,""),d'
cat $samples | grep 2010 | grep -E "ICE10-(379|38[01])[A]" | ndjson-map '["eventID", "materialSampleID","recordedBy", "gear", "samplingProtocol", "sampletype"].map(k=>delete d[k]),d.fieldNumber=d.fieldNumber.replace(/[ABC]$/,""),d'
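The same normalisation as a standalone sketch: drop the per-replicate keys, strip a trailing A, B or C from fieldNumber, and deduplicate. `normalizeSample` and `uniqueRows` are hypothetical names, and the sample rows (including the `locationID` values) are made up.

```javascript
// Keys that differ between A/B/C replicates and are removed before comparing.
const dropped = [
  "eventID",
  "materialSampleID",
  "recordedBy",
  "gear",
  "samplingProtocol",
  "sampletype",
];

// Return a copy of `sample` without the dropped keys and with any trailing
// A, B or C removed from fieldNumber.
function normalizeSample(sample) {
  const out = { ...sample };
  for (const key of dropped) delete out[key];
  out.fieldNumber = out.fieldNumber.replace(/[ABC]$/, "");
  return out;
}

// Deduplicate rows by their JSON serialisation, like `sort | uniq`.
function uniqueRows(rows) {
  const seen = new Map();
  for (const row of rows) seen.set(JSON.stringify(row), row);
  return [...seen.values()];
}

// Made-up replicate rows for illustration.
const samples = [
  { fieldNumber: "ICE10-155A", eventID: "e1", locationID: "R4" },
  { fieldNumber: "ICE10-155B", eventID: "e2", locationID: "R4" },
];

const result = uniqueRows(samples.map(normalizeSample));
console.log(result);
```

If replicates collapse to a single row like this, joining on the stripped fieldNumber should resolve the non-matches listed earlier, assuming the suffix is the only difference.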