npolar / marine-db

https://doi.org/10.21334/marine-db

Non-unique sample identifiers #3

Open cnrdh opened 6 years ago

cnrdh commented 6 years ago

About 130 non-unique sample names (2014-2017). See also issues #4, #5 (2014, 2016, 2017) and #6 (2015).

cnrdh commented 6 years ago

$ cat data/master/sample/*2017*.ndjson | ndjson-map 'd.sample' | sort | uniq -cd | sort -rn
     37 "Sal_Temp_pH"
      8 "undefined"
      4 "KpN4_R1_10M; KpN4_R2_10M"
      2 "R6_R1_S; R6_R2_S"
      2 "R6_R1_M; R6_R2_M"
      2 "R6_R1_B; R6_R2_B"
      2 "MOSJ2017/DOX-009"
      2 "MOSJ2017/DOX-008"
      2 "MOSJ2017/DOX-007"
      2 "MOSJ2017/DIC-091"
      2 "GlacierFront2017/SAL-386"
      2 "GlacierFront2017/PHT-033"
      2 "GlacierFront2017/OXY-161"
      2 "GlacierFront2017/OXY-160"
      2 "GlacierFront2017/OXY-021"
      2 "GlacierFront2017/DIC-079"
      2 "GlacierFront2017/DIC-078"
      2 "GlacierFront2017/DIC-077"
      2 "GlacierFront2017/DIC-076"
      2 "GlacierFront2017/DIC-033"
      2 "GlacierFront2017/DIC-031"
$ cat data/master/sample/*2016*.ndjson | ndjson-map 'd.sample' | sort | uniq -cd | sort -rn
      6 "undefined"
      2 "MOSJ2016/ZOT-063"
      2 "MOSJ2016/ZOT-062"
      2 "MOSJ2016/ZOT-061"
      2 "MOSJ2016/ZOT-018"
      2 "MOSJ2016/ZOT-017"
      2 "MOSJ2016/ZOT-016"
      2 "MOSJ2016/ZOT-015"
      2 "MOSJ2016/ZOT-014"
      2 "MOSJ2016/PAB-070"
      2 "MOSJ2016/MAA-045"
      2 "MOSJ2016/MIT-015"
      2 "MOSJ2016/CDO-054"
      2 "GlacierFront2016/NUT-243"
      2 "GlacierFront2016/NUT-242"

cnrdh commented 6 years ago

$ cat data/master/sample/2014.ndjson | ndjson-map 'd.sample' | sort | uniq -cd | sort -rn
      5 "undefined"
      2 "MOSJ2014/FCM-068"
      2 "MOSJ2014/FCM-067"
      2 "MOSJ2014/FCM-066"
      2 "MOSJ2014/FCM-065"
      2 "MOSJ2014/FCM-064"
      2 "MOSJ2014/FCM-063"
      2 "MOSJ2014/FCM-062"
      2 "ICE2014/ZOT-076"

cnrdh commented 6 years ago

$ cat data/master/sample/*2015*.ndjson | ndjson-filter '!d.sample.match(/\/(DIC|BAR|GAS|FCM|SAL|OX[IY])/)' | ndjson-map 'd.sample' | sort | uniq -cd | sort -rn 
      2 "On-ice CTD-047"
      2 "N-ICE2015/SWN-068"
      2 "N-ICE2015/POC-530"
      2 "N-ICE2015/NUT-915"
      2 "N-ICE2015/NUT-914"
      2 "N-ICE2015/NUT-913"
      2 "N-ICE2015/NUT-912"
      2 "N-ICE2015/NUT-699"
      2 "N-ICE2015/NUT-698"
      2 "N-ICE2015/NUT-697"
      2 "N-ICE2015/NUT-670"
      2 "N-ICE2015/NUT-289"
      2 "N-ICE2015/NUT-288"
      2 "N-ICE2015/NUT-286"
      2 "N-ICE2015/IAT-255"
      2 "N-ICE2015/IAT-230"
      2 "N-ICE2015/IAT-229"
      2 "N-ICE2015/FCF-109"
      2 "N-ICE2015/DOX-232"
      2 "N-ICE2015/DOC-200"
      2 "N-ICE2015/DOC-004"
      2 "N-ICE2015/CHL-872"
      2 "N-ICE2015/CHL-414"
      2 "N-ICE2015/CHL-173"
      2 "N-ICE2015/CHL-105"
      2 "N-ICE2015/BSI-058"
      2 "MOSJ2015/CHL-43"
      2 "MOSJ2015/CHL-42"
      2 "MOSJ2015/CHL-41"
      2 "MOSJ2015/CHL-40"
      2 "MOSJ2015/CHL-39"

cnrdh commented 6 years ago

Only 38 left now:

     19 "undefined"
      2 "R6_R1_S; R6_R2_S"
      2 "R6_R1_M; R6_R2_M"
      2 "R6_R1_B; R6_R2_B"
      2 "N-ICE2015/OXY-024"
      2 "N-ICE2015/IAT-230"
      2 "N-ICE2015/IAT-229"
      2 "N-ICE2015/FCM-534"
      2 "N-ICE2015/FCM-437"
      2 "N-ICE2015/FCF-109"
      2 "N-ICE2015/DOX-232"
      2 "N-ICE2015/DIC-632"
      2 "MOSJ2017/DIC-091"
      2 "MOSJ2016/ZOT-063"
      2 "MOSJ2016/ZOT-062"
      2 "MOSJ2016/ZOT-061"
      2 "MOSJ2016/ZOT-018"
      2 "MOSJ2016/ZOT-017"
      2 "MOSJ2016/ZOT-016"
      2 "MOSJ2016/ZOT-015"
      2 "MOSJ2016/ZOT-014"
      2 "MOSJ2016/PAB-070"
      2 "MOSJ2016/MAA-045"
      2 "MOSJ2016/MIT-015"
      2 "MOSJ2016/CDO-054"
      2 "MOSJ2014/FCM-068"
      2 "MOSJ2014/FCM-067"
      2 "MOSJ2014/FCM-066"
      2 "MOSJ2014/FCM-065"
      2 "MOSJ2014/FCM-064"
      2 "MOSJ2014/FCM-063"
      2 "MOSJ2014/FCM-062"
      2 "ICE2014/ZOT-076"
      2 "GlacierFront2016/NUT-243"
      2 "GlacierFront2016/NUT-242"
      2 "GAS-502\""
      2 "GAS-501\""
      2 "DOC-200\""
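
The remaining count can be recomputed rather than read off the listing. A minimal sketch, assuming one sample name per line on stdin (for example the output of the `ndjson-map 'd.sample'` step used earlier in this thread). The names `count_duplicate_names` and `count_surplus_records` are hypothetical helpers, not part of marine-db:

```shell
# Hypothetical helper (assumption, not from the repository):
# reads one sample name per line and prints how many distinct
# names occur more than once.
count_duplicate_names() {
  sort | uniq -d | awk 'END { print NR }'
}

# Hypothetical helper: prints the number of surplus records, i.e. how
# many records would need renaming or removal for all names to be unique.
count_surplus_records() {
  sort | uniq -c | awk '$1 > 1 { surplus += $1 - 1 } END { print surplus + 0 }'
}
```

Something like `cat data/master/sample/*.ndjson | ndjson-map 'd.sample' | count_duplicate_names` would then give the number of names still to fix, which is easier to track between fixes than the full listing.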

cnrdh commented 6 years ago

Also these from 2001:

      2 "01 V15 WP3 8"
      2 "01M V10 WP3"
      2 "01M Kb52 WP3"
      2 "01M Kb28 WP3"
      2 "01 Kb52 WP3 1"
      2 "01 Kb28 WP3 1"

cnrdh commented 6 years ago

Deleted the records with expedition 01M; kept the otherwise identical records with expedition OAERRE-2001. The deleted rows:


2001-05-22T14:00:00Z    10.9    79.0305 V15 Lance   01M "WP3 1000 µm"   "01 V15 WP3 8|01M V10 WP3"  300-0|100-0 taxonomy|lipids mesozooplankton|    319
2001-05-22T01:05:00Z    12.181667   78.913333   Kb28    Lance   01M "WP3 1000 µm"   "01 Kb28 WP3 1|01M Kb28 WP3"    90-0|80-0   taxonomy|lipids mesozooplankton|    101 
2001-05-21T12:45:00Z    11.421667   79.041667   Kb52    Lance   01M "WP3 1000 µm"   "01 Kb52 WP3 1|01M Kb52 WP3"    200-0|200-0 taxonomy|lipids mesozooplankton|    240

cnrdh commented 6 years ago

As expected, unwinding samples led to new duplicates, but these were removed again by the fix for #16. Status now for 27384 samples:


$ cat data/master/sample/*.ndjson | ndjson-map 'd.sample' | sort | uniq -cd | sort -rn
      2 "01_V15_WP3_8"
      2 "01M_V10_WP3"
      2 "01M_Kb52_WP3"
      2 "01M_Kb28_WP3"
      2 "01_Kb52_WP3_1"
      2 "01_Kb28_WP3_1"
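
For readers unfamiliar with the term, "unwinding" here presumably means splitting a record whose sample field holds several pipe-joined names (such as `01 V15 WP3 8|01M V10 WP3`) into one record per name. That reading is an assumption, as is the tab-separated input. The sketch below uses a hypothetical helper `unwind_column`, which is not the repository's actual code, and unwinds one column only. Parallel pipe-joined columns and surrounding quote characters are left untouched.

```shell
# Hypothetical helper (assumption, not the repository's implementation):
# reads tab-separated rows on stdin; $1 is the 1-based number of the
# column holding pipe-joined names. Prints one row per name.
unwind_column() {
  awk -F '\t' -v OFS='\t' -v col="$1" '{
    n = split($col, names, "[|]")
    if (n == 0) { print; next }      # empty field: keep the row as it is
    for (i = 1; i <= n; i++) {
      $col = names[i]                # replace the joined value with one name
      print
    }
  }'
}
```

Two records that each listed the same name alongside another would yield the same name twice after this step, which is one way unwinding can produce new duplicates.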

cnrdh commented 6 years ago

Why aren't all 2014-2017 samples prefixed? Because they break the expected XXX-NNN pattern. See #16.
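
A quick way to list the names that fall outside the pattern is sketched below. It assumes that XXX-NNN means three uppercase letters, a hyphen and three digits, optionally preceded by an expedition prefix ending in `/`. The thread does not spell the pattern out, so the regular expression is a guess, and `nonconforming_names` is a hypothetical helper name:

```shell
# Hypothetical helper. The regular expression encodes an ASSUMED reading
# of "XXX-NNN": an optional "prefix/" followed by three uppercase letters,
# a hyphen and exactly three digits.
# Reads one sample name per line (with or without surrounding double
# quotes) and prints the names that do NOT fit.
nonconforming_names() {
  sed -e 's/^"//' -e 's/"$//' | grep -Ev '^([^/]+/)?[A-Z]{3}-[0-9]{3}$'
}
```

Under this reading, names like `R6_R1_B` or `Sal_Temp_pH` are reported, and so is `MOSJ2015/CHL-43`, which has only two digits.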

cnrdh commented 5 years ago

MOSJ2017:


"R6_R1_B"
"R6_R1_M"
"R6_R1_S"
"R6_R2_B"
"R6_R2_M"
"R6_R2_S"
"V12_R1_25m"
"V12_R1_B"
"V12_R1_M"
"V12_R1_S"