Closed tupuni closed 4 years ago
@Tupuni as far as I can see, a lot of these columns are empty - should they still be included in dist?
@xrotwang that is true, especially because we still missing the most advanced methodological metadata, but a lot of these will used in the near future (within the next year of so).
I also just found out that there are some rows with same method ID and parameter but different Instrument
. Is this an error, or is the instrument also reference sample dependent?
This must be an error, as the method ID is partly defined by Instrument
.
The same goes for the analysis date - which must be an error, I'd say. It's only about 30 rows or so, and it's the same rows as the ones with the different insturments.
Ok I need to fix that
--> Kneebone-2018-MA_A Mn
date | 2017.0 | 2016.0
instrument | Bruker Tracer III SD | Bruker AXS or Innov-X Alpha
--> Kneebone-2018-MA_A Mn
date | 2017.0 | 2016.0
instrument | Bruker Tracer III SD | Bruker AXS or Innov-X Alpha
--> Kneebone-2018-MA_A Nb
date | 2017.0 | 2016.0
instrument | Bruker Tracer III SD | Bruker AXS or Innov-X Alpha
--> Kneebone-2018-MA_A Nb
date | 2017.0 | 2016.0
instrument | Bruker Tracer III SD | Bruker AXS or Innov-X Alpha
--> Kneebone-2018-MA_A Rb
date | 2017.0 | 2016.0
instrument | Bruker Tracer III SD | Bruker AXS or Innov-X Alpha
--> Kneebone-2018-MA_A Rb
date | 2017.0 | 2016.0
instrument | Bruker Tracer III SD | Bruker AXS or Innov-X Alpha
--> Kneebone-2018-MA_A Sr
date | 2017.0 | 2016.0
instrument | Bruker Tracer III SD | Bruker AXS or Innov-X Alpha
--> Kneebone-2018-MA_A Sr
date | 2017.0 | 2016.0
instrument | Bruker Tracer III SD | Bruker AXS or Innov-X Alpha
--> Kneebone-2018-MA_A Th
date | 2017.0 | 2016.0
instrument | Bruker Tracer III SD | Bruker AXS or Innov-X Alpha
--> Kneebone-2018-MA_A Th
date | 2017.0 | 2016.0
instrument | Bruker Tracer III SD | Bruker AXS or Innov-X Alpha
--> Kneebone-2018-MA_A Y
date | 2017.0 | 2016.0
instrument | Bruker Tracer III SD | Bruker AXS or Innov-X Alpha
--> Kneebone-2018-MA_A Y
date | 2017.0 | 2016.0
instrument | Bruker Tracer III SD | Bruker AXS or Innov-X Alpha
--> Kneebone-2018-MA_A Zn
date | 2017.0 | 2016.0
instrument | Bruker Tracer III SD | Bruker AXS or Innov-X Alpha
--> Kneebone-2018-MA_A Zn
date | 2017.0 | 2016.0
instrument | Bruker Tracer III SD | Bruker AXS or Innov-X Alpha
--> Kneebone-2018-MA_A Zr
date | 2017.0 | 2016.0
instrument | Bruker Tracer III SD | Bruker AXS or Innov-X Alpha
--> Kneebone-2018-MA_A Zr
date | 2017.0 | 2016.0
instrument | Bruker Tracer III SD | Bruker AXS or Innov-X Alpha
0it [00:00, ?it/s]--> Kneebone-2018-MA_A Mn
date | 2017.0 | 2016.0
instrument | Bruker Tracer III SD | Bruker AXS or Innov-X Alpha
--> Kneebone-2018-MA_A Mn
date | 2017.0 | 2016.0
instrument | Bruker Tracer III SD | Bruker AXS or Innov-X Alpha
--> Kneebone-2018-MA_A Nb
date | 2017.0 | 2016.0
instrument | Bruker Tracer III SD | Bruker AXS or Innov-X Alpha
--> Kneebone-2018-MA_A Nb
date | 2017.0 | 2016.0
instrument | Bruker Tracer III SD | Bruker AXS or Innov-X Alpha
--> Kneebone-2018-MA_A Rb
date | 2017.0 | 2016.0
instrument | Bruker Tracer III SD | Bruker AXS or Innov-X Alpha
--> Kneebone-2018-MA_A Rb
date | 2017.0 | 2016.0
instrument | Bruker Tracer III SD | Bruker AXS or Innov-X Alpha
--> Kneebone-2018-MA_A Sr
date | 2017.0 | 2016.0
instrument | Bruker Tracer III SD | Bruker AXS or Innov-X Alpha
--> Kneebone-2018-MA_A Sr
date | 2017.0 | 2016.0
instrument | Bruker Tracer III SD | Bruker AXS or Innov-X Alpha
--> Kneebone-2018-MA_A Th
date | 2017.0 | 2016.0
instrument | Bruker Tracer III SD | Bruker AXS or Innov-X Alpha
--> Kneebone-2018-MA_A Th
date | 2017.0 | 2016.0
instrument | Bruker Tracer III SD | Bruker AXS or Innov-X Alpha
--> Kneebone-2018-MA_A Y
date | 2017.0 | 2016.0
instrument | Bruker Tracer III SD | Bruker AXS or Innov-X Alpha
--> Kneebone-2018-MA_A Y
date | 2017.0 | 2016.0
instrument | Bruker Tracer III SD | Bruker AXS or Innov-X Alpha
--> Kneebone-2018-MA_A Zn
date | 2017.0 | 2016.0
instrument | Bruker Tracer III SD | Bruker AXS or Innov-X Alpha
--> Kneebone-2018-MA_A Zn
date | 2017.0 | 2016.0
instrument | Bruker Tracer III SD | Bruker AXS or Innov-X Alpha
--> Kneebone-2018-MA_A Zr
date | 2017.0 | 2016.0
instrument | Bruker Tracer III SD | Bruker AXS or Innov-X Alpha
--> Kneebone-2018-MA_A Zr
date | 2017.0 | 2016.0
instrument | Bruker Tracer III SD | Bruker AXS or Innov-X Alpha
fixed
@Tupuni I propose to leave the fractionation data out for this release - since ot would only add empty columns. Will fix the others.
@xrotwang ok let's add them when we have content
I added tables for reference samples and normalizations now. And to make the data model more consistent, I also factored methodological metadata out into a separate table. (It was inlined in the measurements table for convenience). While this adds a bit of complexity when you want to retrieve this data, it also makes for more concistency and a lot smaller file sizes.
The one big example query we have stays the same, and I'm going to add another example for querying methodological metadata for a particular sample:
select
distinct
s.sample_name, s.artefact_id,
d.parameter, d.value_string,
coalesce(m.analyst, 'NA'),
coalesce(n.reference_sample_name, 'NA') as 'normalization_reference_sample',
coalesce(n.citation, 'NA') as 'normalization_citation'
from
"samples.csv" as s
join "measurements.csv" as d on s.id = d.sample_id
left outer join "methods.csv" as m on m.id = d.method_id
left outer join "methods_normalizations.csv" as mn on mn.method_id = m.id
left outer join "normalizations.csv" as n on mn.normalization_id = n.id
where
s.id = 'Hermann-2017-JASR_At3-229-064'
;
The above query results in
sample_name,artefact_id,parameter,value_string,"coalesce(m.analyst, 'NA')",normalization_reference_sample,normalization_citation
...
At3-229-064,TUP-AT3-229,"Dy [ppm]",5.7,"Liorzou, Céline",NA,NA
At3-229-064,TUP-AT3-229,"Er [ppm]",2.8,"Liorzou, Céline",NA,NA
At3-229-064,TUP-AT3-229,"Yb [ppm]",2.0,"Liorzou, Céline",NA,NA
At3-229-064,TUP-AT3-229,"Th [ppm]",5.0,"Liorzou, Céline",NA,NA
At3-229-064,TUP-AT3-229,Sr87_Sr86,"0.702804±8e-06 2σ","Chauvel, Catherine; Sauzéat, Lucie",NA,NA
At3-229-064,TUP-AT3-229,Pb206_Pb204,"20.849898±0.000954 2σ","Chauvel, Catherine; Sauzéat, Lucie",NA,NA
At3-229-064,TUP-AT3-229,Pb207_Pb204,"15.730549±0.000679 2σ","Chauvel, Catherine; Sauzéat, Lucie",NA,NA
At3-229-064,TUP-AT3-229,Pb208_Pb204,"40.126541±0.00202 2σ","Chauvel, Catherine; Sauzéat, Lucie",NA,NA
At3-229-064,TUP-AT3-229,"Age [Ma]",20.03,"Guillou, Hervé",ACs,"Nomade et al., 2005"
Ok, it would make more sense to showcase the query of the following fields (since reference sample is vastly used) :
Note: These measurements are linked to a method id, not a specific sample id or artefact id.
Note: There are 3 fields with ref sample names in the methods and they should probably be explicitly distinct to make the query transparent :
ok, so
select
distinct
s.sample_name,
m.code,
m.parameter,
coalesce(r.sample_name, 'NA') as 'Reference sample name (international standard)',
coalesce(r.sample_measured_value, 'NA') as 'Measured value',
coalesce(r.uncertainty, 'NA') as 'SD',
coalesce(r.uncertainty_unit, 'NA') as 'SD Unit'
from
"samples.csv" as s
join "measurements.csv" as d on s.id = d.sample_id
join "methods.csv" as m on m.id = d.method_id
left outer join "methods_reference_samples.csv" as mr on mr.method_id = m.id
left outer join "reference_samples.csv" as r on mr.reference_sample_id = r.id
where
s.id = 'Hermann-2017-JASR_At3-229-064'
;
and
sample_name,code,parameter,"Reference sample name (international standard)","Measured value",SD,"SD Unit"
...
At3-229-064,Hermann-2017-JASR_A,Dy,WSE,6.0,2.97959678408457,%
At3-229-064,Hermann-2017-JASR_A,Er,AC-E,15.93,2.5165074820158275,%
At3-229-064,Hermann-2017-JASR_A,Er,JB-2,2.6,21.838794290669927,%
At3-229-064,Hermann-2017-JASR_A,Er,WSE,3.02,21.17731217801253,%
At3-229-064,Hermann-2017-JASR_A,Yb,AC-E,15.66,1.4079126317055592,%
At3-229-064,Hermann-2017-JASR_A,Yb,JB-2,2.62,2.7650232098860696,%
At3-229-064,Hermann-2017-JASR_A,Yb,WSE,2.48,1.1237823318988296,%
At3-229-064,Hermann-2017-JASR_A,Th,AC-E,16.65,2.755220355842639,%
At3-229-064,Hermann-2017-JASR_A,Th,JB-2,<LD,0.0,%
At3-229-064,Hermann-2017-JASR_A,Th,WSE,3.0,15.028249662053367,%
At3-229-064,Hermann-2017-JASR_B,Sr87_Sr86,"NBS 987",0.710248,8e-06,"2σ"
At3-229-064,Hermann-2017-JASR_C,Pb206_Pb204,"NBS 981",16.9415,0.0052,"2σ"
At3-229-064,Hermann-2017-JASR_C,Pb207_Pb204,"NBS 981",15.4924,0.002,"2σ"
At3-229-064,Hermann-2017-JASR_C,Pb208_Pb204,"NBS 981",36.701,0.007,"2σ"
At3-229-064,Hermann-2017-JASR_D,Age,NA,NA,NA,NA
Ok thank you that's perfect
-- 1 Missing fields in dist/measurements.csv (location in raw xlsx file in parenthesis)
These and all other "method" metadata were originally put in a separate spreadsheet because there can be more than 1 reference sample, fractionation correction, normalization for 1 parameter and 1 method id. A solution would be to include create a separate table as dist/methods.csv
-- 2 Field name modifications in dist/samples.csv : "name" ---> "sample_name" "category" ---> "sample_category" "comment" ---> "sample_comment" "location_loc1" ---> "location_region" "location_loc2" ---> "location_subregion" "location_loc3" ---> "location_locality"
in dist/measurements.csv : "precision" ---> "value_sd" "sigma" ---> "sd_sigma"