Open ktoddbrown opened 5 months ago
This is very very cool!
In Table 1: Do: Are there actually five entries for each variable ("table-ID-variable-type-entry")? Do you need the "table" level in order to handle datasets structured in a nested fashion (tables within tables)? I see step three is "merging" or "flattening" into a single long table, which implies merging across tables.
Do: Do we need to make sure we are making tuples rather than lists?
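To make the "table-ID-variable-type-entry" question concrete, here is a minimal sketch of one possible reading of that structure. It is written in Python purely for illustration (the project itself works in R), and everything other than the five field names is an assumption: the `flatten` helper, the "value"/"unit" type labels, and the sample `layer` and `site` tables are hypothetical and are not taken from the wiki page.

```python
from typing import NamedTuple


class LongRow(NamedTuple):
    """One row of the long table: a fixed-length tuple of five fields."""
    table: str     # which source table the entry came from
    row_id: str    # identifier of the row within that table
    variable: str  # column name in the source table
    type: str      # hypothetical labels: "value" or "unit"
    entry: str     # the recorded content, kept as text


def flatten(table_name, rows, id_key, units):
    """Turn a list of wide rows (dicts) into long rows (hypothetical helper)."""
    long_rows = []
    for row in rows:
        for variable, value in row.items():
            if variable == id_key:
                continue  # the identifier becomes row_id, not a variable
            long_rows.append(
                LongRow(table_name, row[id_key], variable, "value", str(value)))
            if variable in units:
                long_rows.append(
                    LongRow(table_name, row[id_key], variable, "unit", units[variable]))
    return long_rows


# Two made-up source tables that end up in a single long table.
layers = [{"layer_id": "L1", "soc": 1.2}, {"layer_id": "L2", "soc": 0.8}]
sites = [{"site_id": "S1", "latitude": 45.0}]

long_table = (
    flatten("layer", layers, "layer_id", {"soc": "percent"})
    + flatten("site", sites, "site_id", {"latitude": "degrees"})
)
```

In this reading, the "table" field is what lets rows from several source tables share one long table without their IDs colliding, and each entry is a five-element tuple rather than a free-form list.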
Typo: "The intent of this [phase] of the work flow..."
Downloads: The instructions are to download the stable archived versions of the data, so no downloads of dev versions from GitHub repos? This makes sense, just clarifying.
Downloads: Some data repositories have a fair-use popup or a similar information-gathering popup that appears before they will let you download data (my main example, which is a pain in my life, is CIFOR: https://data.cifor.org/dataset.xhtml?persistentId=doi:10.17528/CIFOR/DATA.00058). Does this present a potential issue?
Shoestring: This conceptually makes sense to me, but I think I might need to actually see an example script to fully understand (not because it isn't clear, just because I'm lacking in data science skills).
A general question: For soil science data, unit conversions are the devil, correct? Is there a master list of the units all the variables are supposed to be converted to prior to ingestion (e.g. base cations: centimoles of charge vs. mg per gram; soil C: percent, mg per g, or a separate column of Mg per hectare; C:N ratios based on physical mass or molar mass; etc.)?
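As a rough illustration of what such a master list could look like, the sketch below pairs each variable with one target unit and a table of conversion factors. It is in Python for illustration only and is not taken from the project: the names `TARGET_UNITS`, `TO_TARGET`, and `to_target`, the variable name `soil_carbon`, and the choice of mg per g as the target unit are all assumptions. The only facts relied on are that 1 percent by mass equals 10 mg per g and that g per kg equals mg per g.

```python
# Hypothetical master list: one agreed target unit per variable.
TARGET_UNITS = {"soil_carbon": "mg/g"}

# Multiplicative factors from a reported unit to the target unit.
# 1 percent by mass = 10 mg/g; g/kg and mg/g are numerically identical.
TO_TARGET = {
    ("soil_carbon", "percent"): 10.0,
    ("soil_carbon", "g/kg"): 1.0,
    ("soil_carbon", "mg/g"): 1.0,
}


def to_target(variable, value, unit):
    """Convert a value to the variable's target unit, or fail loudly."""
    try:
        factor = TO_TARGET[(variable, unit)]
    except KeyError:
        raise ValueError(
            f"no conversion listed for {variable!r} in {unit!r}") from None
    return value * factor, TARGET_UNITS[variable]


converted = to_target("soil_carbon", 1.5, "percent")
```

Failing on an unlisted unit, rather than passing the value through, keeps unconverted numbers from slipping into the ingested data. Conversions that need extra inputs (for example, Mg per hectare, which depends on bulk density and depth) would not fit a single factor and are left out of this sketch.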
Thanks Kate! Alright here are my ToDos from this review.
`list` to denote the R data type and leave tuple in plain text.
Review for clarity. Does this give a good overview of this phase of the workflow? What does this page explain well? What needs to change or be explained better? What are your questions?
https://github.com/ktoddbrown/SoilDRaH/wiki/Read-scripts