Going back to the read_sheet script, there's a quick two-line fix. I changed the ftn to read all columns in as character (no more lists!), then use type.convert() across all columns to have R guess which columns are numeric. Given the complexity of the datasets we're working with, I can't fully vouch for this solution yet, but it does "unbreak" the data_homogenization ftn. I'll keep testing.
dataFile <- googlesheets4::read_sheet(
  ss = fileId,
  skip = skipRows,
  na = missingValueCode,
  col_types = "c")  # <-- new: force every column in as character

output_df <- purrr::map_dfc(.x = dataFile, .f = ~type.convert(.x, as.is = TRUE))  # <-- new: let R re-guess which columns are numeric
return(output_df)  # <-- new
With a recent tidyr update, Google Sheets are now imported with data structures preserved at the cell level rather than at the column level. When a column contains more than one data type, e.g. 123 and "NoData", the column is imported into the tibble as a list. Further, when we use the replace-NA function to remove a value like "NoData", the cell keeps its character structure. This causes a number of cascading issues in our QC code and effectively breaks the homogenization ftn. It may break the tarball creation function as well (yet to be tested).
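For anyone trying to reproduce this, here's a minimal sketch of the failure mode. The tibble below only mimics what read_sheet() hands back for a mixed column; the soil_c name and the "NoData" flag are made up for illustration.

library(tibble)
library(purrr)

# Mimic a mixed number/text column as it now arrives from read_sheet():
# the whole column comes back as a list, one element per cell.
raw <- tibble(soil_c = list(1.23, "NoData", 4.56))
str(raw$soil_c)   # list of 3: num, chr, num

# Swapping "NoData" for NA still leaves a character vector, not a numeric one,
# which is what trips up the downstream QC and homogenization checks.
as_char <- map_chr(raw$soil_c,
                   ~ ifelse(identical(.x, "NoData"), NA_character_, as.character(.x)))
class(as_char)    # "character"

# type.convert() re-guesses the type once the text flag is gone.
class(type.convert(as_char, as.is = TRUE))   # "numeric"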
I've mostly encountered this issue when text values are included in data columns to indicate missing or NA data. A short-term workaround is therefore to manually replace those character values with NA before running the homogenization ftn, along the lines of the sketch below.
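Something like this works as the interim patch. It's only a sketch: dataFile and the "NoData" string are placeholders for whatever tibble and missing-data flags a given dataset actually uses.

library(dplyr)

dataFile <- dataFile %>%
  mutate(across(everything(), ~ {
    x <- as.character(.x)           # flatten list/character cells to plain character
    x <- na_if(x, "NoData")         # swap the text flag for NA
    type.convert(x, as.is = TRUE)   # let R re-guess numeric vs character
  }))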
I've now spent 15+ hours reworking our homogenization code to adjust for the change in data structure. Needless to say, this is one heck of a wrench in our code. I'm hopeful I'll get it worked out and submit a pull request with the new code by the end of today.