Closed dylanbeaudette closed 2 years ago
Saw this coming a couple months ago.
Relevant background: https://developer.r-project.org/Blog/public/2020/02/16/stringsasfactors/index.html
Yes I saw that blog also, and expected we would have to update soilDB shortly. uncode() only converts those columns that match the ColumnPhysicalNames from within NASIS's metadata domains into factors. All other character data columns, such as taxonname, are left as characters.
I'm not sure converting the stringsAsFactors argument into an option is the way to go, as that obscures the functionality from your typical user, but I'd be open to if that is the consensus. We should probably still document that functionality in each functions help file details section. If so it won't really reduce the documentation overhead.
I really think it's preferable to maintain parity with base R, therefore I suggest we change the argument name from stringsAsFactors to as.is. Although I think in issue #53 it was decided that the default should be TRUE.
As to the issue of the resulting factor ordering produced by uncode(). I think the best long-term solution would be to create a subroutine within uncode() to correct the ordering for those factors deemed to be 'out of order'. Rather than modify within each and every function where it occurs. The same solution might be used for those variables that should be excluded. I had considered the possibility of going to the source, and fixing the ordering in NASIS with the help of George or Kyle, but... decided that probably wasn't a good idea as it would destroy any backward compatibility with old data.
default.stringsAsFactors()
returns FALSE, which means default result of get_cosoilmoist_from_NASISWebReport
"status"
column value is NULL / all NA
. I fixed an example that was breaking in the CI since last week with this stack trace:
Error: Assigned data `l` must be compatible with existing data.
x Existing data has 119 rows.
x Element 9 of assigned data has 0 rows.
i Only vectors of size 1 are recycled.
Backtrace:
x
1. +-soilDB::get_cosoilmoist_from_NASISWebReport("EVAL - MLRA 111A - Ross silt loam, 0 to 2 percent slopes, frequently flooded")
2. | \-soilDB:::.cosoilmoist_prep(d.cosoilmoist, impute = impute, stringsAsFactors = stringsAsFactors)
3. | +-base::within(...)
4. | \-base::within.data.frame(...)
5. | +-base::`[<-`(...)
6. | \-tibble:::`[<-.tbl_df`(...)
7. | \-tibble:::tbl_subassign(x, i, j, value, i_arg, j_arg, substitute(value))
8. | \-tibble:::vectbl_recycle_rhs(...)
9. | +-base::withCallingHandlers(...)
10. | \-vctrs::vec_recycle(value[[j]], nrow)
11. +-vctrs:::stop_recycle_incompatible_size(...)
12. | \-vctrs:::stop_vctrs(...)
13. | \-rlang::abort(message, class = c(class, "vctrs_error"), ...)
14. | \-rlang:::signal_abort(cnd)
15. | \-base::signalCondition(cnd)
16. \-(function (cnd) ...
Execution halted
Not sure how it suddenly causes a problem, since stringsAsFactors has been FALSE by default since R 4+, and this just started testing bad last week.
Comparison toggling stringsAsFactors back to former default TRUE
daff::diff_data(
soilDB::get_cosoilmoist_from_NASISWebReport(
"EVAL - MLRA 111A - Ross silt loam, 0 to 2 percent slopes, frequently flooded",
stringsAsFactors = FALSE
),
soilDB::get_cosoilmoist_from_NASISWebReport(
"EVAL - MLRA 111A - Ross silt loam, 0 to 2 percent slopes, frequently flooded",
stringsAsFactors = TRUE
)
)
#> Loading required namespace: rvest
#> Warning in eval(substitute(expr), e): NAs introduced by coercion
#> Warning in eval(substitute(expr), e): NAs introduced by coercion
#> Daff Comparison: 'soilDB::get_cosoilmoist_from_NASISWebReport("EVAL - MLRA 111A - Ross silt loam, 0 to 2 percent slopes, frequently flooded", ' ' stringsAsFactors = FALSE)' vs. 'soilDB::get_cosoilmoist_from_NASISWebReport("EVAL - MLRA 111A - Ross silt loam, 0 to 2 percent slopes, frequently flooded", ' ' stringsAsFactors = TRUE)'
#> First 6 and last 6 patch lines:
#> nationalmusym muname
#> -> 5d8l Ross silt loam
#> -> 5d8l Ross silt loam
#> -> 5d8l Ross silt loam
#> -> 5d8l Ross silt loam
#> -> 5d8l Ross silt loam
#> -> 5d8l Ross silt loam
#> ... ... ...
#> -> 2w56h Ross silt loam, 0 to 2 percent slopes, frequently flooded
#> -> 2w56h Ross silt loam, 0 to 2 percent slopes, frequently flooded
#> -> 2w56h Ross silt loam, 0 to 2 percent slopes, frequently flooded
#> -> 2w56h Ross silt loam, 0 to 2 percent slopes, frequently flooded
#> -> 2w56h Ross silt loam, 0 to 2 percent slopes, frequently flooded
#> -> 2w56h Ross silt loam, 0 to 2 percent slopes, frequently flooded
#> dmuiid coiid compname comppct_r drainagecl month
#> -> 119696 238422 Ross 100 well jan
#> -> 119696 238422 Ross 100 well feb
#> -> 119696 238422 Ross 100 well mar
#> -> 119696 238422 Ross 100 well apr
#> -> 119696 238422 Ross 100 well may
#> -> 119696 238422 Ross 100 well jun
#> ... ... ... ... ... ... ...
#> -> 749825 2494482 Sloan 5 very poorly sep
#> -> 749825 2494482 Sloan 5 very poorly oct
#> -> 749825 2494482 Sloan 5 very poorly nov
#> -> 749825 2494482 Sloan 5 very poorly nov
#> -> 749825 2494482 Sloan 5 very poorly dec
#> -> 749825 2494482 Sloan 5 very poorly dec
#> flodfreqcl floddurcl pondfreqcl ponddurcl
#> -> Not populated->frequent very brief Not populated->none <NA>
#> -> Not populated->frequent very brief Not populated->none <NA>
#> -> Not populated->frequent very brief Not populated->none <NA>
#> -> Not populated->frequent very brief Not populated->none <NA>
#> -> Not populated->frequent very brief Not populated->none <NA>
#> -> Not populated->very rare very brief Not populated->none <NA>
#> ... ... ... ... ...
#> -> Not populated <NA> Not populated <NA>
#> -> Not populated <NA> Not populated <NA>
#> -> Not populated->frequent brief Not populated->occasional brief
#> -> Not populated->frequent brief Not populated->occasional brief
#> -> Not populated->frequent brief Not populated->occasional brief
#> -> Not populated->frequent brief Not populated->occasional brief
#> cosoilmoistiid dept_l dept_r dept_h depb_l depb_r depb_h
#> -> NA NA->201 NA->201 NA->201 NA->201 NA->201 NA->201
#> -> NA NA->201 NA->201 NA->201 NA->201 NA->201 NA->201
#> -> NA NA->201 NA->201 NA->201 NA->201 NA->201 NA->201
#> -> NA NA->201 NA->201 NA->201 NA->201 NA->201 NA->201
#> -> NA NA->201 NA->201 NA->201 NA->201 NA->201 NA->201
#> -> NA NA->201 NA->201 NA->201 NA->201 NA->201 NA->201
#> ... ... ... ... ... ... ... ...
#> -> 23327632 0 0 0 200 200 200
#> -> 23327616 0 0 0 200 200 200
#> -> 23327627 0 0 0 0 7 15
#> -> 23327628 0 7 15 200 200 200
#> -> 23327630 0 0 0 0 7 15
#> -> 23327629 0 7 15 200 200 200
#> status
#> -> NULL->Not populated
#> -> NULL->Not populated
#> -> NULL->Not populated
#> -> NULL->Not populated
#> -> NULL->Not populated
#> -> NULL->Not populated
#> ... ...
#> -> NULL->moist
#> -> NULL->moist
#> -> NULL->moist
#> -> NULL->wet
#> -> NULL->moist
#> -> NULL->wet
Thanks Andrew. I looked into and fixed. I'll try and examine how do deal with all of the other functions shortly.
We are passing around
default.stringsAsFactors()
to a lot of functions, each with its own arguments, flow-control, and documentation to manage. As far as I can tell (@smroecker please correct me if I am wrong), this is all to handle howuncode()
converts or does not convert characters to factors, using the NASIS domains as the default levels. We have discussed at great length in #53, but some issues remain:droplevels()
usually required to clean-up output)uncode()
should probably use an option to reduce (41 functions below) argument / documentation overheadtaxonname
) should never be converted to factorsfetchNASIS
)Functions to Fix
[ ] fetchNASIS_components.R:.fetchNASIS_components <- function(SS=TRUE, rmHzErrors=TRUE, fill = FALSE, stringsAsFactors = default.stringsAsFactors()) {
[ ] fetchNASIS_pedons.R:.fetchNASIS_pedons <- function(SS=TRUE, rmHzErrors=TRUE, nullFragsAreZero=TRUE, soilColorState='moist', lab=FALSE, stringsAsFactors = default.stringsAsFactors()) {
[ ] fetchNASIS_report.R:.get_site_from_NASISReport <- function(url = NULL, nullFragsAreZero = TRUE, stringsAsFactors = default.stringsAsFactors()
[ ] fetchNASIS_report.R:.get_pediagfeatures_from_NASISTemp <- function(stringsAsFactors = default.stringsAsFactors()
[ ] fetchVegdata.R:fetchVegdata <- function(SS=TRUE, stringsAsFactors = default.stringsAsFactors()) {
[ ] get_component_data_from_NASIS_db.R:get_component_data_from_NASIS_db <- function(SS=TRUE, stringsAsFactors = default.stringsAsFactors()) {
[ ] get_component_data_from_NASIS_db.R:get_legend_from_NASIS <- function(SS = TRUE, droplevels = TRUE, stringsAsFactors = default.stringsAsFactors()) {
[ ] get_component_data_from_NASIS_db.R:get_lmuaoverlap_from_NASIS <- function(SS = TRUE, droplevels = TRUE, stringsAsFactors = default.stringsAsFactors()) {
[ ] get_component_data_from_NASIS_db.R:get_mapunit_from_NASIS <- function(SS = TRUE, droplevels = TRUE, stringsAsFactors = default.stringsAsFactors()) {
[ ] get_component_data_from_NASIS_db.R:get_component_correlation_data_from_NASIS_db <- function(SS=TRUE, dropAdditional=TRUE, dropNotRepresentative=TRUE, stringsAsFactors = default.stringsAsFactors()) {
[ ] get_component_data_from_NASIS_db.R:get_component_copm_data_from_NASIS_db <- function(SS=TRUE, stringsAsFactors = default.stringsAsFactors()) {
[ ] get_component_data_from_NASIS_db.R:get_component_esd_data_from_NASIS_db <- function(SS=TRUE, stringsAsFactors = default.stringsAsFactors()) {
[ ] get_component_data_from_NASIS_db.R:get_comonth_from_NASIS_db <- function(SS=TRUE, fill=FALSE, stringsAsFactors = default.stringsAsFactors()) {
[ ] get_component_from_LIMS.R:get_component_from_NASISWebReport <- function(projectname, stringsAsFactors = default.stringsAsFactors()) {
[ ] get_component_from_LIMS.R:get_chorizon_from_NASISWebReport <- function(projectname, fill = FALSE, stringsAsFactors = default.stringsAsFactors()) {
[ ] get_component_from_LIMS.R:get_legend_from_NASISWebReport <- function(areasymbol, droplevels = TRUE, stringsAsFactors = default.stringsAsFactors()) {
[ ] get_component_from_LIMS.R:get_lmuaoverlap_from_NASISWebReport <- function(areasymbol, droplevels = TRUE, stringsAsFactors = default.stringsAsFactors()) {
[ ] get_component_from_LIMS.R:get_mapunit_from_NASISWebReport <- function(areasymbol, droplevels = TRUE, stringsAsFactors = default.stringsAsFactors()) {
[ ] get_component_from_LIMS.R:get_projectmapunit_from_NASISWebReport <- function(projectname, stringsAsFactors = default.stringsAsFactors()) {
[ ] get_component_from_LIMS.R:get_projectmapunit2_from_NASISWebReport <- function(mlrassoarea, fiscalyear, projectname, stringsAsFactors = default.stringsAsFactors()) {
[ ] get_component_from_SDA.R:get_legend_from_SDA <- function(WHERE = NULL, droplevels = TRUE, stringsAsFactors = default.stringsAsFactors()) {
[ ] get_component_from_SDA.R:get_lmuaoverlap_from_SDA <- function(WHERE = NULL, droplevels = TRUE, stringsAsFactors = default.stringsAsFactors()) {
[ ] get_concentrations_from_NASIS_db.R:get_concentrations_from_NASIS_db <- function(SS=TRUE, stringsAsFactors = default.stringsAsFactors()) {
[ ] get_cosoilmoist_from_LIMS.R:get_cosoilmoist_from_NASISWebReport <- function(projectname, impute = TRUE, stringsAsFactors = default.stringsAsFactors()) {
[ ] get_cosoilmoist_from_NASIS.R:get_cosoilmoist_from_NASIS <- function(impute = TRUE, stringsAsFactors = default.stringsAsFactors()) {
[ ] get_extended_data_from_NASIS_db.R:get_extended_data_from_NASIS_db <- function(SS=TRUE, nullFragsAreZero=TRUE, stringsAsFactors = default.stringsAsFactors()) {
[ ] get_hz_data_from_NASIS_db.R:get_hz_data_from_NASIS_db <- function(SS=TRUE, stringsAsFactors = default.stringsAsFactors()) {
[ ] get_phfmp_from_NASIS_db.R:get_phfmp_from_NASIS_db <- function(SS = TRUE, stringsAsFactors = default.stringsAsFactors()) {
[ ] get_projectmapunit_from_NASIS.R:get_projectmapunit_from_NASIS <- function(SS = TRUE, stringsAsFactors = default.stringsAsFactors()) {
[ ] get_site_data_from_NASIS_db.R:get_site_data_from_NASIS_db <- function(SS=TRUE, stringsAsFactors = default.stringsAsFactors()) {
[ ] get_soilseries_from_NASIS.R:get_soilseries_from_NASIS <- function(stringsAsFactors = default.stringsAsFactors()) {
[ ] get_soilseries_from_NASIS.R:get_soilseries_from_NASISWebReport <- function(soils, stringsAsFactors = default.stringsAsFactors()) {
[ ] get_vegplot_data_from_NASIS_db.R:get_vegplot_from_NASIS_db <- function(SS=TRUE, stringsAsFactors = default.stringsAsFactors()) {
[ ] get_vegplot_data_from_NASIS_db.R: get_vegplot_location_from_NASIS_db <- function(SS=TRUE, stringsAsFactors = default.stringsAsFactors()) {
[ ] get_vegplot_data_from_NASIS_db.R: get_vegplot_trhi_from_NASIS_db <- function(SS=TRUE, stringsAsFactors = default.stringsAsFactors()) {
[ ] get_vegplot_data_from_NASIS_db.R: get_vegplot_species_from_NASIS_db <- function(SS=TRUE, stringsAsFactors = default.stringsAsFactors()) {
[ ] get_vegplot_data_from_NASIS_db.R: get_vegplot_transect_from_NASIS_db <- function(SS=TRUE, stringsAsFactors = default.stringsAsFactors()) {
[ ] get_vegplot_data_from_NASIS_db.R: get_vegplot_transpecies_from_NASIS_db <- function(SS=TRUE, stringsAsFactors = default.stringsAsFactors()) {
[ ] get_vegplot_data_from_NASIS_db.R: get_vegplot_tree_si_summary_from_NASIS_db <- function(SS=TRUE, stringsAsFactors = default.stringsAsFactors()) {
[ ] get_vegplot_data_from_NASIS_db.R: get_vegplot_tree_si_details_from_NASIS_db <- function(SS=TRUE, stringsAsFactors = default.stringsAsFactors()) {
[ ] get_vegplot_data_from_NASIS_db.R: get_vegplot_textnote_from_NASIS_db <- function(SS=TRUE, fixLineEndings=TRUE, stringsAsFactors = default.stringsAsFactors()) {