`get_SDA_interpretation()`: add subrule ratings to "reason" field

brownag commented 9 months ago

Adds subrule ratings to "reason" fields calculated for each mrulename in a get_SDA_interpretation() query.

This exposes key information about subrules that are exported in cointerp with ruledepth > 0.

TODO:

- [x] consider if there is a truly generic way to flatten these results 1:1 with mukey/component/mrulename... or if that has to be left to the user/unique to the interpretation being queried
  - could pack an optional XML column containing arbitrary complexity about subrules
- [x] ~~.interpretation_weighted_average() needs SQLite compatible STRING_AGG() switch~~ (wontfix; the rest of the query is not SQLite compatible either)
- [x] ~~order subrule "reasons" alphabetically? or at least consistently~~ (wontfix; can't use ORDER BY in the T-SQL subquery)
- [x] some subrule reasons are hard to interpret without subrule name

Will close #303

Note that the "reason" field now includes the interphr values as well as the interphrc values for rules with ruledepth != 0:

library(soilDB)

x <- get_SDA_interpretation(rulename  = "NCCPI - National Commodity Crop Productivity Index (Ver 3.0)",
                            method     = "Dominant Component",
                            mukeys     = c("242963","242964","242965"))
x
#>    mukey    cokey areasymbol musym
#> 1 242963 23671045      IL019  152A
#> 2 242964 23670915      IL019  134A
#> 3 242965 23671016      IL019  154A
#>                                           muname compname compkind comppct_r
#> 1 Drummer silty clay loam, 0 to 2 percent slopes  Drummer   Series        94
#> 2        Camden silt loam, 0 to 2 percent slopes   Camden   Series        92
#> 3      Flanagan silt loam, 0 to 2 percent slopes Flanagan   Series        95
#>   majcompflag rating_NCCPINationalCommodityCropProductivityIndexVer30
#> 1         Yes                                                   0.826
#> 2         Yes                                                   0.917
#> 3         Yes                                                   0.899
#>   class_NCCPINationalCommodityCropProductivityIndexVer30
#> 1                             High inherent productivity
#> 2                             High inherent productivity
#> 3                             High inherent productivity
#>                                                                                                                                                                                                           reason_NCCPINationalCommodityCropProductivityIndexVer30
#> 1 Impacted soil "No limitation" (0); NCCPI - NCCPI Cotton Submodel (II) "Cotton" (0.001); NCCPI - NCCPI Small Grains Submodel (II) "Small grains" (0.687); NCCPI - NCCPI Soybeans Submodel (I) "Soybeans" (0.752); NCCPI - NCCPI Corn Submodel (I) "Corn" (0.826)
#> 2 NCCPI - NCCPI Cotton Submodel (II) "Cotton" (0.001); NCCPI - NCCPI Soybeans Submodel (I) "Soybeans" (0.777); NCCPI - NCCPI Small Grains Submodel (II) "Small grains" (0.791); NCCPI - NCCPI Corn Submodel (I) "Corn" (0.917); Impacted soil "No limitation" (0)
#> 3  Impacted soil "No limitation" (0); NCCPI - NCCPI Cotton Submodel (II) "Cotton" (0.001); NCCPI - NCCPI Small Grains Submodel (II) "Small grains" (0.734); NCCPI - NCCPI Soybeans Submodel (I) "Soybeans" (0.76); NCCPI - NCCPI Corn Submodel (I) "Corn" (0.899)

brownag commented 8 months ago

Added subrule names, along with reason class and rating. The format is `{SUBRULE} "{REASON}" ({RATING}); {SUBRULE} "{REASON}" ({RATING});`. The format of each entry is consistent, but it is a pain to parse if you really need those values. Also, the order of the entries can differ from row to row.
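For anyone who does need to pull values out of the string by hand, here is a minimal base R sketch of one way to do it. `parse_reason()` is a hypothetical helper, not part of soilDB, and it assumes ratings are plain numbers and that entries end in `)` followed by `;` or end of string. Sorting by subrule name is my addition, to work around the inconsistent ordering.

```r
# Hypothetical helper (not part of soilDB): split one "reason" string into
# subrule name, reason class, and numeric rating.
parse_reason <- function(reason) {
  # One entry looks like: {SUBRULE} "{REASON}" ({RATING}), optionally followed by "; "
  entry <- '(.+?) "([^"]*)" \\(([0-9.]+)\\)(?:;\\s*|$)'
  # Extract every entry; matching whole entries (rather than splitting on ";")
  # tolerates subrule names that themselves contain a semicolon
  m <- regmatches(reason, gregexpr(entry, reason, perl = TRUE))[[1]]
  res <- data.frame(
    subrule = sub(entry, "\\1", m, perl = TRUE),
    reason  = sub(entry, "\\2", m, perl = TRUE),
    rating  = as.numeric(sub(entry, "\\3", m, perl = TRUE)),
    stringsAsFactors = FALSE
  )
  # Sort by subrule name so the order is the same for every row
  res <- res[order(res$subrule), ]
  rownames(res) <- NULL
  res
}

parse_reason('NCCPI - NCCPI Corn Submodel (I) "Corn" (0.826); Impacted soil "No limitation" (0)')
```

Subrule names that contain a quoted phrase followed by a parenthesized number would confuse this pattern, so treat it as a starting point only.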

Could potentially add an optional argument to get_SDA_interpretation() that post-processes all of the reason fields and "widens" the data.frame result accordingly, adding one column per subrule rating, similar to the example in #303.

brownag commented 8 months ago

Added an argument to get_SDA_interpretation() called wide_reason, default FALSE. If TRUE, the function does some post-processing: it parses the string contents of the "reason_*" fields in the result and adds a new column for each subrule rating within each main rule.

So, now you can quickly obtain ready-to-use subrule ratings for arbitrary interps, which should adequately cover needs from #303

library(soilDB)
x <- get_SDA_interpretation(rulename  = c("NCCPI - National Commodity Crop Productivity Index (Ver 3.0)", 
                                          "AGR - Pesticide Loss Potential-Leaching", 
                                          "ENG - Local Roads and Streets"),
                            method     = "Dominant Component", not_rated_value = "Not rated",
                            mukeys     = c("242963","242964","242965"), wide_reason = TRUE)
x
#>    mukey    cokey areasymbol musym
#> 1 242963 23671045      IL019  152A
#> 2 242964 23670915      IL019  134A
#> 3 242965 23671016      IL019  154A
#>                                           muname compname compkind comppct_r
#> 1 Drummer silty clay loam, 0 to 2 percent slopes  Drummer   Series        94
#> 2        Camden silt loam, 0 to 2 percent slopes   Camden   Series        92
#> 3      Flanagan silt loam, 0 to 2 percent slopes Flanagan   Series        95
#>   majcompflag rating_NCCPINationalCommodityCropProductivityIndexVer30
#> 1         Yes                                                   0.826
#> 2         Yes                                                   0.917
#> 3         Yes                                                   0.899
#>   class_NCCPINationalCommodityCropProductivityIndexVer30
#> 1                             High inherent productivity
#> 2                             High inherent productivity
#> 3                             High inherent productivity
#>                                                                                                                                                                                                           reason_NCCPINationalCommodityCropProductivityIndexVer30
#> 1 Impacted soil "No limitation" (0); NCCPI - NCCPI Cotton Submodel (II) "Cotton" (0.001); NCCPI - NCCPI Small Grains Submodel (II) "Small grains" (0.687); NCCPI - NCCPI Soybeans Submodel (I) "Soybeans" (0.752); NCCPI - NCCPI Corn Submodel (I) "Corn" (0.826)
#> 2 NCCPI - NCCPI Cotton Submodel (II) "Cotton" (0.001); NCCPI - NCCPI Soybeans Submodel (I) "Soybeans" (0.777); NCCPI - NCCPI Small Grains Submodel (II) "Small grains" (0.791); NCCPI - NCCPI Corn Submodel (I) "Corn" (0.917); Impacted soil "No limitation" (0)
#> 3  Impacted soil "No limitation" (0); NCCPI - NCCPI Cotton Submodel (II) "Cotton" (0.001); NCCPI - NCCPI Small Grains Submodel (II) "Small grains" (0.734); NCCPI - NCCPI Soybeans Submodel (I) "Soybeans" (0.76); NCCPI - NCCPI Corn Submodel (I) "Corn" (0.899)
#>   rating_AGRPesticideLossPotentialLeaching
#> 1                                Not rated
#> 2                                Not rated
#> 3                                Not rated
#>   class_AGRPesticideLossPotentialLeaching
#> 1                                    <NA>
#> 2                                    <NA>
#> 3                                    <NA>
#>   reason_AGRPesticideLossPotentialLeaching rating_ENGLocalRoadsandStreets
#> 1                                     <NA>                              1
#> 2                                     <NA>                              1
#> 3                                     <NA>                              1
#>   class_ENGLocalRoadsandStreets
#> 1                  Very limited
#> 2                  Very limited
#> 3                  Very limited
#>                                                                                                                                                                                                                                                                                      reason_ENGLocalRoadsandStreets
#> 1 Ponded > 4 hours "Ponding" (1); Wet, Ground Water Near the Surface (30 - 75cm) "Depth to saturated zone" (1); Potential Frost Action > Low "Frost action" (1); Strength (AASHTO Group Index Weighted Average (25-100cm)) "Low strength" (1); Shrink-Swell (LEP WTD_AVG 25-100cm or Bedrock) "Shrink-swell" (0.37)
#> 2                                                                                                          Potential Frost Action > Low "Frost action" (1); Strength (AASHTO Group Index Weighted Average (25-100cm)) "Low strength" (0.955); Shrink-Swell (LEP WTD_AVG 25-100cm or Bedrock) "Shrink-swell" (0.375)
#> 3                          Strength (AASHTO Group Index Weighted Average (25-100cm)) "Low strength" (1); Shrink-Swell (LEP WTD_AVG 25-100cm or Bedrock) "Shrink-swell" (0.894); Wet, Ground Water Near the Surface (30 - 75cm) "Depth to saturated zone" (0.746); Potential Frost Action > Low "Frost action" (0.5)
#>   rating_reason_NCCPINationalCommodityCropProductivityIndexVer30_Impactedsoil
#> 1                                                                           0
#> 2                                                                           0
#> 3                                                                           0
#>   rating_reason_NCCPINationalCommodityCropProductivityIndexVer30_NCCPINCCPICottonSubmodelII
#> 1                                                                                     0.001
#> 2                                                                                     0.001
#> 3                                                                                     0.001
#>   rating_reason_NCCPINationalCommodityCropProductivityIndexVer30_NCCPINCCPISmallGrainsSubmodelII
#> 1                                                                                          0.687
#> 2                                                                                          0.791
#> 3                                                                                          0.734
#>   rating_reason_NCCPINationalCommodityCropProductivityIndexVer30_NCCPINCCPISoybeansSubmodelI
#> 1                                                                                      0.752
#> 2                                                                                      0.777
#> 3                                                                                       0.76
#>   rating_reason_NCCPINationalCommodityCropProductivityIndexVer30_NCCPINCCPICornSubmodelI
#> 1                                                                                  0.826
#> 2                                                                                  0.917
#> 3                                                                                  0.899
#>   rating_reason_AGRPesticideLossPotentialLeaching_Notrated
#> 1                                                       NA
#> 2                                                       NA
#> 3                                                       NA
#>   rating_reason_ENGLocalRoadsandStreets_Ponded4hours
#> 1                                                  1
#> 2                                               <NA>
#> 3                                               <NA>
#>   rating_reason_ENGLocalRoadsandStreets_WetGroundWaterNeartheSurface3075cm
#> 1                                                                        1
#> 2                                                                     <NA>
#> 3                                                                    0.746
#>   rating_reason_ENGLocalRoadsandStreets_PotentialFrostActionLow
#> 1                                                             1
#> 2                                                             1
#> 3                                                           0.5
#>   rating_reason_ENGLocalRoadsandStreets_StrengthAASHTOGroupIndexWeightedAverage25100cm
#> 1                                                                                    1
#> 2                                                                                0.955
#> 3                                                                                    1
#>   rating_reason_ENGLocalRoadsandStreets_ShrinkSwellLEPWTDAVG25100cmorBedrock
#> 1                                                                       0.37
#> 2                                                                      0.375
#> 3                                                                      0.894
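For readers curious what this kind of "widening" involves, below is a rough base R sketch of the idea. It is not the soilDB implementation: `widen_reason()` and its `prefix` argument are hypothetical, ratings are assumed to be plain numbers, and subrule names are reduced to alphanumerics to form column suffixes. Subrules missing from a row are filled with `NA`.

```r
# Hypothetical sketch of the widening step; not the soilDB implementation.
widen_reason <- function(reason, prefix = "rating_reason") {
  # One entry looks like: {SUBRULE} "{REASON}" ({RATING}), optionally followed by "; "
  entry <- '(.+?) "([^"]*)" \\(([0-9.]+)\\)(?:;\\s*|$)'
  # Parse each row into a named numeric vector: names are cleaned subrule names
  parsed <- lapply(reason, function(r) {
    if (is.na(r)) return(numeric(0))
    m <- regmatches(r, gregexpr(entry, r, perl = TRUE))[[1]]
    rating <- as.numeric(sub(entry, "\\3", m, perl = TRUE))
    names(rating) <- gsub("[^A-Za-z0-9]", "", sub(entry, "\\1", m, perl = TRUE))
    rating
  })
  subrules <- unique(unlist(lapply(parsed, names)))
  if (length(subrules) == 0) return(data.frame(row.names = seq_along(reason)))
  # One column per subrule; NA where a row does not mention that subrule
  wide <- lapply(subrules, function(s) {
    vapply(parsed, function(p) if (s %in% names(p)) p[[s]] else NA_real_, numeric(1))
  })
  names(wide) <- paste(prefix, subrules, sep = "_")
  as.data.frame(wide, stringsAsFactors = FALSE)
}

reasons <- c(
  'Ponded > 4 hours "Ponding" (1); Potential Frost Action > Low "Frost action" (1)',
  'Potential Frost Action > Low "Frost action" (0.5)',
  NA
)
widen_reason(reasons, prefix = "rating_reason_ENG")
```

The resulting columns can be attached to the original result with `cbind()`, since rows stay in the input order.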

brownag commented 8 months ago

A final consideration: soilDB:::.cleanRuleColumnName() strips non-alphanumeric characters to make a "legal" R column name. It is possible this could lead to some collisions w/ certain subrule names...

For instance, inequalities are lost. "Ponded > 4 hours" and "Ponded < 4 hours" simplify to the same name "Ponded4hours". Could add a few things like replacing ">" "<" "=" with "GT" "LT" "EQ"...
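A minimal sketch of that replacement idea follows. `clean_rule_name()` is a hypothetical stand-in written for illustration; the actual internals of soilDB:::.cleanRuleColumnName() may differ, and the final stripping step is assumed from the behavior described above.

```r
# Hypothetical variant of a column-name cleaner (not the soilDB internal):
# encode comparison operators as letters before stripping other characters.
clean_rule_name <- function(x) {
  x <- gsub(">", "GT", x, fixed = TRUE)
  x <- gsub("<", "LT", x, fixed = TRUE)
  x <- gsub("=", "EQ", x, fixed = TRUE)
  # Drop everything that is not a letter or digit
  gsub("[^A-Za-z0-9]", "", x)
}

clean_rule_name(c("Ponded > 4 hours", "Ponded < 4 hours", "Ponding => Frequent (TX)"))
```

With this approach the two "Ponded" names no longer collapse to the same string. Names that differ only in punctuation such as "/" versus "," would still collide and need separate handling.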

Collisions appear to be rare (only Texas subrules in FY24 SSURGO), but they are not impossible:

library(soilDB)
x <- SDA_query("SELECT DISTINCT rulename FROM cointerp")[[1]]
#> single result set, returning a data.frame
y <- soilDB:::.cleanRuleColumnName(x)

length(x)
#> [1] 3237
length(unique(y))
#> [1] 3231

xx <- c(x[duplicated(y)], x[duplicated(y, fromLast = TRUE)])
sort(xx)
#>  [1] "AGR - Rutting Hazard =< 10,000 Pounds per Wheel (TX)"        
#>  [2] "AGR - Rutting Hazard > 10,000 Pounds per Wheel (TX)"         
#>  [3] "CaCO3 < 40% by Wght. Av. 0-40\" (TX)"                        
#>  [4] "CaCO3 > 40% by Wght. Av. 0-40\" (TX)"                        
#>  [5] "Excess Humus (FB, Peat, HM/MPT Surface Layer) (TX)"          
#>  [6] "Excess Humus (FB/Peat/HM/MPT Surface Layer) (TX)"            
#>  [7] "Flooding Occasional or greater; Duration Long,Very Long (TX)"
#>  [8] "Flooding Occasional or greater; Duration Long/Very Long (TX)"
#>  [9] "Ponding => Frequent (TX)"                                    
#> [10] "Ponding Frequent (TX)"                                       
#> [11] "Soil Strength (Rutting Vehicle =< 10,000 Pounds) (TX)"       
#> [12] "Soil Strength (Rutting Vehicle > 10,000 Pounds) (TX)"

brownag commented 8 months ago

There was only one existing collision in mrulename (as opposed to the few listed above for rulename). However, the modification to add inequalities back in will add a few characters to the cleaned column names of several existing mrulename values, which could be a small breaking change.

This is the list of affected interpretations that folks will need to update column name references for:

AGR - Rutting Hazard > 10,000 Pounds per Wheel (TX)
GRL - NV range seeding (Wind C >= 160) (NV)
GRL - Fencing, Post Depth =<24 inches
WLF - Food Plots for Upland Wildlife < 2 Acres (TX)
AGR - Rutting Hazard =< 10,000 Pounds per Wheel (TX)
GRL - Fencing, Post Depth =<36 inches
GRL - NV range seeding (Wind C = 10) (NV)
GRL - NV range seeding (Wind C = 30) (NV)
GRL - NV range seeding (Wind C = 20) (NV)
GRL - NV range seeding (Wind C = 100) (NV)
GRL - NV range seeding (Wind C = 40) (NV)
GRL - NV range seeding (Wind C = 60) (NV)
GRL - NV range seeding (Wind C = 80) (NV)
GRL - NV range seeding (Wind C = 50) (NV)

ncss-tech / soilDB

`get_SDA_interpretation()`: add subrule ratings to "reason" field #308