ncss-tech / soilDB

soilDB: Simplified Access to National Cooperative Soil Survey Databases
http://ncss-tech.github.io/soilDB/
GNU General Public License v3.0

redundant columns in default RAT of `mukey.wcs()` and `ISSR800.wcs()` #292

Closed brownag closed 1 year ago

brownag commented 1 year ago

Investigate whether it is still necessary to duplicate column data under different names in the categories of the resulting SpatRaster objects.

Previously this was done for compatibility with rasterVis, which required a column actually called "ID" in the table (a holdover from the raster package and early versions of terra).

I think it is possible that with the latest versions of terra/rasterVis/etc. a single column (named anything, probably "mukey" for use with joins) will be sufficient.

brownag commented 1 year ago

It appears that there is no need for "ID" column with terra + rasterVis anymore.

However, converting SpatRaster output to a RasterLayer via e.g. raster(<SpatRaster>) does not seem to preserve categories in the result unless the "ID" column is in first position. As a result, this currently does not work correctly: rasterVis::levelplot(raster::raster(mukey.wcs(...)))
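A minimal sketch of how the conversion behavior could be checked without hitting the web coverage service. The keys and the helper `check_conversion()` are hypothetical stand-ins, not part of soilDB, and the results may differ across terra/raster versions, so no expected output is shown:

```r
library(terra)
library(raster)

# Hypothetical keys standing in for the cell values returned by mukey.wcs()
keys <- c(101L, 102L, 103L)

# Build a small categorical SpatRaster whose attribute table uses `first_col`
# as the name of the first (cell value) column, convert it to a RasterLayer,
# and report whether the RasterLayer is still treated as categorical
check_conversion <- function(first_col) {
  r <- terra::rast(nrows = 3, ncols = 3, vals = rep(keys, each = 3))
  rat <- data.frame(keys, keys)
  names(rat) <- c(first_col, "mukey")
  levels(r) <- rat
  rl <- raster::raster(r)
  is.factor(rl)
}

check_conversion("value")  # first column named "value"
check_conversion("ID")     # first column named "ID"
```

If the two calls return different values, that would support moving "ID" into first position.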

terra historically used "value" and "category" as the default column names when setting categories via a vector. That behavior is now deprecated in terra; a data.frame is required.

We could replace "value" with "ID" in first position to preserve conversions back to raster package objects, since terra appears to be indifferent to the name of the first column in the input data.frame. I want to confirm this is the case before changing anything, but I am fairly sure the specific name "value" is used only in lieu of a user-specified column name for that first position.

Also, I said a single column would be possible, but a minimum of two columns is required. If we only have mukey, then mukey needs to be duplicated, e.g. as ID and mukey. So for our mukey.wcs() results there is always at least one duplicated column to "fool" the package, plotting tools, etc. into treating the mukey as a category rather than a continuous value.
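For illustration, a rough sketch of the two-column minimum on a toy raster. The key values are made up and this is not the actual mukey.wcs() code; it only assumes terra's `rast()`, `levels<-`, `is.factor()` and `cats()`:

```r
library(terra)

# Hypothetical map unit keys; real values would come from the WCS result
mukeys <- c(461994L, 461995L, 462001L)

# A tiny 3 x 3 raster whose cell values are the keys themselves
r <- rast(nrows = 3, ncols = 3, vals = rep(mukeys, each = 3))

# Two-column attribute table: the cell value first, then the same key
# repeated as the "label" column so that it is handled as a category
rat <- data.frame(ID = mukeys, mukey = mukeys)
levels(r) <- rat

is.factor(r)  # is the layer now treated as categorical?
cats(r)[[1]]  # the attribute table as stored by terra
```

The duplicated `mukey` column is what remains available for joins against tabular data.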

brownag commented 1 year ago

Note also that ISSR800, which was initially based on raster package assumptions, explicitly uses the column called "ID" in first position. So these changes bring mukey.wcs() and ISSR800.wcs() into a more consistent state.