ncss-tech / soilDB

soilDB: Simplified Access to National Cooperative Soil Survey Databases
http://ncss-tech.github.io/soilDB/
GNU General Public License v3.0
81 stars 19 forks source link

fetchSDA returns empty result and horizon ID name (chkey) not unique #132

Closed metalaureate closed 4 years ago

metalaureate commented 4 years ago

Thank you for your package, apologies if this questions reflects my ignorance. I can typically return clay, sand, pH, and drainage values for 90% of the US mapunits I try, but 10% return an empty set and the error horizon ID name (chkey) not unique

For example :

fetchSDA(WHERE = paste0("mukey=",mukey),duplicates = TRUE) when mukey = 497405, 497477, 497486

produces


empty result set
horizon ID name (chkey) not uniqueWarning message:

Can you give me any guidance on what I am doing wrong?

brownag commented 4 years ago

Hey @metalaureate It seems like your WHERE argument does not have the right format. fetchSDA constructs an SQL query from the user input,.

The correct/suggested way to do a multi-mukey query would to create an SQL IN statement (soilDB provides a function format_SQL_in_statement.

And our WHERE clause takes the form "WHERE mukey IN ('a', 'b', 'c')" where a, b, c are user-supplied mukeys.

library(aqp)
library(soilDB)
mukeys <- soilDB::format_SQL_in_statement(as.character(c(497405, 497477, 497486)))
res <- fetchSDA(WHERE = paste0("mukey IN ", mukeys), duplicates = TRUE)
> res <- fetchSDA(WHERE = paste0("mukey IN ", mukeys), duplicates = TRUE)
single result set, returning a data.frame
single result set, returning a data.frame
single result set, returning a data.frame
empty result set
single result set, returning a data.frame
single result set, returning a data.frame
empty result set
converting IDs from integer to character
Warning message:
In `hzidname<-`(`*tmp*`, value = "chkey") :
  horizon ID name (chkey) not unique. unique ID not changed.
> head(res)
Object of class SoilProfileCollection
Number of profiles: 4
Depth range: 15-152 cm

Note that a warning is generated, because with duplicates = TRUE (which returns a record for each unique mukey), there may be multiple instances of a chkey (unique horizon ID) per each national mapunit symbol.

# works as expected
head(res)
brownag commented 4 years ago

Also, I did not verify that you were using a recent version of soilDB.

This works as I specify above using latest CRAN and GitHub versions of the package, so make sure you have one of those.

devtools::install_github('ncss-tech/soilDB', dependencies=FALSE)

metalaureate commented 4 years ago

Thank you and apologies for my unclear question: I did not mean to draw attention to multiple mukey syntax. I am referring to the fact that any of those mukeys return an empty result set.

I use a query like this to get matching mukeys from lat, longs:

 q <- paste0("SELECT mukey, muname
  FROM mapunit
  WHERE mukey IN (
  SELECT * from SDA_Get_Mukey_from_intersection_with_WktWgs84('", p.wkt, "')
  )")

  soils <- SDA_query(q)

And then try to get the sand, clay, pH, etc. by doing a lookup on the mukeys returned by the query above:

E.g. using the

fetchSDA(WHERE = paste0("mukey = ",497405),duplicates = TRUE)
single result set, returning a data.frame
single result set, returning a data.frame
single result set, returning a data.frame
empty result set
single result set, returning a data.frame
single result set, returning a data.frame
empty result set
Error in `hzidname<-`(`*tmp*`, value = "chkey") : 
  horizon ID name (chkey) not unique

BTW, in using the CRAN install, R complains of corrupt files:

devtools::install_github('ncss-tech/soilDB', dependencies=FALSE) Skipping install of 'soilDB' from a github remote, the SHA1 (d68efd2c) has not changed since last install. Use force = TRUE to force installation library(soilDB) help(soilDB) Error in fetch(key) : lazy-load database '/Users/simonhill/Library/R/3.5/library/soilDB/help/soilDB.rdb' is corrupt

Any pointers to what I am doing wrong would be much appreciated!

brownag commented 4 years ago

OK. So in principle that should work -- do a spatial query to obtain vector of MUKEYs -- then take a unique set of mukeys and query for component/mapunit level data.

I can assure you that the MUKEYS you provided to me up above have data in them necessary to produce a non-empty SPC result from fetchSDA (containing 4 components or profiles)

You need to make sure that you restart your R session (or better yet, do the install in a fresh R session) then try and install. It does look like you have the latest version off github installed, though.

In R studio you can restart session with Ctrl+Shift+F10. See if the package loads after restarting. If not, reinstall in a session where the package is not loaded.

"Lazy loading" preloads parts of a package, but "what you need" -- so that if a package is loaded in current session it sometimes does not properly "detach" when you try to re-install it.

The output "empty result set" is referring to individual sub-queries being called by fetchSDA - they are not the cause of your error. Some duplication of the horizon data is causing it -- though that should not be triggering an error in the latest version of the package -- so make sure you get that straightened out.

Could you give me an example list of MUKEYs that are causing this problem? I can likely tell you more specifically what tables are the issue.

brownag commented 4 years ago

When I query the 3 mukeys above these are the 4 profiles I obtain.

image

Seven query result sets are returned -- of which two are empty. These are likely a child table of component.

If I set childs=FALSE the child tables are not queried, and we see fewer result sets (only 4), of which 1 is empty. We obtain the same 4 components we did above, but without child table data.

f <- fetchSDA(WHERE = paste0("mukey IN ", 
                             format_SQL_in_statement(c(497405, 497477, 497486))), 
              duplicates = T, childs=FALSE)
metalaureate commented 4 years ago

Thank you for swift response. I have soilDB version 2.5.3, which is the latest. Does not throw a corruption error after restarting. However, mukeys still do not return data.

For example, 497405, 497477, 497486 all return an empty set. On reading your latest comment, I think I don't know what I am doing. Normally, for 90% of cases, fetchSDA returns a data table of horizons, from which I pick data from the first horizon. In these examples, an empty set is returned with an error, and I don't know what to do next.

brownag commented 4 years ago

Hm. I simply can't replicate that result with those 3 MUKEYs.

library(soilDB)

f <- fetchSDA(WHERE = paste0("mukey IN ", 
                             format_SQL_in_statement(c(497405, 497477, 497486))), 
              duplicates = T, childs = FALSE)

plot(f, label="compname")

# get clay, sand and pH from horizon table (first horizon in each profile)
horizons(f[,1])[,c('claytotal_r','sandtotal_r',"ph1to1h2o_r","ksat_r")]

# drainage class from site/component table
site(f)[,c('drainagecl')]

Which produces this output.

single result set, returning a data.frame
single result set, returning a data.frame
single result set, returning a data.frame
empty result set
converting IDs from integer to character
Warning message:
In `hzidname<-`(`*tmp*`, value = "chkey") :
  horizon ID name (chkey) not unique. unique ID not changed.
> # get clay, sand and pH from horizon table (first horizon in each profile)
> horizons(f[,1])[,c('claytotal_r','sandtotal_r',"ph1to1h2o_r","ksat_r")]
         claytotal_r sandtotal_r ph1to1h2o_r ksat_r
17698786        33.5        34.2         7.2   2.32
17698787          NA          NA          NA     NA
17698814        34.0        33.9         7.0   2.82
17699435        31.5        35.3         7.0   2.82

> site(f)[,c('drainagecl')]
[1] "Well drained" NA             "Well drained" "Well drained"
metalaureate commented 4 years ago

Here is your exact code run on my RStudio (Mac Pro):

library(soilDB)

f <- fetchSDA(WHERE = paste0("mukey IN ", 
                             format_SQL_in_statement(c(497405, 497477, 497486))), 
              duplicates = T, childs = FALSE)

single result set, returning a data.frame single result set, returning a data.frame single result set, returning a data.frame empty result set Error in hzidname<-(*tmp*, value = "chkey") : horizon ID name (chkey) not unique

f is empty.

I'll try to remove soilDB and re-add.

brownag commented 4 years ago

OK. You must have an old version of soilDB... somehow.

I just reinstalled mine off CRAN (via install.packages not devtools) to verify that this works even as far back as version 2.5. Then reinstalled latest version from devtools (2.5.3) and confirmed again.

I do remember fixing this "error" -- it is "manufactured" in that we decided that it should be a warning, not an error, if something in the query causes duplication of horizons.

metalaureate commented 4 years ago

So I try:

remove.packages("soilDB") Removing package from ‘/Users/simonhill/Library/R/3.5/library’ (as ‘lib’ is unspecified)

Restarting R session...

help(soilDB)

No documentation for ‘soilDB’ in specified packages and libraries: you could try ‘??soilDB’

devtools::install_github('ncss-tech/soilDB', dependencies=FALSE)

─ preparing ‘soilDB’: ✔ checking DESCRIPTION meta-information ... ─ checking for LF line-endings in source and make files and shell scripts ─ checking for empty or unneeded directories ─ looking to see if a ‘data/datalist’ file should be added ─ building ‘soilDB_2.5.3.tar.gz’

library(soilDB) help(soilDB)

returns version 2.5.3

Then

 f <- fetchSDA(WHERE = paste0("mukey IN ", 
                              format_SQL_in_statement(c(497405, 497477, 497486))), 
              duplicates = T, childs = FALSE)

single result set, returning a data.frame single result set, returning a data.frame single result set, returning a data.frame empty result set Error in hzidname<-(*tmp*, value = "chkey") : horizon ID name (chkey) not unique

_head_bangingwall

metalaureate commented 4 years ago

Some fresh hell, though. Now cannot find fetchSDA

library(soilDB)

f <- fetchSDA(WHERE = paste0("mukey IN ", format_SQL_in_statement(c(497405, 497477, 497486))), duplicates = T, childs = FALSE) Error in fetchSDA(WHERE = paste0("mukey IN ", format_SQL_in_statement(c(497405, : could not find function "fetchSDA"

My system reverted to version 2.4.1. So I tried to remove that:

remove.packages("soilDB") Removing package from ‘/Users/simonhill/Library/R/3.5/library’ (as ‘lib’ is unspecified) Error in remove.packages : there is no package called ‘soilDB’ help("soilDB") remove.packages("soilDB") Removing package from ‘/Users/simonhill/Library/R/3.5/library’ (as ‘lib’ is unspecified) Error in remove.packages : there is no package called ‘soilDB’

I am in lalaland!

brownag commented 4 years ago

Ugh. Screwy. Perhaps try running install/code in a regular R console, outside R studio -- see if you can get the installs to work.

The other possibility is RStudio and/or R is confused about where your packages are installled. When I start R, I get this printed out:

R library paths: 
 C:/Users/Andrew.G.Brown/Documents/R/win-library/3.6
 C:/Program Files/R/R-3.6.3/library

Make sure that lines up with what you expect -- if it doesnt -- as is sometimes the case with networked situations -- then this can cause mayhem for packages.

This shouldn't really make a difference I don't think -- since you installed from source -- but I am wondering whether there is a bad interaction with your R version. We are running R 3.6.3 on our computers -- switching to 4.0.0 soon -- 3.5 is pretty dated. But again, it shouldn't matter for soilDB

metalaureate commented 4 years ago

I deleted all references to soildb in my hard drive. Then in terminal, as follow, and now devtools::install_github will only install version 2.5.3:

R version 3.5.2 (2018-12-20) -- "Eggshell Igloo" Copyright (C) 2018 The R Foundation for Statistical Computing Platform: x86_64-apple-darwin15.6.0 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under certain conditions. Type 'license()' or 'licence()' for distribution details.

Natural language support but running in an English locale

R is a collaborative project with many contributors. Type 'contributors()' for more information and 'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or 'help.start()' for an HTML browser interface to help. Type 'q()' to quit R.

devtools::installgithub('ncss-tech/soilDB', dependencies=FALSE) Downloading GitHub repo ncss-tech/soilDB@master ✔ checking for file ‘/private/var/folders/x/znvl1w3n1r59zz0l0gw2m3xw0000gn/T/RtmphzDBJH/remotes1dfb7709864a/ncss-tech-soilDB-d68efd2/DESCRIPTION’ ... ─ preparing ‘soilDB’: ✔ checking DESCRIPTION meta-information ... ─ checking for LF line-endings in source and make files and shell scripts ─ checking for empty or unneeded directories ─ looking to see if a ‘data/datalist’ file should be added ─ building ‘soilDB_2.5.3.tar.gz’

Installing package into ‘/Users/simonhill/Library/R/3.5/library’ (as ‘lib’ is unspecified)

metalaureate commented 4 years ago

Upgrading to R 4.0 to see if that changes my fate...

metalaureate commented 4 years ago

Now in worse shape I think

devtools::installgithub('ncss-tech/soilDB', dependencies=FALSE) Downloading GitHub repo ncss-tech/soilDB@master ✓ checking for file ‘/private/var/folders/x/znvl1w3n1r59zz0l0gw2m3xw0000gn/T/RtmplVj03g/remotes1e913ff7d159/ncss-tech-soilDB-d68efd2/DESCRIPTION’ ... ─ preparing ‘soilDB’: ✓ checking DESCRIPTION meta-information ... ─ checking for LF line-endings in source and make files and shell scripts ─ checking for empty or unneeded directories ─ looking to see if a ‘data/datalist’ file should be added ─ building ‘soilDB_2.5.3.tar.gz’

ERROR: dependencies ‘aqp’, ‘plyr’, ‘sp’, ‘reshape2’, ‘raster’ are not available for package ‘soilDB’

That was a terrible idea.

brownag commented 4 years ago

You just need to turn the dependencies = FALSE argument to TRUE. Each version of R has its own path of libraries and instances of paths. We are probably converging on your issue.

Usually we install dependencies from CRAN and development versions of packages from devtools. But if you just delete that argument or set to true, you will have the option to get deps from CRAN or devtools.

Though you are treading into uncharted waters - I can assure you everything works on 4.0.0 -- I am using these packages on my personal machine.

I am interested to see the contents of this function .libPaths() when you are in Rstudio -- on whatever R version you want to try to get working.

> .libPaths()
[1] "C:/Users/Andrew.G.Brown/Documents/R/win-library/3.6" "C:/Program Files/R/R-3.6.3/library"  
metalaureate commented 4 years ago

Ah, well, I downgraded to 3.6 but I am getting all kinds of errors, but I am on the same version of R as you now. ??

devtools::install_github('ncss-tech/soilDB', dependencies=TRUE)

Downloading GitHub repo ncss-tech/soilDB@master Skipping 1 packages not available: RODBC Installing 30 packages: aqp, plyr, sp, reshape2, raster, rgdal, RODBC, rgeos, rvest, latticeExtra, RCurl, XML, ggplot2, gridExtra, viridis, farver, labeling, munsell, RColorBrewer, viridisLite, colorspace, gtable, isoband, scales, reshape, plotrix, selectr, png, jpeg, bitops Error: Failed to install 'soilDB' from GitHub: (converted from warning) package ‘RODBC’ is not available (for R version 3.6.3)

.libPaths() [1] "/Library/Frameworks/R.framework/Versions/3.6/Resources/library"

brownag commented 4 years ago

Ahh. Indeed, RODBC is why we have to switch to R 4.0.0 -- they no longer provide the binaries for it for 3.6. For a short time there is a workaround to get the archived binary from CRAN.

install.packages('https://cran.r-project.org/bin/windows/contrib/3.6/RODBC_1.3-16.zip', repos = NULL)

metalaureate commented 4 years ago

So

Error in install.packages : type == "both" cannot be used with 'repos = NULL'

metalaureate commented 4 years ago

But

install.packages('https://cran.r-project.org/bin/windows/contrib/3.6/RODBC_1.3-16.zip') Warning in install.packages : package ‘https://cran.r-project.org/bin/windows/contrib/3.6/RODBC_1.3-16.zip’ is not available (for R version 3.6.3)

brownag commented 4 years ago

Oh gee. Something is royally messed up.

Just ran this...

> install.packages('https://cran.r-project.org/bin/windows/contrib/3.6/RODBC_1.3-16.zip', repos = NULL)
WARNING: Rtools is required to build R packages but is not currently installed. Please download and install the appropriate version of Rtools before proceeding:

https://cran.rstudio.com/bin/windows/Rtools/
Installing package into ‘C:/Users/Andrew.G.Brown/Documents/R/win-library/3.6’
(as ‘lib’ is unspecified)
trying URL 'https://cran.r-project.org/bin/windows/contrib/3.6/RODBC_1.3-16.zip'
Content type 'application/zip' length 880331 bytes (859 KB)
downloaded 859 KB

package ‘RODBC’ successfully unpacked and MD5 sums checked
metalaureate commented 4 years ago

Oh I am on a mac. That is a windows binary?

brownag commented 4 years ago

Ah whoops! Sorry! Of course.

You might need to compile it from the source then. RODBC relies on compiled code. Try devtools::install_github("cran/rodbc") to install from source.

metalaureate commented 4 years ago

Requires 4.0 or above. I will try R 4 again.

brownag commented 4 years ago

In that case, you can get the binary off CRAN :)

metalaureate commented 4 years ago

Same problem. R 4.0, soilDB 2.5.3

Are sure https://cran.rstudio.com/bin/macosx/contrib/4.0/soilDB_2.5.tgz contains your fixes?

Restarting R session...

> install.packages("soilDB")
trying URL 'https://cran.rstudio.com/bin/macosx/contrib/4.0/soilDB_2.5.tgz'
Content type 'application/x-gzip' length 772710 bytes (754 KB)
==================================================
downloaded 754 KB

The downloaded binary packages are in
    /var/folders/x_/znvl1w3n1r59zz0l0gw2m3xw0000gn/T//Rtmpd3xDGA/downloaded_packages
> library(soilDB)
> f <- fetchSDA(WHERE = paste0("mukey IN ", 
+                              format_SQL_in_statement(c(497405, 497477, 497486))), 
+               duplicates = T, childs = FALSE)
single result set, returning a data.frame
single result set, returning a data.frame
single result set, returning a data.frame
empty result set
Error in `hzidname<-`(`*tmp*`, value = "chkey") : 
  horizon ID name (chkey) not unique

Same for devtools::install_github('ncss-tech/soilDB', dependencies=TRUE)

Any ideas?

brownag commented 4 years ago

Oh geez. I realized what is wrong. I am very sorry.

I was able to duplicate your error when I installed the version of AQP that is available on CRAN.

You need the development version of aqp. Not the one off CRAN.

I knew that I had made a change to this but not in soilDB -- it is in aqp where the uniqueness constraint is actually enforced. My sincerest apologies.

devtools::install_github("ncss-tech/aqp", dependencies=FALSE)

Should get you going again. We will be updating AQP on CRAN soon.

metalaureate commented 4 years ago

That worked! Thank you!

brownag commented 4 years ago

@metalaureate

I regret that it was so difficult for me to figure that one out.

But I do want to point out what that error message was "protecting"you from -- so it doesn't seem like this was all in vain.

You can test whether you have overlapping horizons

library(aqp)
library(soilDB)

f <- fetchSDA(WHERE = paste0("mukey IN ", 
                             format_SQL_in_statement(c(497405, 497477, 497486))), 
              duplicates = T, childs = FALSE)

profileApply(f, aqp::checkHzDepthLogic, simplify = F)
Horizon Attributes (first 10 of 62 columns):
------------------------------------------------
    cokey hzID hzdept_r hzdepb_r hzname    chkey texture texcl fragvol_l fragvol_r
 17698786    1        0       15     H1 51648006      cl  <NA>         0         7
 17698786    2       15       74     H2 51648007       c  <NA>         0         4
 17698786    3       15       74     H2 51648007     sic  <NA>         0         4
 17698786    4       74      152     H3 51648008       c  <NA>         0         7

These mapunits have multiple representative values for texture class in some horizons -- which causes some problems for the query fetchSDA uses to get horizon data. This is actually inconsistent with guidelines for our data model. I'll be reporting this issue up the line to get it resolved in the database. It appears the duplication is not in the first horizon, so your analysis will likely not be affected... but this is why we generated the error (now a warning)!

metalaureate commented 4 years ago

No worries. Thanks for the explanation. I’m overjoyed that we fixed it. My soil report coverage increased from 88% to 98%.