Closed phytoclast closed 11 months ago
I'll look into this today. I was working on this a little while ago and have a solution.
On Thu, Oct 5, 2023 at 2:12 PM Greg Schmidt @.***> wrote:
I am trying to extract and flatten the geomorphic and parent material child tables from a 2021 geodatabase version of SSURGO. MI = fetchGDB(dsn = 'D:/GIS/SOIL/2021/gSSURGO_MI.gdb', childs = TRUE)
However, I get this error:
getting components and horizons from Error in .get_cogeomordesc_from_GDB(dsn = dsn) : argument "co" is missing, with no default
This error does not occur when "childs" parameter is left FALSE by default. The "get_component_from_GDB" function has the same error when setting "childs = TRUE".
— Reply to this email directly, view it on GitHub https://github.com/ncss-tech/soilDB/issues/311, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACILCJWOOUYCRPG4P6ABOFLX54BBLAVCNFSM6AAAAAA5UXOF3SVHI2DSMVQWIX3LMV43ASLTON2WKOZRHEZDQOJQGE3TENY . You are receiving this because you are subscribed to this thread.Message ID: @.***>
There is a line:
cogmd <- .get_cogeomordesc_from_GDB(dsn = dsn)
Missing the 'co'. So I replaced it.
cogmd <- .get_cogeomordesc_from_GDB(dsn = dsn, co)
The next problem is it running out of memory:
getting components and horizons from Error: Query execution failed, cannot open layer. In addition: Warning message: In CPL_read_ogr(dsn, layer, query, as.character(options), quiet, : GDAL Error 1: SQL Expression Parsing Error: memory exhausted. Occurred around : 17145','21117146','21117147','21117148','21117149','21117150','21117151','211171
So I inserted a couple loops to pace itself into .get_cogeomordesc_from_GDB and .get_copmgrp_from_GDB to limit itself to blocks of 2000 component records at a time, which was successful.
.get_cogeomordesc_from_GDB <- function(dsn = dsn, co) {
ron <- rownames(co)
nblocks <- floor(length(ron)/2000)+1
for(i in 1:nblocks){
recs <- ron[c((2000*(i-1)):(2000*(i)))]
recs <- recs[!is.na(recs)]
co0 <- co[recs,]
qry <- paste0(
"SELECT cokey, geomftname, geomfname, geomfeatid, existsonfeat, cogeomdkey
FROM cogeomordesc
WHERE rvindicator = 'Yes' AND
cokey IN ", format_SQL_in_statement(co0$cokey)
)
...skipping the middle...
# merge results
cogmd <- merge(cogmd_ls[c("cokey", "landscape")], cogmd_lf, by = "cokey", all.x = TRUE, sort = FALSE)
cogmd <- merge(cogmd, lf_3d, by = "cogeomdkey", all.x = TRUE, sort = FALSE)
cogmd <- merge(cogmd, lf_ss, by = "cogeomdkey", all.x = TRUE, sort = FALSE)
cogmd <- merge(cogmd, lf_2d, by = "cogeomdkey", all.x = TRUE, sort = FALSE)
cogmd$cogeomdkey <- NULL
if(i==1){cogmd1=cogmd}else{cogmd1=rbind(cogmd1,cogmd)}}
return(cogmd1)
}
But the rest of it failed when it got to a function that it couldn't find...
Error in .copm_prep(copm, db = "SDA") : could not find function ".copm_prep"
There are two functions referenced within get_component_from_GDB that I cannot find defined anywhere in SoilDB:
# prep
copm <- .copm_prep(copm, db = "SDA")
cogmd <- .cogmd_prep(cogmd, db = "SDA")
So I was stuck.... But omitting those lines seems to allow me to move on, and it now shows the landforms in the site table.
I just posted an update of those functions to GitHub. Let me know if you run into problems.
Querying the data via sf::read_sf() is limiting in a number of ways. Like you discovered it'll only allow you filter by a certain number of cokeys or other (e.g. cokey IN (...)). For big file geodatabases filtering can save time, if it's a big list of
Flattening the geomorphic table in particular is a headache. It's very deeply nested and allows a lot of creativity when populating (e.g. existson column).
The .coXX_prep() functions are in the utils.R file in soilDB.
Stephen
On Thu, Oct 12, 2023 at 11:06 AM Greg Schmidt @.***> wrote:
There is a line: cogmd <- .get_cogeomordesc_from_GDB(dsn = dsn) Missing the 'co'. So I replaced it. cogmd <- .get_cogeomordesc_from_GDB(dsn = dsn, co)
The next problem is it running out of memory:
getting components and horizons from Error: Query execution failed, cannot open layer. In addition: Warning message: In CPL_read_ogr(dsn, layer, query, as.character(options), quiet, : GDAL Error 1: SQL Expression Parsing Error: memory exhausted. Occurred around :
17145','21117146','21117147','21117148','21117149','21117150','21117151','211171
So I inserted a couple loops to pace itself into .get_cogeomordesc_from_GDB and .get_copmgrp_from_GDB to limit itself to blocks of 2000 component records at a time, which was successful.
.get_cogeomordesc_from_GDB <- function(dsn = dsn, co) { ron <- rownames(co) nblocks <- floor(length(ron)/2000)+1 for(i in 1:nblocks){ recs <- ron[c((2000(i-1)):(2000(i)))] recs <- recs[!is.na(recs)] co0 <- co[recs,]
qry <- paste0( "SELECT cokey, geomftname, geomfname, geomfeatid, existsonfeat, cogeomdkey FROM cogeomordesc WHERE rvindicator = 'Yes' AND cokey IN ", format_SQL_in_statement(co0$cokey) )
...skipping the middle...
merge results
cogmd <- merge(cogmd_ls[c("cokey", "landscape")], cogmd_lf, by = "cokey", all.x = TRUE, sort = FALSE) cogmd <- merge(cogmd, lf_3d, by = "cogeomdkey", all.x = TRUE, sort = FALSE) cogmd <- merge(cogmd, lf_ss, by = "cogeomdkey", all.x = TRUE, sort = FALSE) cogmd <- merge(cogmd, lf_2d, by = "cogeomdkey", all.x = TRUE, sort = FALSE) cogmd$cogeomdkey <- NULL if(i==1){cogmd1=cogmd}else{cogmd1=rbind(cogmd1,cogmd)}} return(cogmd1) }
But the rest of it failed when it got to a function that it couldn't find...
Error in .copm_prep(copm, db = "SDA") : could not find function ".copm_prep"
There are two functions referenced within get_component_from_GDB that I cannot find defined anywhere in SoilDB:
prep
copm <- .copm_prep(copm, db = "SDA") cogmd <- .cogmd_prep(cogmd, db = "SDA")
So I am stuck.
— Reply to this email directly, view it on GitHub https://github.com/ncss-tech/soilDB/issues/311#issuecomment-1759926138, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACILCJV7ZF67OMEYOC4QZPTX7AIPPANCNFSM6AAAAAA5UXOF3Q . You are receiving this because you commented.Message ID: @.***>
Excellent, Stephen, thanks! Your fix took less than half the time as my work around. It took 1.27 minutes to extract and flatten Michigan's GDB (vs. 2.6 minutes). I am not exactly sure on the differences in the output, but I will re-extract the rest of my targeted states.
@smroecker great work on improving the efficiency of these functions, I am going to close this as I think all of the problems have been resolved.. I have confirmed childs=TRUE
works as expected on CONUS gNATSGO.
I am trying to extract and flatten the geomorphic and parent material child tables from a 2021 geodatabase version of SSURGO. MI = fetchGDB(dsn = 'D:/GIS/SOIL/2021/gSSURGO_MI.gdb', childs = TRUE)
However, I get this error:
This error does not occur when "childs" parameter is left FALSE by default. The "get_component_from_GDB" function has the same error when setting "childs = TRUE".