ncss-tech / soilDB

soilDB: Simplified Access to National Cooperative Soil Survey Databases
http://ncss-tech.github.io/soilDB/
GNU General Public License v3.0
81 stars 20 forks source link

fetchGDB child tables #311

Closed phytoclast closed 11 months ago

phytoclast commented 1 year ago

I am trying to extract and flatten the geomorphic and parent material child tables from a 2021 geodatabase version of SSURGO. MI = fetchGDB(dsn = 'D:/GIS/SOIL/2021/gSSURGO_MI.gdb', childs = TRUE)

However, I get this error:

getting components and horizons from Error in .get_cogeomordesc_from_GDB(dsn = dsn) : argument "co" is missing, with no default

This error does not occur when "childs" parameter is left FALSE by default. The "get_component_from_GDB" function has the same error when setting "childs = TRUE".

smroecker commented 1 year ago

I'll look into this today. I was working on this a little while ago and have a solution.

On Thu, Oct 5, 2023 at 2:12 PM Greg Schmidt @.***> wrote:

I am trying to extract and flatten the geomorphic and parent material child tables from a 2021 geodatabase version of SSURGO. MI = fetchGDB(dsn = 'D:/GIS/SOIL/2021/gSSURGO_MI.gdb', childs = TRUE)

However, I get this error:

getting components and horizons from Error in .get_cogeomordesc_from_GDB(dsn = dsn) : argument "co" is missing, with no default

This error does not occur when "childs" parameter is left FALSE by default. The "get_component_from_GDB" function has the same error when setting "childs = TRUE".

— Reply to this email directly, view it on GitHub https://github.com/ncss-tech/soilDB/issues/311, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACILCJWOOUYCRPG4P6ABOFLX54BBLAVCNFSM6AAAAAA5UXOF3SVHI2DSMVQWIX3LMV43ASLTON2WKOZRHEZDQOJQGE3TENY . You are receiving this because you are subscribed to this thread.Message ID: @.***>

phytoclast commented 11 months ago

There is a line: cogmd <- .get_cogeomordesc_from_GDB(dsn = dsn) Missing the 'co'. So I replaced it. cogmd <- .get_cogeomordesc_from_GDB(dsn = dsn, co)

The next problem is it running out of memory:

getting components and horizons from Error: Query execution failed, cannot open layer. In addition: Warning message: In CPL_read_ogr(dsn, layer, query, as.character(options), quiet, : GDAL Error 1: SQL Expression Parsing Error: memory exhausted. Occurred around : 17145','21117146','21117147','21117148','21117149','21117150','21117151','211171

So I inserted a couple loops to pace itself into .get_cogeomordesc_from_GDB and .get_copmgrp_from_GDB to limit itself to blocks of 2000 component records at a time, which was successful.

.get_cogeomordesc_from_GDB <- function(dsn = dsn, co) {
  ron <- rownames(co)
  nblocks <- floor(length(ron)/2000)+1
  for(i in 1:nblocks){
    recs <- ron[c((2000*(i-1)):(2000*(i)))]
    recs <- recs[!is.na(recs)]
    co0 <- co[recs,]

    qry <- paste0(
      "SELECT cokey, geomftname, geomfname, geomfeatid, existsonfeat, cogeomdkey

      FROM cogeomordesc

      WHERE rvindicator = 'Yes' AND
            cokey IN ", format_SQL_in_statement(co0$cokey)
    )

...skipping the middle...

# merge results
  cogmd <- merge(cogmd_ls[c("cokey", "landscape")], cogmd_lf, by = "cokey", all.x = TRUE, sort = FALSE)
  cogmd <- merge(cogmd, lf_3d, by = "cogeomdkey", all.x = TRUE, sort = FALSE)
  cogmd <- merge(cogmd, lf_ss, by = "cogeomdkey", all.x = TRUE, sort = FALSE)
  cogmd <- merge(cogmd, lf_2d, by = "cogeomdkey", all.x = TRUE, sort = FALSE)
  cogmd$cogeomdkey <- NULL
  if(i==1){cogmd1=cogmd}else{cogmd1=rbind(cogmd1,cogmd)}}
  return(cogmd1)
}

But the rest of it failed when it got to a function that it couldn't find...

Error in .copm_prep(copm, db = "SDA") : could not find function ".copm_prep"

There are two functions referenced within get_component_from_GDB that I cannot find defined anywhere in SoilDB:

  # prep
    copm  <- .copm_prep(copm, db = "SDA")
    cogmd <- .cogmd_prep(cogmd, db = "SDA")

So I was stuck.... But omitting those lines seems to allow me to move on, and it now shows the landforms in the site table.

smroecker commented 11 months ago

I just posted an update of those functions to GitHub. Let me know if you run into problems.

Querying the data via sf::read_sf() is limiting in a number of ways. Like you discovered it'll only allow you filter by a certain number of cokeys or other (e.g. cokey IN (...)). For big file geodatabases filtering can save time, if it's a big list of

Flattening the geomorphic table in particular is a headache. It's very deeply nested and allows a lot of creativity when populating (e.g. existson column).

The .coXX_prep() functions are in the utils.R file in soilDB.

Stephen

On Thu, Oct 12, 2023 at 11:06 AM Greg Schmidt @.***> wrote:

There is a line: cogmd <- .get_cogeomordesc_from_GDB(dsn = dsn) Missing the 'co'. So I replaced it. cogmd <- .get_cogeomordesc_from_GDB(dsn = dsn, co)

The next problem is it running out of memory:

getting components and horizons from Error: Query execution failed, cannot open layer. In addition: Warning message: In CPL_read_ogr(dsn, layer, query, as.character(options), quiet, : GDAL Error 1: SQL Expression Parsing Error: memory exhausted. Occurred around :

17145','21117146','21117147','21117148','21117149','21117150','21117151','211171

So I inserted a couple loops to pace itself into .get_cogeomordesc_from_GDB and .get_copmgrp_from_GDB to limit itself to blocks of 2000 component records at a time, which was successful.

.get_cogeomordesc_from_GDB <- function(dsn = dsn, co) { ron <- rownames(co) nblocks <- floor(length(ron)/2000)+1 for(i in 1:nblocks){ recs <- ron[c((2000(i-1)):(2000(i)))] recs <- recs[!is.na(recs)] co0 <- co[recs,]

qry <- paste0(
  "SELECT cokey, geomftname, geomfname, geomfeatid, existsonfeat, cogeomdkey

  FROM cogeomordesc

  WHERE rvindicator = 'Yes' AND
        cokey IN ", format_SQL_in_statement(co0$cokey)
)

...skipping the middle...

merge results

cogmd <- merge(cogmd_ls[c("cokey", "landscape")], cogmd_lf, by = "cokey", all.x = TRUE, sort = FALSE) cogmd <- merge(cogmd, lf_3d, by = "cogeomdkey", all.x = TRUE, sort = FALSE) cogmd <- merge(cogmd, lf_ss, by = "cogeomdkey", all.x = TRUE, sort = FALSE) cogmd <- merge(cogmd, lf_2d, by = "cogeomdkey", all.x = TRUE, sort = FALSE) cogmd$cogeomdkey <- NULL if(i==1){cogmd1=cogmd}else{cogmd1=rbind(cogmd1,cogmd)}} return(cogmd1) }

But the rest of it failed when it got to a function that it couldn't find...

Error in .copm_prep(copm, db = "SDA") : could not find function ".copm_prep"

There are two functions referenced within get_component_from_GDB that I cannot find defined anywhere in SoilDB:

prep

copm <- .copm_prep(copm, db = "SDA") cogmd <- .cogmd_prep(cogmd, db = "SDA")

So I am stuck.

— Reply to this email directly, view it on GitHub https://github.com/ncss-tech/soilDB/issues/311#issuecomment-1759926138, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACILCJV7ZF67OMEYOC4QZPTX7AIPPANCNFSM6AAAAAA5UXOF3Q . You are receiving this because you commented.Message ID: @.***>

phytoclast commented 11 months ago

Excellent, Stephen, thanks! Your fix took less than half the time as my work around. It took 1.27 minutes to extract and flatten Michigan's GDB (vs. 2.6 minutes). I am not exactly sure on the differences in the output, but I will re-extract the rest of my targeted states.

brownag commented 11 months ago

@smroecker great work on improving the efficiency of these functions, I am going to close this as I think all of the problems have been resolved.. I have confirmed childs=TRUE works as expected on CONUS gNATSGO.