ncss-tech / soilDB

soilDB: Simplified Access to National Cooperative Soil Survey Databases
http://ncss-tech.github.io/soilDB/
fetchSCAN stopped working - SCAN appears to have changed their header format #350

Closed dschlaep closed 2 months ago

dschlaep commented 2 months ago

The format of the SCAN data appears to have changed (again). The call in soilDB:::.get_SCAN_data() that reads the column headers no longer works correctly. This line: https://github.com/ncss-tech/soilDB/blob/3ed0bd5b704ce5da5a9e01ff8d399940d9a9974c/R/fetchSCAN.R#L388

now results (after further processing) in h being "California" instead of a vector of column names.

Then, names(x) <- h sets all but the first name to NA. As a result, x$Date is NULL, which causes the new error "replacement has 0 rows, data has 367".
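
The failure chain described above can be reproduced in isolation. The following is a minimal sketch that assumes a toy two-row data frame (hypothetical values, not real SCAN data) and a header vector that is too short. It is not the soilDB code itself; it only illustrates how a missing "Date" name ends in the "replacement has 0 rows" error.

```r
# Toy stand-in for the parsed SCAN table (hypothetical values)
x <- data.frame(
  V1 = c(356, 356),
  V2 = c("2015-01-01", "2015-01-02"),
  stringsAsFactors = FALSE
)

# A header vector that is too short, as when h is just "California"
h <- "California"
names(x) <- h

# The missing names are padded with NA
names(x)
#> [1] "California" NA

# No column is called "Date", so this lookup returns NULL
is.null(x$Date)
#> [1] TRUE

# Assigning a zero-length Date (as shown in the error message) into a
# data frame with rows fails; capture the message instead of stopping
msg <- tryCatch({
  x$Date <- as.Date(character(0))
  NA_character_
}, error = function(e) conditionMessage(e))

print(msg)
```

With two rows in the toy table, the captured message reports 2 data rows rather than the 367 seen in the real request.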

It appears that skipping 5 instead of 3 lines is now required to obtain column headers (see example code below).

Thanks!

print(packageVersion("soilDB"))
#> [1] '2.8.2'

# this started to fail
x <- try(soilDB::fetchSCAN(site.code = c(356, 2072), year = c(2015, 2016)))
#> Error in `$<-.data.frame`(`*tmp*`, "Date", value = structure(numeric(0), class = "Date")) : 
#>   replacement has 0 rows, data has 367

# narrow it down to this call
req <- list(
  intervalType = " View Historic ",
  report = structure(1L, levels = "SCAN", class = "factor"),
  timeseries = structure(1L, levels = "Daily", class = "factor"),
  format = "copy",
  sitenum = 356,
  interval = "YEAR",
  year = 2015,
  month = "CY"
)

soilDB:::.get_SCAN_data(req)
#> Error in `$<-.data.frame`(`*tmp*`, "Date", value = structure(numeric(0), class = "Date")): replacement has 0 rows, data has 367

# The response now looks like this (first 800 characters, via substr(r.content, 1, 800))
r.content <- "\r\n\r\n\r\n\r\nCalifornia (PST) SNOTEL Site Blue Lakes - NRCS National Water and Climate Center - Provisional Data - subject to revision as of Fri May 17 10:43:08 GMT-08:00 2024. Notes on dates - Daily sensors (e.g. TAVG.D-1) report a summary value for the previous day.  Hourly sensors (e.g. TAVG.H-1) report a summary value for the previous hour.  Instantaneous sensors (e.g. TOBS.I-1) report a single observation on the hour.\r\n\nSite Id,Date,Time,WTEQ.I-1 (in) ,PREC.I-1 (in) ,TOBS.I-1 (degC) ,TMAX.D-1 (degC) ,TMIN.D-1 (degC) ,TAVG.D-1 (degC) ,SNWD.I-1 (in) ,SMS.I-1:-2 (pct)  (silt),SMS.I-1:-8 (pct)  (silt),SMS.I-1:-20 (pct)  (silt),STO.I-1:-2 (degC) ,STO.I-1:-8 (degC) ,STO.I-1:-20 (degC) ,\n356,2015-01-01,,     6.1,     9.1,   -10.9,    -8.9,   -17.0,   -13.1,      21,     0.0,    14.8,    15.8,    "

# and we now need to skip 5 lines (instead of 3) to reach the column headers
h <- unlist(read.table(
  text = r.content,
  nrows = 1,
  skip = 5,
  header = FALSE,
  stringsAsFactors = FALSE,
  sep = ',',
  quote = '',
  strip.white = TRUE,
  na.strings = '-99.9',
  comment.char = ''
))

print(h)
#>                          V1                          V2 
#>                   "Site Id"                      "Date" 
#>                          V3                          V4 
#>                      "Time"             "WTEQ.I-1 (in)" 
#>                          V5                          V6 
#>             "PREC.I-1 (in)"           "TOBS.I-1 (degC)" 
#>                          V7                          V8 
#>           "TMAX.D-1 (degC)"           "TMIN.D-1 (degC)" 
#>                          V9                         V10 
#>           "TAVG.D-1 (degC)"             "SNWD.I-1 (in)" 
#>                         V11                         V12 
#>  "SMS.I-1:-2 (pct)  (silt)"  "SMS.I-1:-8 (pct)  (silt)" 
#>                         V13                         V14 
#> "SMS.I-1:-20 (pct)  (silt)"         "STO.I-1:-2 (degC)" 
#>                         V15                         V16 
#>         "STO.I-1:-8 (degC)"        "STO.I-1:-20 (degC)" 
#>                         V17 
#>                          NA

Created on 2024-05-17 with reprex v2.1.0
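
Since the number of preamble lines has changed more than once, one possible alternative to a fixed skip value is to locate the header row by its content. The sketch below assumes the header line always starts with "Site Id," (true for the response shown above, but not guaranteed by SCAN). The helper name read_scan_header and the shortened r.content are hypothetical and are not part of soilDB.

```r
# Hypothetical helper: find the header row by content, not by position
read_scan_header <- function(r.content, marker = "Site Id,") {
  # Split on LF or CRLF so stray carriage returns do not matter
  lines <- strsplit(r.content, "\r?\n")[[1]]

  # Index of the first line that starts with the assumed marker
  idx <- which(startsWith(lines, marker))
  if (length(idx) == 0L) {
    stop("no line starting with '", marker, "' found in SCAN response")
  }

  # Parse only that line, with the same read.table() settings as in the reprex
  h <- unlist(read.table(
    text = lines[idx[1]],
    nrows = 1,
    header = FALSE,
    stringsAsFactors = FALSE,
    sep = ',',
    quote = '',
    strip.white = TRUE,
    na.strings = '-99.9',
    comment.char = ''
  ))

  # Guard against the silent failure mode reported in this issue
  if (!"Date" %in% h) {
    stop("parsed header has no 'Date' column")
  }
  h
}

# Shortened, made-up response with the same layout as the real one
r.content <- paste0(
  "\r\n\r\n\r\n\r\n",
  "California (PST) SNOTEL Site Blue Lakes - Provisional Data\r\n\n",
  "Site Id,Date,Time,WTEQ.I-1 (in) ,PREC.I-1 (in) ,\n",
  "356,2015-01-01,,     6.1,     9.1,\n"
)

h <- read_scan_header(r.content)
print(h)
```

The trade-off is that this depends on the header text instead of the line count, so it would still break if SCAN renamed the first column. In that case it fails with an explicit message rather than producing NA column names.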

brownag commented 2 months ago

Thanks very much for the report, the reprex, and the solution, @dschlaep. Your fix works for me and is implemented in 03ba0b1e.

dylanbeaudette commented 2 months ago

Thanks for the notification. I'll look into it further this afternoon and let the SCAN folks know about the mayhem created by unannounced changes.

dschlaep commented 2 months ago

Many thanks for fixing this so fast!

dylanbeaudette commented 2 months ago

Yes, that was fast! Thanks @brownag