ncss-tech / soilDB

soilDB: Simplified Access to National Cooperative Soil Survey Databases
http://ncss-tech.github.io/soilDB/
GNU General Public License v3.0

fetchHenry() NA-padding for weekly / monthly granularity #265

Open dylanbeaudette opened 2 years ago

dylanbeaudette commented 2 years ago

TODO:

Further research: https://stackoverflow.com/questions/22439540/how-to-get-week-numbers-from-dates

First approximation here.


.fillMissingGran <- function(x, gran) {

  ## TODO this doesn't account for leap-years
  # 366 days
  # 53 weeks

  # sequence of possible values
  g.vect <- switch(
    gran,
    'day' = 1:365,
    'week' = 1:52,
    'month' = 1:12
  )

  # column to use
  # week / month_numeric are missing
  g.col <- switch(
    gran,
    'day' = 'doy',
    'week' = 'week',
    'month' = 'month_numeric'
  )

  # format string
  g.fmt <- switch(
    gran,
    'day' = '%Y %j %H:%M',
    'week' = '%Y %W %H:%M',
    'month' = '%Y %m %H:%M'
  )

  # add time ID columns as-needed
  # doy is always present

  ## "week" not as simple as it seems
  # https://stackoverflow.com/questions/22439540/how-to-get-week-numbers-from-dates

  # week
  if(gran == 'week') {
    x$week <- as.integer(format(x$date_time, '%W'))
  }

  # month
  if(gran == 'month') {
    x$month_numeric <- as.integer(format(x$date_time, '%m'))
  }

  # ID missing time IDs
  missing <- which(is.na(match(g.vect, x[[g.col]])))

  # short-circuit
  if (length(missing) < 1) {
    return(x)
  }

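  # make fake date-times for missing time IDs
  # strptime() needs a weekday to use a week number and a day to use a month,
  # so anchor weeks to Monday (%u = 1) and months to day 1 (%d = 1)
  g.anchor <- switch(gran, 'day' = '', 'week' = ' 1', 'month' = ' 1')
  g.anchor.fmt <- switch(gran, 'day' = '', 'week' = ' %u', 'month' = ' %d')
  fake.datetimes <- paste0(x$year[1], ' ', missing, ' 00:00', g.anchor)

  # TODO: this will result in timezone specific to locale;
  #  especially an issue when granularity is less than daily or for large extents
  fake.datetimes <- as.POSIXct(fake.datetimes, format = paste0(g.fmt, g.anchor.fmt))

  # generate DF with missing information
  fake.data <- data.frame(
    sid = x$sid[1],
    date_time = fake.datetimes,
    year = x$year[1],
    # day of year derived from the fake date-times
    doy = as.integer(format(fake.datetimes, "%j")),
    month = format(fake.datetimes, "%b")
  )

  # carry the time ID (week / month_numeric) so padded rows are not NA in that column
  fake.data[[g.col]] <- missing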

  fill.cols <- which(!colnames(x) %in% colnames(fake.data))
  if (length(fill.cols) > 0) {
    na.data <- as.data.frame(x)[, fill.cols, drop = FALSE][0,, drop = FALSE][1:nrow(fake.data),, drop = FALSE]
    fake.data <- cbind(fake.data, na.data)
  }

  # make datatypes for time match
  x$date_time <- as.POSIXct(x$date_time, format = "%Y-%m-%d %H:%M:%S")

  # splice in missing data
  y <- rbind(x, fake.data)

  # re-order by DOY and return
  return(y[order(y$doy), ])
}

# generate example data
w <- fetchHenry(project = 'CA790', gran = 'week', soiltemp.summaries = FALSE, pad.missing.days = TRUE)

x <- w$soiltemp[w$soiltemp$sid == 392 & w$soiltemp$year == '1998', ]

plot(x$date_time, x$sensor_value, type = 'p')

.fillMissingGran(x, gran = 'week')
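
The leap-year TODO in the draft (366 days, 53 weeks) could be handled by deriving the valid time IDs from the calendar instead of hard-coding `1:365`, `1:52`, and `1:12`. A minimal sketch, assuming a helper named `.granSeq()` (hypothetical, not part of soilDB) and the same `%j` / `%W` / `%m` codes the draft uses:

```r
# hypothetical helper, not part of soilDB:
# all time IDs that actually occur in a given calendar year
.granSeq <- function(year, gran = c('day', 'week', 'month')) {
  gran <- match.arg(gran)
  year <- as.integer(year)

  # every calendar day of the year, as Date (no timezone involved)
  d <- seq(
    from = as.Date(sprintf('%d-01-01', year)),
    to = as.Date(sprintf('%d-12-31', year)),
    by = 'day'
  )

  # same format codes as the draft function
  fmt <- switch(gran, day = '%j', week = '%W', month = '%m')

  # unique IDs present in this year, in increasing order
  sort(unique(as.integer(format(d, fmt))))
}

length(.granSeq(1998, 'day'))  # 365
length(.granSeq(2000, 'day'))  # 366, leap year
range(.granSeq(1998, 'week'))  # 0 52
```

Note that `%W` counts the days before the first Monday of the year as week 0, so a fixed `1:52` can miss records at both ends of a year. Whether week 0 should be padded or folded into the neighboring year is a design decision this sketch leaves open.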

brownag commented 2 years ago

A note to extend methods where possible so that they can work with other data sources e.g. SCAN, CDEC
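
One way to keep the padding step independent of the data source is to pass the time ID column and the expected IDs as arguments, rather than assuming Henry-specific columns such as `sid` or `doy`. A rough sketch under that assumption; `.padByID()` and the toy `obs` data are hypothetical and only illustrate the idea:

```r
# hypothetical, source-agnostic padding helper (not soilDB API)
.padByID <- function(x, id.col, ids) {
  # one row per expected time ID
  full <- data.frame(ids)
  names(full) <- id.col

  # left join: IDs with no observations get NA in every other column
  y <- merge(full, x, by = id.col, all.x = TRUE, sort = FALSE)

  # return in time ID order
  y[order(y[[id.col]]), , drop = FALSE]
}

# toy records standing in for any sensor source
obs <- data.frame(week = c(1L, 2L, 4L), sensor_value = c(5.1, 5.4, 6.0))

# weeks 3 and 5 are added with sensor_value = NA
.padByID(obs, id.col = 'week', ids = 1:5)
```

Unlike the draft function, this leaves constant columns (sensor ID, year) as `NA` in the padded rows; a caller would need to fill those in per group.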

brownag commented 2 years ago

Looks like we will also need to change the usage of base::as.POSIXct() format argument in soilDB:::.fill_missing_days() as it is breaking with R devel.

══ Failed tests ════════════════════════════════════════════════════════════════
── Error (test-fetchHenry.R:122:3): summarizeSoilTemperature() works as expected ──
Error in `.POSIXct(x, tz, ...)`: unused argument (format = "%Y-%m-%d %H:%M:%S")
Backtrace:
    ▆
 1. ├─soilDB:::.formatDates(x, gran = "day", pad.missing.days = TRUE) at test-fetchHenry.R:122:2
 2. │ ├─...[]
 3. │ └─data.table:::`[.data.table`(...)
 4. └─soilDB:::.fill_missing_days(.SD)
 5.   ├─base::as.POSIXct(x$date_time, format = "%Y-%m-%d %H:%M:%S")
 6.   └─base::as.POSIXct.default(x$date_time, format = "%Y-%m-%d %H:%M:%S")
── Error (test-fetchHenry.R:165:3): .fill_missing_days() works as expected ─────
Error in `.POSIXct(x, tz, ...)`: unused argument (format = "%Y-%m-%d %H:%M:%S")
Backtrace:
    ▆
 1. └─soilDB:::.fill_missing_days(x) at test-fetchHenry.R:165:2
 2.   ├─base::as.POSIXct(x$date_time, format = "%Y-%m-%d %H:%M:%S")
 3.   └─base::as.POSIXct.default(x$date_time, format = "%Y-%m-%d %H:%M:%S")

dylanbeaudette commented 2 years ago

I'll try to take a look next week sometime, unless you have time before then. Can you tackle the POSIX thing?

brownag commented 2 years ago

> I'll try to take a look next week sometime, unless you have time before then.

Take a look at this issue as a whole? I can probably take a crack at it this week sometime.

> Can you tackle the POSIX thing?

This is sorted w/ https://github.com/ncss-tech/soilDB/commit/6d4c02b553b52f67ffd4b0da9d8ae15c2c9ad0f4. as.Date() still takes a format arg, so I converted character->Date explicitly with as.Date(..., format=) and then to POSIXct, and we are good.
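
For readers following along, the conversion described here looks roughly like the sketch below. It illustrates the character -> Date -> POSIXct pattern only and is not copied from the commit; the `ts` values are made up.

```r
# character timestamps, as they might arrive before any conversion
ts <- c('1998-01-05 00:00:00', '1998-01-12 00:00:00')

# as.Date() accepts `format`; the time-of-day part is dropped here
d <- as.Date(ts, format = '%Y-%m-%d %H:%M:%S')

# Date -> POSIXct, no `format` argument needed
p <- as.POSIXct(d)

class(p)
format(p, '%Y-%m-%d', tz = 'UTC')
```

Going through `Date` discards the time of day and yields midnight UTC, which should be fine for daily padding but would need a different approach at sub-daily granularity.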

dylanbeaudette commented 2 years ago

>> I'll try to take a look next week sometime, unless you have time before then.

> Take a look at this issue as a whole? I can probably take a crack at it this week sometime

Go for it if you have some time. I'm not going to have enough time this week.

>> Can you tackle the POSIX thing?

> This is sorted w/ 6d4c02b. as.Date() still takes a format arg, so I converted character->Date explicitly with as.Date(..., format=) and then to POSIXct, and we are good

Thanks, the as.Date() fix was news to me.
