zhulabgroup / phenology-website

1 stars 1 forks source link

Better method for gap filling and smoothing #33

Open yiluansong opened 3 weeks ago

yiluansong commented 3 weeks ago

https://github.com/zhulabgroup/phenology-website/blob/1dbda1bb23578a76303f161bb15054ad3d304148/phenowatch/app_local_all_years_slider.R#L273-L295

I have written a function for gap-filling and smoothing (both done in one step). Please try to replace this bit of the code with the new function. This should also greatly simplify the gap-filling and smoothing by year: You just need to group the dataset by year and apply this function without the need to iterate.

#' Whittaker smoothing wrapper
#'
#' This function performs Whittaker smoothing based on ptw:whit1. ptw::whit1 does not take NA in the time series. This function takes a time series with NA or placeholder value (e.g., -9999). Weights at placeholder should be set to 0. NA are skipped, while values at placeholders are interpolated.
#'
#' @param x signal to be smoothed: a vector
#' @param maxgap Maximum number of successive NAs to still perform imputation on. Default setting is to replace all NAs without restrictions.
#' @param lambda smoothing parameter: larger values lead to more smoothing
#' @param minseg Minimum length of segments to perform Whittaker smoothing on. Default setting is to 2.
#'
#' @return smoothed signal: a vector
#'
#' @examples
#' # Example:
#' data.frame(
#'   doy = 1:365, # simulate some data
#'   evi = sin((1:365) / 365 * pi)
#' ) %>%
#'   filter(doy <= 20 | doy >= 40) %>% # make some short segments of data missing
#'   filter(doy <= 40 | doy >= 80) %>% # make some long segments of data missing
#'   complete(doy = seq(1, 365, by = 1)) %>%
#'   mutate(evi_sm = util_fill_whit(x = evi, maxgap = 14, lambda = 50, minseg = 2)) %>% # weighted whittaker smoothing allowing gaps %>%
#'   ggplot() +
#'   geom_line(aes(x = doy, y = evi_sm), col = "blue")
#' @export

util_fill_whit <- function(x, maxgap = Inf, lambda, minseg = 2) {
  x_fill <- imputeTS::na_replace(x, fill = -9999, maxgap = maxgap) # fill short gaps with -9999 placeholder
  w <- (x_fill != -9999) # weight = 0 at long gaps, weight = 1 at short gaps
  x_sm <- util_whit(x = x_fill, lambda = lambda, w = w, minseg = minseg)

  return(x_sm)
}

util_whit <- function(x, lambda, w, minseg = 2) {
  max_id <- 0
  done <- F
  while (!done) {
    v_non_na <- which(!is.na(x[(max_id + 1):length(x)])) # non-NA segment
    if (length(v_non_na) == 0) { # all numbers are NA
      done <- T # consider this ts done
    } else {
      min_id <- min(v_non_na) + (max_id) # first number that is not NA
      v_na <- which(is.na(x[min_id:length(x)])) # NA segment
      if (length(v_na) == 0) { # no more NA
        max_id <- length(x) # last non-NA segment is at the end of the whole ts
        done <- T # consider this ts done
      } else {
        max_id <- min(v_na) - 1 + (min_id - 1) # index of last number in this NA segment
      }
      if (max_id - min_id + 1 < minseg) { # this non-NA segment is too short
        x[min_id:max_id] <- -9999
      } else {
        x[min_id:max_id] <- ptw::whit1(x[min_id:max_id], lambda, w[min_id:max_id]) # whittaker smoothing for this non-NA segment
      }
    }
  }
  x[x == -9999] <- NA
  return(x)
}