srlanalytics / bdfm

Bayesian dynamic factor model estimation and predictive statistics including nowcasting and forecasting
MIT License

Heuristics: Should we take logs, differentiate? #32

Closed by christophsax 5 years ago

christophsax commented 5 years ago

I like this kind of automated check that you added to seas_we. Each check should apply a reasonable heuristic and tell the user what it decided. I would like to have them separated out, so we can use them in both dfm() and seas_we().

They will make it much easier for first-time users.

should_log <- function(x) {
  tmp <- diff(x, differences = 1)
  n <- sum(!is.na(tmp))                    # number of usable first differences
  tmp_var  <- var(tmp, na.rm = TRUE) / n   # variance of the mean difference
  tmp_mean <- mean(tmp, na.rm = TRUE)
  # TRUE/FALSE: take logs if the mean difference is more than one standard error
  #   above zero AND fewer than 5% of observations are negative
  #   (a few negatives are tolerated as likely errors in the data).
  # Properly we should use the second difference, but the financial crisis messes that up.
  isTRUE(tmp_mean / sqrt(tmp_var) > 1) && mean(x < 0, na.rm = TRUE) < 0.05
}

should_diff <- function(x) {
  # todo, any idea?
  # forecast::auto.arima() does this
}
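One possible way to fill in the should_diff() stub, as a rough sketch only: flag a series when its levels are very persistent, measured by the lag-1 autocorrelation. The name should_diff_sketch(), the 0.9 threshold and the report_heuristic() helper are all assumptions made for illustration, not anything that exists in bdfm. A unit root test, for example via forecast::ndiffs(), would be the more rigorous route. The helper also shows the "tell the user what they got" part with a plain message().

```r
# Hypothetical sketch, not part of bdfm: flag a series for differencing when
# its levels are highly persistent (lag-1 autocorrelation above `threshold`).
# The 0.9 default is an arbitrary assumption.
should_diff_sketch <- function(x, threshold = 0.9) {
  x <- x[!is.na(x)]                          # drop missing values
  if (length(x) < 3 || var(x) == 0) return(FALSE)
  # lag-1 autocorrelation of the levels; NA (e.g. constant tail) counts as FALSE
  rho <- suppressWarnings(cor(x[-1], x[-length(x)]))
  isTRUE(rho > threshold)
}

# Apply a heuristic to every column and tell the user what was decided.
report_heuristic <- function(data, heuristic, label) {
  flagged <- vapply(as.data.frame(data), heuristic, logical(1))
  picked <- if (any(flagged)) paste(names(flagged)[flagged], collapse = ", ") else "none"
  message(label, ": ", picked)
  invisible(flagged)
}

set.seed(1)
dat <- data.frame(walk = cumsum(rnorm(200)), noise = rnorm(200))
flags <- report_heuristic(dat, should_diff_sketch, "Series to difference")
```

Because report_heuristic() takes the heuristic as an argument, the same wrapper could be reused for should_log() from both dfm() and seas_we().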

srlanalytics commented 5 years ago

Sorry for the delay getting this cleaned up. Should be OK now... The arguments in dfm() now include scale = TRUE, logs = NULL, diffs = NULL, frequency_mix = "auto", pre_differenced = NULL. Data is automatically scaled and then unscaled (if scale = TRUE), logs are taken for the specified series (given either by series name or index value), and series are differenced if specified (again by series name or index value). pre_differenced is only for mixed frequency data: if low frequency data is already differenced, it needs to be included in pre_differenced to get the correct bridge equation. Phew. Hopefully it's pretty clean.
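To make the description above concrete, here is a minimal base-R sketch of that kind of preprocessing. It is not bdfm's actual code: resolve_series(), prepare_sketch() and unscale_sketch() are hypothetical names, and the order of operations (logs, then differences, then scaling) is an assumption.

```r
# Hypothetical helper, not bdfm's code: turn series names or index values
# into column indices, as `logs` and `diffs` are described to accept.
resolve_series <- function(sel, data) {
  if (is.null(sel)) return(integer(0))
  if (is.character(sel)) {
    idx <- match(sel, colnames(data))
    if (anyNA(idx)) stop("unknown series: ", paste(sel[is.na(idx)], collapse = ", "))
    return(idx)
  }
  as.integer(sel)
}

# Assumed order: take logs, then first differences, then scale each column.
prepare_sketch <- function(data, logs = NULL, diffs = NULL, scale = TRUE) {
  data <- as.matrix(data)
  for (j in resolve_series(logs, data))  data[, j] <- log(data[, j])
  for (j in resolve_series(diffs, data)) data[, j] <- c(NA, diff(data[, j]))
  center <- rep(0, ncol(data))
  spread <- rep(1, ncol(data))
  if (scale) {
    center <- colMeans(data, na.rm = TRUE)
    spread <- apply(data, 2, sd, na.rm = TRUE)
    data <- sweep(sweep(data, 2, center, "-"), 2, spread, "/")
  }
  list(data = data, center = center, spread = spread)
}

# Undo the scaling step using the stored centre and spread.
unscale_sketch <- function(prep) {
  sweep(sweep(prep$data, 2, prep$spread, "*"), 2, prep$center, "+")
}

raw <- cbind(gdp  = c(100, 102, 105, 107, 111),
             rate = c(1.5, 1.4, 1.6, 1.5, 1.7))
prep <- prepare_sketch(raw, logs = "gdp", diffs = 1)  # name and index both work
back <- unscale_sketch(prep)                          # log-differenced gdp, raw rate
```

Storing the centre and spread alongside the scaled data is what lets results be reported back on the original scale afterwards.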

christophsax commented 5 years ago

nice! will look at it during the weekend

christophsax commented 5 years ago

This looks fine and seems to work.