Open nicholasjclark opened 1 month ago
Hi Nicholas,
Thank you! I really appreciate your response and advice. I've edited my original post to hopefully make my questions more focused and also to take some of your feedback into account.
I looked into setting the rho and AR.start arguments in bam(). I used a variation of gam3 from my original post:
# Packages: magrittr for %<>%, dplyr for arrange(), mgcv for bam() and betar()
library(magrittr)
library(dplyr)
library(mgcv)

# First mark the start of each time series as TRUE, and all other data points as FALSE
data %<>%
  arrange(GroupName, Year, CycleLine, JulianDate) %>%
  as.data.frame()
simdat <- itsadug::start_event(data, column = "JulianDate",
                               event = c("GroupName", "CycleLine"), # is this correct? Or should I be doing it by Year?
                               label.event = "Event")
# Starting value for rho, taken from the residuals of gam3
r1 <- itsadug::start_value_rho(gam3, plot = TRUE)
# r1 is 0.8802352
gam5 <- bam(Percent ~ GroupName + CycleLine +
              s(JulianDate) +
              s(JulianDate, by = GroupName) +
              s(JulianDate, by = CycleLine),
            data = simdat,
            rho = r1,
            AR.start = simdat$start.event,
            method = "fREML",
            family = betar(),
            discrete = TRUE,
            select = TRUE,
            nthreads = 2)
but when I look at the uncorrected vs. 'corrected' residuals I'm still seeing autocorrelation:
# Uncorrected versus corrected residuals:
par(mfrow = c(1, 2), cex = 1.1)
itsadug::acf_resid(gam3)
itsadug::acf_resid(gam5)
[Image: ACF plots of the gam3 and gam5 residuals]
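On the question in the code comment about whether events should also be split by Year: the sketch below shows one way to build the start flags by hand in base R, so the grouping is explicit. It assumes (this is not confirmed anywhere in the thread) that JulianDate is a day of year that repeats every year, and that each GroupName x CycleLine x Year combination should be treated as its own series. The data frame `toy` and its values are made up for illustration.

```r
# Toy data standing in for the real data frame (all values are made up).
toy <- expand.grid(
  JulianDate = c(150, 100, 200),
  Year       = c(2020, 2021),
  CycleLine  = c("A", "B"),
  GroupName  = c("g1", "g2"),
  stringsAsFactors = FALSE
)

# Sort so that every series is contiguous and in time order.
toy <- toy[order(toy$GroupName, toy$CycleLine, toy$Year, toy$JulianDate), ]

# Assumption: one series per GroupName x CycleLine x Year combination.
series_id <- interaction(toy$GroupName, toy$CycleLine, toy$Year, drop = TRUE)

# TRUE on the first row of each series, FALSE everywhere else.
toy$start.event <- !duplicated(series_id)

head(toy)
```

A logical column built this way could be passed as AR.start, provided the data given to bam() are sorted in the same order. If the series are instead meant to run continuously across years, Year would be dropped from `series_id` and a continuous time index would be needed for the ordering.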
No worries if you don't have the bandwidth to answer, but I'm curious: how can I better account for the autocorrelation? My understanding is that rho is just for an AR1, but the results of auto.arima suggest that the process is more complicated? Also, the r1 value from itsadug::start_value_rho(gam3, plot = TRUE) doesn't match the output of forecast::auto.arima(residuals(gam3))$coef.
forecast::auto.arima(residuals(gam3))$coef produces:
Either way, thanks again and best wishes!
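One likely reason the two numbers above need not match is that they measure different things: a lag-1 autocorrelation is a single summary of how neighbouring residuals correlate, while the coefficients of a fitted ARMA model are parameters of a specific process. The base R sketch below illustrates this on simulated data. The series `x` and its ARMA(1,1) parameters are made up, and `arima()` with a fixed order stands in for the automatic order selection that auto.arima performs.

```r
set.seed(1)

# Simulated stand-in for model residuals: an ARMA(1,1) process
# with made-up parameters (ar = 0.6, ma = 0.5).
x <- arima.sim(model = list(ar = 0.6, ma = 0.5), n = 5000)

# Lag-1 autocorrelation of the series (element 1 of $acf is lag 0).
lag1 <- acf(x, plot = FALSE)$acf[2]

# Coefficients of an ARMA(1,1) model fitted to the same series.
fit <- arima(x, order = c(1, 0, 1), include.mean = FALSE)
ar1 <- unname(coef(fit)["ar1"])

# The two quantities differ even though they describe the same series.
print(c(lag1 = lag1, ar1 = ar1))
```

When a moving-average component is present, the lag-1 autocorrelation and the AR coefficient generally differ, so a mismatch between the two does not by itself indicate an error in either calculation.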
Hi @serena-psc, I suppose that all depends on what the main questions are for this analysis. If you want to capture all of the temporal variation, then simply setting k to be larger will help to do that. But this will make it more challenging to understand broader average effects of your grouping factors (CycleLine and GroupName). Just from looking at a few series in your data, there seem to be some repeated seasonal effects in some of them, so I'd say there are periodicities that aren't being captured in your current model. It may be worth looking into those as well.
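As a rough illustration of the point about k and uncaptured periodicity, the sketch below fits two smooths to simulated seasonal data with mgcv. Everything in it is an assumption made for illustration: the data frame `sim`, the Gaussian response `y`, and the model names are invented. The cyclic basis (bs = "cc") is one possible way to represent a repeating seasonal pattern, not something proposed in this thread. The real model uses bam() with a beta family, which this simplified example does not reproduce.

```r
library(mgcv)
set.seed(42)

# Simulated stand-in for one series (all values are made up):
# a seasonal signal over day of year, observed in three years.
sim <- data.frame(
  JulianDate = rep(1:365, times = 3),
  Year       = rep(2019:2021, each = 365)
)
sim$y <- sin(2 * pi * sim$JulianDate / 365) +
  0.5 * sin(12 * pi * sim$JulianDate / 365) +
  rnorm(nrow(sim), sd = 0.3)

# Default basis dimension for s(), which is k = 10.
m_small <- gam(y ~ s(JulianDate), data = sim, method = "REML")

# Larger k, with a cyclic basis so the end of the year joins the start.
m_big <- gam(y ~ s(JulianDate, bs = "cc", k = 30),
             data = sim,
             knots = list(JulianDate = c(0.5, 365.5)),
             method = "REML")

# k.check() reports whether a basis dimension looks too small.
k.check(m_small)
k.check(m_big)
```

A low k-index with a small p-value in the k.check() output is a hint that the basis may be too restrictive. As noted above, a wigglier smooth captures more temporal variation at the cost of making average effects of the grouping factors harder to interpret.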
Hi @serena-psc, I started drafting a response to your Cross Validated post, but I have just seen that it was closed. Sorry about that; it seems the moderators want a more focused question from you. But anyway, here are some things you could consider: