Closed juniperlsimonis closed 6 years ago
Weird.
The most straightforward explanation would be that accepts
has one or more NAs, which would probably mean that proposed_lls
, lls
, or betas
has NAs. But I don't know how that would happen, or why it would only happen early on.
If it only happens early on, we might be able to try a bunch of random seeds until we find one that consistently produces the problem. Alternatively, it might be possible to figure something out using options(error = recover)
before triggering the error so we can see what things looked like before the error got thrown.
When you say it happens "early," could it be happening on the very first iteration? If so, maybe it sometimes gets initialized in an invalid state?
I think it's likely the initialization. When that error gets thrown, I don't think I've seen any of the progress bar printed yet.
I also get the impression that it's more likely to happen with the higher number of change points for the less frequent samples, which are quite low likelihood models.
Okay, in that case, my current theory is that it's this line: https://github.com/weecology/LDATS/blob/master/R/changepointmodel.r#L202
If I'm right, then the error happens when we accidentally start with two changepoints right on top of each other with time in between.
Adding replace = TRUE
to the sampling statement should solve the problem.
Word!
so, you're saying
replicate(N_temps, sort(sample.int(length(x$year_continuous, replace = T), n_changepoints)))
or in a different spot?
Almost. I think your version would give replace
as an argument to length
, instead of sample.int
. This should work though:
replicate(N_temps, sort(sample.int(length(x$year_continuous), n_changepoints, replace = TRUE)))
(side note, I always try to use TRUE and FALSE instead of T and F because T and F can be overwritten and cause tricky bugs)
ack, yes, sorry, I put that one parenthesis early, yeah. good point re: T/F. I always avoid using them as variable names, but good to be aware that others might not, and THANKS R for letting that happen without so much as a warning
I had trouble double-checking because
replicate(N_temps, sort(sample.int(length(
has so many open parentheses. Maybe it should be split into two lines.
probably, but it works well! (the code in general, that is! it's well written and pretty dang efficient and easy to follow!) i updated that function and we'll see if that fixes the issue. it wasn't a consistent error, so it has to have something to do with a random variable, and the sample would fit the bill.
it was just 😭 because i wrote a wrapper function to do all of the 5 change point models in a single call and it would get 4/5 of the way through and then throw the error. ⏳ ⏳ 🚫
Another run threw the same error with the updated function. :/
Maybe try printing the changepoints just before the loop starts. And the initial log likelihoods. Hopefully it’ll be clear what kind of misbehavior is going on.
Dave
On Dec 18, 2017, at 00:01, Juniper Simonis notifications@github.com wrote:
Another run threw the same error with the updated function. :/
— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or mute the thread.
yeah I'll definitely source it better when I've got some time
Sorry, I’m apparently not very awake. We want replace=FALSE to avoid repeats. My “fix” would have made it worse.
Dave
On Dec 18, 2017, at 00:30, Juniper Simonis notifications@github.com wrote:
yeah I'll definitely source it better when I've got some time
— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or mute the thread.
ah! no worries! it's late there!
also, that makes more sense to me, but i hadn't worked through the logic of that code yet, so i wasn't sure.
Unfortunately that didn't work.
However, I think I have the problem sourced:
The get_ll
function can return a -Inf result. This isn't normally an issue for the proposal likelihoods, as -Inf is less than everything, so that's obviously a worse choice than the current likelihood. However, when the initial likelihood is -Inf, then we get a logical comparison of -Inf and -Inf, which is NaN, and creates an NA value for one of the accepts elements
so far, the only possible change point locations that generate -Inf likelihood results in the initial change point location assignment is a location equal to time series length (last time point is a change point)
we should be able to fix this by editing that initial change point locator to
replicate(N_temps, sort(sample.int(length(x$year_continuous) - 1, n_changepoints, replace = TRUE)))
yes? (inserting the - 1
)
also it looks like the proposed change points can actually venture outside of the bounds of the actual time series (including negative times and times after the series ends). is that acceptable or is there something that needs to happen to limit that? [it has to do with how the "kicks" work, which I'm still wrapping my head around, so I'm not sure if this is an issue or something we want]
Thanks for tracking that down!
I believe your fix is correct (except that it still has the replace=TRUE
that I incorrectly suggested). We do set the log-likelihood to -Inf
when a changepoint is too high, and think the section I linked to is correct in doing so.
Proposing a move outside the support of the posterior distribution is completely okay: just treat it as a proposal with zero likelihood (i.e. reject it) and you still have a valid Markov chain. Alternatively, we could adjust the proposal distribution at the edges to avoid this situation, but it's very easy to do that step incorrectly so it's probably not a good idea here.
whoops, yes, I copied that code text from above from before we had that edit.
word on the -Inf application when it's out of bounds. I think that's a really good approach to take as a first cut, and yes, doing more definitions on the boundaries would be optimal in some senses, it could cause a lot more headaches
looks like that was it! thanks for your help tracking this down!
While running the change point model, I'll occasionally get errors thrown that say "Error in changepoints[, accepts] <- proposed_changepoints[, accepts] : NAs are not allowed in subscripted assignments"
They tend to happen early in the run of the model.