Open stephens999 opened 6 years ago
I agree that this is an undesirable "feature" but I haven't decided on a suitable "default" behaviour for estimating the variance in the presence of a mean change. If you have any suggestions, please let me know.
I'd suggest `0.5*median(abs(diff(y))/0.6745)^2`,
which is based on the fact that
$y_t - y_{t+1} \sim N(0, 2\sigma^2)$ if there is no difference in mean between $t$ and $t+1$,
and $E[\mathrm{median}(|N(0, \sigma^2)|)] \approx 0.6745\,\sigma$.
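A minimal base R sketch of this suggestion, on simulated data with one mean change. The function name `diff_mad_var`, the seed, the segment lengths, and the size of the shift are all assumptions made for illustration; only the formula itself comes from the comment above.

```r
# Difference-based variance estimate. A mean shift contaminates only one of
# the n - 1 first differences, so the median is barely affected by it.
diff_mad_var <- function(y) {
  0.5 * (median(abs(diff(y))) / 0.6745)^2
}

set.seed(1)                                  # arbitrary seed, for reproducibility
sigma <- 2
y <- c(rnorm(500, mean = 0,  sd = sigma),    # segment 1
       rnorm(500, mean = 10, sd = sigma))    # segment 2: mean shifts by 10

naive_var  <- var(y)           # pooled sample variance, inflated by the shift
robust_var <- diff_mad_var(y)  # should land near sigma^2 = 4

print(c(naive = naive_var, robust = robust_var))
```

The pooled sample variance absorbs the between-segment spread, while the difference-based estimate should stay close to the true noise variance. The cost is a single pass over the data, so it does not grow awkwardly with the length of the series.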
Thanks for the suggestion, I'll see how it behaves on different simulated data. The best one I've found at the moment is a sample variance on a rolling window of length 30 and then taking the median of those. But this is undesirable when you have long time series.
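For comparison, the rolling-window idea described above could look roughly like the following in base R. This is a sketch of the description only, not the package's implementation; `rolling_var_median` and the simulated data are hypothetical.

```r
# Median of sample variances over all sliding windows of a given width.
# Only windows that straddle a mean change are inflated, so the median
# ignores them as long as changes are sparse.
rolling_var_median <- function(y, width = 30) {
  stopifnot(length(y) >= width)
  starts <- seq_len(length(y) - width + 1)
  window_vars <- vapply(starts,
                        function(i) var(y[i:(i + width - 1)]),
                        numeric(1))
  median(window_vars)
}

set.seed(2)                                  # arbitrary seed
y <- c(rnorm(300), rnorm(300, mean = 5))     # unit variance, one mean change
est <- rolling_var_median(y)
print(est)
```

Written this way the work is proportional to the series length times the window width, which is one reason it becomes unattractive on long time series.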
I noticed that the scaling of the data matters, which seems undesirable (and unnecessary).
For example:
The `cpt.mean` default does not find any changepoints.
But if we multiply the data by 10, we find many changepoints.
I speculate that perhaps the cost function (log-likelihood) implicitly assumes the variance is 1?
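To illustrate why a cost with the variance fixed at 1 would be scale dependent, here is a minimal base R sketch. It does not use or reproduce the package's code: `seg_cost`, `best_split_gain`, and the penalty value are hypothetical choices for illustration, and whether `cpt.mean` really assumes unit variance remains the speculation above.

```r
# Segment cost for a Normal mean model with variance fixed at 1:
# twice the negative log-likelihood, up to an additive constant.
seg_cost <- function(x) sum((x - mean(x))^2)

# Largest cost reduction achievable by splitting y into two segments.
best_split_gain <- function(y) {
  n <- length(y)
  gains <- vapply(seq_len(n - 1), function(k) {
    seg_cost(y) - seg_cost(y[1:k]) - seg_cost(y[(k + 1):n])
  }, numeric(1))
  max(gains)
}

set.seed(3)                          # arbitrary seed
y <- rnorm(200)                      # no true changepoint
gain_raw    <- best_split_gain(y)
gain_scaled <- best_split_gain(10 * y)

penalty <- 2 * log(length(y))        # hypothetical fixed penalty, illustration only
print(c(raw = gain_raw, scaled = gain_scaled, penalty = penalty))
```

Multiplying the data by 10 multiplies every gain by 100, while a penalty that depends only on the series length stays the same. A split that is rejected on the original scale can therefore be accepted after rescaling, unless the cost is divided by an estimate of the variance.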
Incidentally, while digging around the code to see if I could understand the issue, I noticed that some places in the code use "norm.mean" whereas others use "mean.norm". I'm not sure that was intended?