Closed: yupenghe closed this issue 7 years ago
Hi Yupeng,
The performance of PELT depends on the penalty that you have chosen. A penalty value that is too large for the scale of your data will miss changes that would have been found had the size of the change been larger. For example, if you change your example to
cpt.mean((x/100), penalty = "Manual", pen.value = 0.1, method = "PELT")
cpt.mean(x, penalty = "Manual", pen.value = 0.1, method = "PELT")
then the changepoint is detected for x/100, but lots of false changepoints are detected on x. If you are unsure what penalty to use, look at the "CROPS" penalty. This will find the optimal segmentations for a range of penalties given as an interval. For example:
cpt.mean((x/100), penalty = "CROPS", pen.value=c(0.1,5), method = "PELT")
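If it helps, here is a sketch of how the CROPS output can be inspected (assuming the dummy data from later in this thread, and assuming I have the names of the changepoint accessor functions cpts.full and pen.value.full right):

```r
library(changepoint)

# Dummy data from this thread: a mean shift at t = 100
set.seed(1984)
x <- c(rnorm(100, 0, 1), rnorm(100, 10, 1))

# CROPS runs PELT for every penalty in the interval [0.1, 5]
out <- cpt.mean(x/100, penalty = "CROPS", pen.value = c(0.1, 5), method = "PELT")

# One row per distinct segmentation found across the penalty range
cpts.full(out)

# The penalty values at which the optimal segmentation changes
pen.value.full(out)

# An "elbow" in this diagnostic plot suggests a sensible number of changepoints
plot(out, diagnostic = TRUE)
```

Scanning the rows of cpts.full(out) against pen.value.full(out) is a quick way to see which changepoints are stable across penalties and which only appear when the penalty is very small.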
Hope that helps.
Kaylea
Hi Kaylea, Thanks very much! It is very clear. I will try it out.
I actually have another question. I was reading the PELT paper, but the cost function is not clear to me when I use penalty = "Manual". I am talking about equation (1) in the paper. Given penalty = "Manual", pen.value = 0.1, does it mean the beta in the equation is 0.1? I have been confused about this and just want to check whether you happen to have any thoughts on it. Thanks again.
Yupeng
Hi Yupeng,
In fact, the cpt.mean function is the only one that doesn't scale. This is because we currently use the scaled likelihood to avoid having to calculate the variance (which depends on the changepoint locations). The other cpt.* functions use the full likelihood and are thus scale invariant.
Try:
cpt.meanvar(x,method="PELT")
cpt.meanvar(x/100,method="PELT")
cpt.meanvar(x*100,method="PELT")
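A quick way to check this scale invariance, assuming the dummy data from this thread (cpts is the accessor for the detected changepoint locations):

```r
library(changepoint)

# Dummy data from this thread: a mean shift at t = 100
set.seed(1984)
x <- c(rnorm(100, 0, 1), rnorm(100, 10, 1))

# cpt.meanvar uses the full likelihood, so the detected changepoint
# locations should agree across rescalings of the data
cpts(cpt.meanvar(x,       method = "PELT"))
cpts(cpt.meanvar(x / 100, method = "PELT"))
cpts(cpt.meanvar(x * 100, method = "PELT"))
```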
Regarding penalty = "Manual": you are correct that in equation (1) from the paper, pen.value = 0.1 means beta = 0.1.
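For reference, equation (1) of the PELT paper (Killick, Fearnhead and Eckley, 2012) has, up to notation, the form

$$\min_{m,\,\tau} \; \sum_{i=1}^{m+1} \mathcal{C}\!\left(y_{(\tau_{i-1}+1):\tau_i}\right) + \beta f(m),$$

where $\mathcal{C}$ is the cost of a segment, $m$ is the number of changepoints, and PELT takes $f(m) = m$, so pen.value sets the per-changepoint penalty $\beta$ directly.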
Any further questions, just ask.
Rebecca
On a related note that brought me here: be careful with the scale() function, as it returns a matrix, and it seems like the cpt.* functions don't like that as an input :).
### Same data as above
set.seed(1984)
x=c(rnorm(100,0,1),rnorm(100,10,1))
# scaled x
cpt.meanvar(scale(x), method = "PELT")
# me thinking I had the same problem as above and trying "CROPS"
cpt.meanvar(scale(x), penalty = "CROPS", pen.value = c(0.1,5), method = "PELT")
# eureka!
cpt.meanvar(as.numeric(scale(x)), method = "PELT")
Thanks to @rkillick for pointing out that only cpt.mean doesn't scale. Worth a 'class' check on the input?
@QuinnAsena cpt.mean does check the class of the input. From the documentation:
data — A vector, ts object or matrix containing the data within which you wish to find a changepoint. If data is a matrix, each row is considered a separate dataset.
The problem is that scale returns a single-column matrix, so cpt.mean sees it as n datasets of length 1. You could more easily do:
cpt.meanvar(t(scale(x)), method = 'PELT')
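To make the shape issue concrete, here is a small sketch (assuming the dummy data from this thread):

```r
library(changepoint)

# Dummy data from this thread: a mean shift at t = 100
set.seed(1984)
x <- c(rnorm(100, 0, 1), rnorm(100, 10, 1))

# scale() returns a 200 x 1 matrix: cpt.* reads it as 200 datasets of length 1
dim(scale(x))

# Transposing gives a 1 x 200 matrix: one dataset of length 200
dim(t(scale(x)))

# Dropping the matrix class gives a plain numeric vector, which works as expected
cpts(cpt.meanvar(as.numeric(scale(x)), method = "PELT"))
```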
@rkillick Thanks for the response. I missed that in the documentation at first (my bad!). I did realise my error but was a little slow on the uptake. Cheers!
Hi Rebecca, Thanks for this package. I am trying to detect the changepoints in my data using cpt.mean. However, I found that the scale of the datapoints has a large effect on the result (see the dummy example below). I am wondering how I can find the right scale (since I can always scale the datapoints up or down by a factor). Any suggestions? Thanks.
Yupeng