rkillick / changepoint

A place for the development version of the changepoint package on CRAN.

Question: How does changepoint depend on the scale of the data? #12

Closed yupenghe closed 7 years ago

yupenghe commented 7 years ago

Hi Rebecca, Thanks for this package. I am trying to detect changepoints in my data using cpt.mean. However, I found that the scale of the data points has a large effect on the result (see the dummy example below). How can I find the right scale, given that I can always scale the data points up or down by a constant factor? Any suggestions? Thanks.

Yupeng

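library(changepoint)
set.seed(1)
# 100 points with mean 0 followed by 100 points with mean 10, unit variance
x <- c(rnorm(100, 0, 1), rnorm(100, 10, 1))
# Same data and same penalty on two different scales
cpt.mean(x / 100, penalty = "Asymptotic", pen.value = 0.05, method = "PELT")
cpt.mean(x, penalty = "Asymptotic", pen.value = 0.05, method = "PELT")

kayleahaynes commented 7 years ago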

Hi Yupeng,

The performance of PELT depends on the penalty you choose. A penalty that is too large for the scale of your data will miss changes that would have been detected had the change been larger. For example, if you change your example to

cpt.mean((x/100), penalty = "Manual", pen.value=0.1, method = "PELT")
cpt.mean(x, penalty = "Manual", pen.value=0.1, method = "PELT")

then the changepoint is detected for x/100, but many false changepoints are detected for x. If you are unsure which penalty to use, look at the "CROPS" penalty. It returns the segmentations for a range of penalty values, given as an interval. For example:

cpt.mean((x/100), penalty = "CROPS", pen.value=c(0.1,5), method = "PELT")

Hope that helps.

Kaylea
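A minimal sketch of how the CROPS result might be inspected, reusing the dummy data from the first post. It assumes the accessors `pen.value.full()` and `cpts.full()` and the `diagnostic` plot option that the changepoint package provides for CROPS output; the variable name `out` is only illustrative.

```r
library(changepoint)

set.seed(1)
x <- c(rnorm(100, 0, 1), rnorm(100, 10, 1))

# Run PELT over the penalty interval [0.1, 5] instead of a single value
out <- cpt.mean(x / 100, penalty = "CROPS", pen.value = c(0.1, 5), method = "PELT")

# Penalty values at which the returned segmentation changes
pen.value.full(out)

# Changepoint locations, one row per segmentation found in the interval
cpts.full(out)

# Number of changepoints plotted against the penalty value
plot(out, diagnostic = TRUE)
```

Looking for the penalty range over which the number of changepoints stays stable is one way to pick a penalty when the scale of the data is unknown.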

yupenghe commented 7 years ago

Hi Kaylea, Thanks very much! It is very clear. I will try it out.

I actually have another question. I was reading the PELT paper but the cost function is not clear to me when I use penalty = "Manual". I am talking about equation (1) in the paper. Given penalty = "Manual", pen.value=0.1, does it mean the beta in the equation is 0.1?

I have been confused about this problem and just want to check if you happen to have any thoughts about it. Thanks again.

Yupeng

rkillick commented 7 years ago

Hi Yupeng,

In fact, cpt.mean is the only function that is not scale invariant. This is because it currently uses the scaled likelihood to avoid having to calculate the variance (which depends on the changepoint locations). The other cpt.* functions use the full likelihood and are thus scale invariant. Try:

cpt.meanvar(x,method="PELT")
cpt.meanvar(x/100,method="PELT")
cpt.meanvar(x*100,method="PELT")

Regarding penalty="Manual", you are correct: in equation (1) of the paper, pen.value=0.1 means beta=0.1.
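For reference, the penalised cost that equation (1) of the PELT paper minimises is usually written as below. This is a sketch of the notation rather than a quotation, so check the paper for the exact form: $m$ is the number of changepoints, $\tau_0 = 0$ and $\tau_{m+1} = n$, $\mathcal{C}$ is the segment cost, and $\beta f(m)$ is the penalty, which PELT takes to be linear in the number of changepoints.

```latex
\sum_{i=1}^{m+1} \left[ \mathcal{C}\left( y_{(\tau_{i-1}+1):\tau_i} \right) \right] + \beta f(m)
```

With penalty="Manual" and pen.value=0.1, the value 0.1 plays the role of $\beta$.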

Any further questions, just ask.

Rebecca
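A base R sketch of why a fixed penalty behaves differently on x and x/100. It assumes that the scaled likelihood cost behind cpt.mean is the residual sum of squares about each segment mean; `rss`, `gain` and `noise_sd` are hypothetical helpers written for this illustration and are not part of the changepoint package. The noise estimate from first differences is a common heuristic, not something the package applies for you.

```r
set.seed(1)
x <- c(rnorm(100, 0, 1), rnorm(100, 10, 1))

# Residual sum of squares of a segment about its own mean
# (assumed stand-in for the cost used by cpt.mean)
rss <- function(y) sum((y - mean(y))^2)

# Reduction in total cost from placing a single changepoint at tau
gain <- function(y, tau) {
  n <- length(y)
  rss(y) - (rss(y[1:tau]) + rss(y[(tau + 1):n]))
}

# Dividing the data by 100 divides the cost reduction by 100^2,
# so a penalty that suits x is far too large for x / 100
gain(x, 100) / gain(x / 100, 100)  # 100^2, up to floating-point error

# Rough noise standard deviation from first differences (heuristic);
# dividing by it puts the data on a unit-noise scale
noise_sd <- function(y) mad(diff(y)) / sqrt(2)

z1 <- x / noise_sd(x)
z2 <- (x / 100) / noise_sd(x / 100)
all.equal(z1, z2)  # TRUE: the standardised series no longer depends on the scale
```

Under that assumption, rescaling the data by a factor c is equivalent to rescaling a manual penalty by c^2, so either the data can be standardised before calling cpt.mean or the penalty can be adjusted to match.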

QuinnAsena commented 4 years ago

On a related note that brought me here: be careful with the scale() function, as it returns a matrix, and it seems cpt.* doesn't like that as an input :).

# Same data-generating setup as above (different seed)
set.seed(1984)
x=c(rnorm(100,0,1),rnorm(100,10,1))

# scaled x
cpt.meanvar(scale(x), method = "PELT")

# me thinking I had the same problem as above and trying "CROPS"
cpt.meanvar(scale(x), penalty = "CROPS", pen.value = c(0.1,5), method = "PELT")

# eureka!
cpt.meanvar(as.numeric(scale(x)), method = "PELT")

Thanks to @rkillick for pointing out that only cpt.mean doesn't scale. Worth a 'class' check on the input?

rkillick commented 4 years ago

@QuinnAsena cpt.mean does check the class of its input. From the documentation:

data: A vector, ts object or matrix containing the data within which you wish to find a changepoint. If data is a matrix, each row is considered a separate dataset.

The problem is that scale returns a single-column matrix, so cpt.mean sees it as n datasets of length 1. An easier fix is to transpose it:

cpt.meanvar(t(scale(x)), method='PELT')
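The shape issue can be checked in base R before anything is passed to a cpt.* function. A small sketch, using the same simulated data as the earlier comment; the names `s`, `s_vec` and `s_row` are only illustrative.

```r
set.seed(1984)
x <- c(rnorm(100, 0, 1), rnorm(100, 10, 1))

s <- scale(x)
dim(s)  # 200 1: a matrix with one column, i.e. 200 rows of length 1

# Option 1: drop the matrix structure and pass a plain numeric vector
s_vec <- as.numeric(s)
is.null(dim(s_vec))  # TRUE

# Option 2: transpose so that the whole series is a single row (one dataset)
s_row <- t(s)
dim(s_row)  # 1 200
```

Either `s_vec` or `s_row` is then treated as one dataset of length 200 under the row-per-dataset convention quoted above.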

QuinnAsena commented 3 years ago

@rkillick Thanks for the response. I missed that in the documentation at first (my bad!). I did realise my error but was a little slow on the uptake. Cheers!