rkillick / changepoint

A place for the development version of the changepoint package on CRAN.

binseg returns incorrect segment means #47

Open tdhock opened 3 years ago

tdhock commented 3 years ago

Hi @rkillick, I'm trying to get the segment means computed by binary segmentation, which appear to be incorrect in the output below.

> changepoint::cpt.mean(c(1,2,4), penalty="Manual", method="BinSeg", pen.value=0, Q=1)@param.est
$mean
[1] 1.000000 2.333333
# I expected 1.5, 4
> changepoint::cpt.mean(c(1,2,4), penalty="Manual", method="BinSeg", pen.value=0, Q=2)@param.est
$mean
[1] 1.000000 1.000000 2.333333
# I expected 1,2,4
> changepoint::cpt.mean(c(1,2,4), penalty="Manual", method="BinSeg", pen.value=0, Q=3)@param.est
$mean
[1] 1.000000 1.000000 1.000000 2.333333
# I expected an error because there cannot be Q=3 changepoints in 3 data points.
> 
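The expected values in the comments above can be reproduced by hand. The following is a minimal base-R sketch of greedy binary segmentation for a change in mean under a squared-error cost, assuming every requested split is accepted (which is how the report reads `pen.value=0`). The names `seg_cost`, `binseg_sketch`, and `segment_means` are hypothetical helpers for illustration and are not part of the changepoint package.

```r
# Squared-error cost of one segment: sum of squared deviations from its mean.
seg_cost <- function(x) sum((x - mean(x))^2)

# Greedy binary segmentation (illustrative only, not the package's code).
# Returns the segment end positions; the last one is always length(x).
binseg_sketch <- function(x, Q) {
  n <- length(x)
  if (Q > n - 1) stop("Q changepoints need at least Q + 1 data points")
  ends <- n
  for (q in seq_len(Q)) {
    starts <- c(0, ends[-length(ends)]) + 1
    best_gain <- -Inf
    best_split <- NA
    for (s in seq_along(ends)) {
      lo <- starts[s]
      hi <- ends[s]
      if (hi <= lo) next  # a single point cannot be split further
      for (tau in lo:(hi - 1)) {
        # Cost reduction from splitting x[lo:hi] after position tau.
        gain <- seg_cost(x[lo:hi]) - seg_cost(x[lo:tau]) -
          seg_cost(x[(tau + 1):hi])
        if (gain > best_gain) {
          best_gain <- gain
          best_split <- tau
        }
      }
    }
    ends <- sort(c(ends, best_split))
  }
  ends
}

# Mean of each segment given the segment end positions.
segment_means <- function(x, ends) {
  starts <- c(0, ends[-length(ends)]) + 1
  mapply(function(lo, hi) mean(x[lo:hi]), starts, ends)
}

x <- c(1, 2, 4)
segment_means(x, binseg_sketch(x, 1))  # 1.5 4.0
segment_means(x, binseg_sketch(x, 2))  # 1 2 4
```

Under these assumptions, `binseg_sketch(x, 3)` stops with an error, since three data points allow at most two changepoints.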
rkillick commented 2 years ago

Due to other fixes, the last statement errors now.

For the first two, there was a bug that added an extra 0 to the start of the changepoint list, which is why all of the outputs start with 1. For the first example, the result should be a single mean of 2.33333: there is 1 segment, so mean(c(1,2,4)) is 2.33333. The second example does have a further bug: cpts.full shows an NA value in the table for 1 changepoint. Thus the best segmentation is still no changepoints, but it reports c(0,0,3) as the changepoint locations, which is clearly incorrect.
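To make the failure mode concrete, here is a hedged base-R sketch of how segment means follow from a vector of segment end positions, with a guard that rejects malformed vectors such as c(0,0,3). The function `means_from_cpts` is a hypothetical helper written for this illustration; it is not the package's internal code, and the checks are an assumption about what a valid changepoint vector should look like.

```r
# Mean of each segment, given segment end positions (the last must be n).
means_from_cpts <- function(x, cpts) {
  n <- length(x)
  # Reject malformed changepoint vectors such as c(0, 0, 3).
  stopifnot(
    length(cpts) >= 1,
    all(cpts >= 1),           # 0 is not a valid segment end
    all(diff(cpts) > 0),      # strictly increasing, so no empty segments
    cpts[length(cpts)] == n   # final segment ends at the last observation
  )
  cs <- c(0, cumsum(x))       # cs[i + 1] is the sum of x[1:i]
  starts <- c(0, cpts[-length(cpts)])
  (cs[cpts + 1] - cs[starts + 1]) / (cpts - starts)
}

x <- c(1, 2, 4)
means_from_cpts(x, 3)         # no changepoints: 2.333333
means_from_cpts(x, c(2, 3))   # one changepoint after the 2nd point: 1.5 4.0
```

With this guard, `means_from_cpts(x, c(0, 0, 3))` fails immediately instead of returning means for empty segments.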

I will need to look further into this bug after the current release (2.2.3) as it only seems to happen with small datasets.

tdhock commented 2 years ago

Hi @rkillick thanks for the update. I confirm an error for the third now, but the first two issues persist.

> changepoint::cpt.mean(c(1,2,4), penalty="Manual", method="BinSeg", pen.value=0, Q=1)@param.est
$mean
[1] 1.000000 2.333333

Warning message:
In BINSEG(sumstat, pen = pen.value, cost_func = costfunc, minseglen = minseglen,  :
  The number of changepoints identified is Q, it is advised to increase Q to make sure changepoints have not been missed.
> changepoint::cpt.mean(c(1,2,4), penalty="Manual", method="BinSeg", pen.value=0, Q=2)@param.est
$mean
[1] 1.000000 1.000000 2.333333

Warning message:
In BINSEG(sumstat, pen = pen.value, cost_func = costfunc, minseglen = minseglen,  :
  The number of changepoints identified is Q, it is advised to increase Q to make sure changepoints have not been missed.
> changepoint::cpt.mean(c(1,2,4), penalty="Manual", method="BinSeg", pen.value=0, Q=3)@param.est
Error in BINSEG(sumstat, pen = pen.value, cost_func = costfunc, minseglen = minseglen,  : 
  Q is larger than the maximum number of segments 2.5

rkillick commented 2 years ago

I haven't committed the version to github yet!

sanjmeh commented 1 year ago

Is this R package under active development and maintenance? Can the package author please update old issues and bring them to a logical end? Alternatively, if the package author recommends another package that has superseded this one, it would be nice to know. Thanks.

rkillick commented 1 year ago

Yes, this package is under active development and maintenance. As you can imagine, when someone is an academic and COVID hits, other tasks need to take priority. This is unfortunately what happens when packages are maintained by volunteers and not paid staff. We have been working on a new major release and so have incorporated these fixes into that. The new term has just started, so we are distracted by that, but we are hoping to have it out in early 2023. As stated above, this only happens for very small datasets (toy examples rather than things we encounter in reality, if you will) and so isn't a priority for a patch fix.

If you would like to submit a patch fix we would be happy to accept the pull request for others to benefit.