rkillick / changepoint

A place for the development version of the changepoint package on CRAN.

cpt.meanvar returns an extra changepoint location when locations are called directly #56

Closed MichelineCampbell closed 3 years ago

MichelineCampbell commented 3 years ago

Hi, thanks for making changepoint a reality. I'm working on incorporating changepoint analyses into an automated workflow and have run into a hurdle: the cpts slot (test@cpts), which I'd like to use to access the locations directly, contains one more changepoint location than is shown when the changepoint object is printed. Do you know whether cpt.meanvar will always return the last x value as a changepoint in that slot? If so, I can just drop the last one off :)

Some test data and what I found below:

dat <- structure(list(year = c(1L, 10L, 12L, 13L, 15L, 17L, 18L, 21L, 
                        23L, 27L, 28L, 30L, 31L, 32L, 34L, 36L, 37L, 43L, 44L, 45L, 46L, 
                        48L, 54L, 56L, 57L, 58L, 59L, 60L, 63L, 64L, 65L, 66L, 67L, 69L, 
                        72L, 73L, 75L, 77L, 79L, 80L, 81L, 82L, 83L, 88L, 89L, 93L, 94L, 
                        95L, 96L, 99L), y = c(-0.836247895854167, -0.281515072256679, 
                                              -1.2041565129159, -0.341733562626997, -0.22443054331351, -0.278243927230703, 
                                              -1.11263119068763, -0.768059438398383, -1.17445897461462, -0.0361278849478118, 
                                              0.411890405056926, -1.1485944259541, -0.336876178849371, 0.12681057713124, 
                                              -0.822892022065589, 0.164746929314433, -0.783690505991563, 0.268570875657203, 
                                              0.138437680330203, -1.28643782070759, -0.467179444479306, 0.195410974540062, 
                                              0.750195218719275, 1.0891030421355, 1.53362699396825, 2.32472169344723, 
                                              2.46537793967258, 2.29575330229392, 3.59285078678843, 3.4609489157655, 
                                              2.81818188382646, 3.60825335152284, 3.8231490511793, 4.98036157847448, 
                                              5.53682320406975, 5.53604703286923, 5.99850440998717, 6.47301278466759, 
                                              7.32191166231776, 7.21508917164472, 7.08528732127764, 8.40882212751775, 
                                              8.18201440766599, 8.39834613047969, 9.70178451135652, 10.3869938287989, 
                                              10.5494304469316, 11.529563715218, 10.8894989335741, 12.2573114191883
                        )), row.names = c(NA, -50L), class = c("tbl_df", "tbl", "data.frame"
                        ))

library(changepoint)

test <- cpt.meanvar(dat$y, penalty = "Manual", pen.value = 2 * log(length(dat$y)), method = "BinSeg",
                    Q = 10, test.stat = "Normal", class = TRUE, param.estimates = TRUE)

test

returns:

Changepoint Locations : 22 24 28 33 38 41 44

while

test@cpts

returns:

[1] 22 24 28 33 38 41 44 50

rkillick commented 3 years ago

@MichelineCampbell thanks for the question. This is intended behaviour. Most users of the package don't want the final time point returned as a changepoint, so the universal methods for accessing the slots remove it. For developers, however, there are cases where having the length of the data included is helpful (e.g. in loops and when estimating parameters), so it is returned in the cpts slot. If you want a vector without the length of the data included, the correct call is cpts(test), which uses the universal accessor for the slot (regardless of S3/S4/R6 class structure).
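
The difference between the accessor and the raw slot can be sketched as follows. This is a minimal illustration on simulated data rather than the data posted above; the names x, fit, cp_public, cp_slot, starts, ends and seg_means are made up for the example, and the segment-mean loop is only one assumed use of the trailing data length.

```r
library(changepoint)

# Simulated series with one shift in mean and variance (illustrative data only).
set.seed(1)
x <- c(rnorm(50, mean = 0, sd = 1), rnorm(50, mean = 5, sd = 2))

fit <- cpt.meanvar(x, method = "PELT")

# Accessor: changepoint locations without the final time point.
cp_public <- cpts(fit)

# Raw slot: the same locations followed by length(x).
cp_slot <- fit@cpts

# The trailing length(x) in the slot makes segment boundaries easy to build:
# each segment runs from the point after the previous changepoint to the next one.
starts <- c(0, cp_slot[-length(cp_slot)]) + 1
ends <- cp_slot
seg_means <- mapply(function(s, e) mean(x[s:e]), starts, ends)
```

In an automated workflow, cpts(fit) is likely the safer choice for reporting locations, while the slot is convenient when iterating over segments, as in the last three lines.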

Thanks for your question! If you have any more, please post a new issue.

MichelineCampbell commented 3 years ago

Ahhhhh, I see! Thank you so much :)