behinger commented 5 years ago

Hey! Great package! Initially I had trouble installing it because some of my packages were outdated, but after updating everything it ran pretty smoothly.

It took me a bit to understand the logic of the list. I think a simple comment in the quick-start would fix this (e.g. "between each entry (a "segment") of the list, a changepoint is modelled"). After understanding this, the toolbox was intuitive to use.
readme: sampling the prior: empty = mcp(segments, sample=FALSE). Here it is implicitly assumed that segments defines "x" somehow.
I think I'm a bit confused about what the underlying model is. I get discrete changepoints, but at points where there are no samples. This is still confusing me tbh.
the rel(1) command lets you parameterize a parameter relative to the last segment. Is it also possible to parameterize relative to any other segment? I am thinking of a situation where two changepoints define a plateau that is different, and the signal then goes back to the initial value. In that case, I might want to assume that the first and last segment / plateau have identical parameters (or at least put a prior that the difference is quite small).

Pretty good job! worked fine for me so far. I only ran it on simulated data, I have to check for real data :-) My problem with real data is that I usually have strong autocorrelation, i.e. changes are not really discrete hinges, but smoothed over time. I guess one could fit plateaus & slopes, but still no smoothness in the fit.

lindeloev commented 5 years ago

Thank you for the feedback, @behinger! I am working on the docs just now and your comments really help to figure out what to bring to the fore. I should definitely do a vignette introducing the "segments list" and what happens behind the scenes (see #35). A website is in the works and the "articles" menu may contain a few of the things you seek: https://lindeloev.github.io/mcp. I'll let you know when I've written it up.

Change point at "empty" times

Yes, change points can occur at "times" with no data because they are modeled as continuous on par_x. This means that if you have e.g. x = 0, 1, 2, 10, 11, 12 and the associated y = 0, 0, 0, 1, 2, 3 and you model it as a plateau followed by a joined slope (y ~ 1, 1 ~ x), the optimal guess would be that the change happened at x = 9, even though that x-value is not observed. Do you think this is desirable, or is there a use case for assigning probability-of-change-point only to observed x-values? TBH, this was just the easiest solution to implement.
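The x = 9 guess can be checked by hand. Below is a minimal base-R sketch, assuming a plateau up to a change point cp and a joined slope afterwards, with cp found by a least-squares grid search. This is only an illustration and not how mcp samples; the helper name sse_at and the grid are made up for the example.

```r
# Toy data from the comment above: nothing is observed between x = 2 and x = 10.
x <- c(0, 1, 2, 10, 11, 12)
y <- c(0, 0, 0, 1, 2, 3)

# Plateau followed by a joined slope: y = a + b * max(0, x - cp).
# For a fixed cp this is ordinary least squares on a hinge term.
sse_at <- function(cp) {
  hinge <- pmax(0, x - cp)
  fit <- lm(y ~ hinge)
  sum(resid(fit)^2)
}

# Candidate change points on a fine grid, including unobserved x-values.
grid <- seq(2, 10, by = 0.1)
sse <- sapply(grid, sse_at)
best_cp <- grid[which.min(sse)]
print(best_cp)
```

The residual sum of squares is smallest at cp = 9 because the line through (10, 1), (11, 2), (12, 3) crosses the plateau level y = 0 exactly there.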

Identity between plateaus

For return-to-previous-plateau, simply do prior = list(int_3 = "int_1"). That is, you have a 100% prior belief that they are identical. If you want it to just be in the vicinity, do prior = list(int_3 = "dnorm(int_1, 0.001)"), i.e. heavy shrinkage towards the mean of int_1. In both cases, int_1 will be just as much affected by what happens in segment 3 as the other way around. Is this something like what you asked?
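Put together, a three-plateau setup could look roughly like the sketch below. The data are simulated and the plateau levels, change point locations, and variable names are assumptions for illustration; only the two prior lists come from the comment above. The fit itself is guarded because it needs mcp and JAGS installed.

```r
# Three plateaus; the third is tied to the first through the prior.
model <- list(
  y ~ 1,  # int_1: first plateau
  1 ~ 1,  # int_2: a different plateau
  1 ~ 1   # int_3: return to the first plateau
)

# Option A: int_3 is exactly int_1.
prior_identical <- list(int_3 = "int_1")

# Option B: int_3 is shrunk heavily towards int_1 (value taken from the comment above).
prior_close <- list(int_3 = "dnorm(int_1, 0.001)")

# Simulated data (assumption): plateaus at 10, 20, 10 with changes at x = 30 and x = 70.
set.seed(42)
df <- data.frame(x = 1:100)
df$y <- rnorm(100, mean = ifelse(df$x > 30 & df$x <= 70, 20, 10), sd = 2)

# Only fit when mcp is available.
if (requireNamespace("mcp", quietly = TRUE)) {
  fit <- mcp::mcp(model, data = df, prior = prior_identical)
  print(summary(fit))
}
```

Swapping prior_identical for prior_close in the call gives the "in the vicinity" variant.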

Autocorrelation

Yes, change point analysis is a big thing for (autocorrelated) time-series such as stocks, the stability of critical systems, etc. This is certainly possible to implement, though I have to get past a forest of low-hanging fruit before I get there ;-)

Thanks again!

lindeloev commented 5 years ago

@behinger Do you recall which packages you had to update? I should update the DESCRIPTION to require up-to-date packages.

If you are in RStudio, you can type install.packages in the console, press CTRL + UP, and see your recent command history.

behinger commented 5 years ago

package: Unfortunately not, I just told RStudio to update everything. Only after the fact did I think it would have been super useful to note them down :|
autocorrelation: makes sense! It's not only autocorrelation though, it could also be smooth changepoints.
changepoint empty lines: I understood that it's a continuous variable. I think I am thrown off that the actual posterior samples seem discrete, not continuous. Why would that be the case?
identity between plateaus: this is clever and super cool. Exactly what I had in mind (and specifying a prior as a "sample" from another parameter is cool).

lindeloev commented 5 years ago

Re package versions: No problem, I might just add the current CRAN version of all packages as dependencies, even if some older ones may work. Thanks for catching this.
Could you say a bit more about how the change points appear discrete? I'm sure many others will have the same thoughts, so it would be good to address it in advance. A few comments:
- They are modelled as discrete, but there is uncertainty about where this discrete change takes place.
- Sometimes, a single "revealing" change point is enough to indicate that a change happened just there. There can be multiple such "revealing" change points, making the posterior of the change point bimodal (actually N-modal). This is as it should be.
  - When you simulate, try adding a lot of uncertainty (high sigma). This should increase the width of the posterior.
I have a hard time wrapping my head around what a "smooth change point" would be. Could you say more? Or does it pertain to the above?
I must admit that the use of priors to do these things feels like a divine revelation :-)

behinger commented 5 years ago

Sure! An example: [image] Here you can see that the changepoints are discrete. Maybe to rephrase my question: what determines the distance between neighbouring changepoints (i.e. the distance between the vertical lines)?

Smooth changepoint: [image]

This time series has no simulated autocorrelation. The changepoint is a sigmoid with a width of 4 samples or so. I used the sigmoid because I can differentiate it and thus estimate it using Stan/NUTS.
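For readers without the image, a smooth changepoint of this kind can be sketched in base R as a logistic transition between two plateaus. The function name smooth_step and all values below are assumptions for illustration, not the code used for the plot; with a scale of 1, the rise from 10% to 90% spans about 4.4 samples.

```r
# Smooth transition between two plateaus, written as a logistic (sigmoid) curve.
smooth_step <- function(x, cp, width, before, after) {
  before + (after - before) * plogis((x - cp) / width)
}

x <- 1:100
mu <- smooth_step(x, cp = 50, width = 1, before = 0, after = 1)

# Most of the change happens within a few samples around cp.
print(round(mu[46:54], 3))

# Independent noise around the smooth mean, i.e. no autocorrelation.
set.seed(1)
y <- rnorm(length(x), mean = mu, sd = 0.1)
```

Because the curve is differentiable in cp and width, gradient-based samplers can estimate both, which is the motivation given above.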

lindeloev commented 5 years ago

Great catch, that is indeed confusing! This is just plot.mcpfit defaulting to evaluating 100 positions along x for computational reasons. Increasing to 1000 goes from this:

to this:

However, it comes at a computational cost (it's slow) and I don't like putting in an extra argument to plot. I will try and make a solution where it selectively increases the resolution around change points.
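The effect can be mimicked in base R. This is a simplified sketch of the idea, not the code in plot.mcpfit: continuous change point draws are snapped to the next position on an evaluation grid, so a coarse grid shows only a handful of distinct vertical lines. The draws are a made-up stand-in for posterior samples and the helper snap_to_grid is hypothetical.

```r
# Stand-in for continuous posterior draws of a change point (assumption).
set.seed(1)
cp_draws <- rnorm(500, mean = 37.3, sd = 1)

# A curve evaluated only on a grid shows each jump at the next grid position.
snap_to_grid <- function(draws, n) {
  grid <- seq(0, 100, length.out = n)
  grid[findInterval(draws, grid) + 1]
}

coarse <- snap_to_grid(cp_draws, 100)   # 100 evaluation positions
fine <- snap_to_grid(cp_draws, 1000)    # 1000 evaluation positions

# Number of distinct apparent change point locations in each case.
print(length(unique(coarse)))
print(length(unique(fine)))
```

With 100 positions the grid spacing is about 1 unit of x, so the apparent locations can be off by up to that much; with 1000 positions the spacing drops to about 0.1.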

lindeloev commented 5 years ago

OK, all documentation has now been updated. Based on your comments, I added an article about the formula syntax: https://lindeloev.github.io/mcp/articles/formulas.html.

Updated README, now that a lot has been separated out into vignettes/articles: https://github.com/lindeloev/mcp Updated site (frontpage is just the README): https://lindeloev.github.io/mcp/

I also increased the general resolution of plot(fit) four-fold as a temporary fix. And added some demo datasets, so people can get up and running quicker.

Getting close to release of 0.1!

behinger commented 5 years ago

Wow very nice!

I found this syntax a bit confusing:

Segments:
   response ~ 1 
   response ~ 1 ~ 0 + time 
   response ~ 1 ~ 1 + time

simply because it differs from the list you put in.

But besides this it's a very cool package!! Congratulations and thanks a lot.

lindeloev commented 5 years ago

Thanks! OK, yes. If others raise this as well, I would not oppose changing it, since you can always derive one representation from the other. The printed format is there to enable multivariate change points and variance-change change points in the future (https://github.com/lindeloev/mcp/issues/23), and many other unforeseen things.

behinger commented 5 years ago

Cool! Now I am looking forward to getting some data with a changepoint ;-)

lindeloev / mcp

Readme #42
