lindeloev / mcp

Regression with Multiple Change Points
http://lindeloev.github.io/mcp

Readme #42

Closed behinger closed 5 years ago

behinger commented 5 years ago

Hey! Great package! Initially I had trouble installing it because some of my packages were outdated, but after updating everything it ran pretty smoothly.

Pretty good job! It has worked fine for me so far. I have only run it on simulated data; I still have to check it on real data :-) My problem with real data is that I usually have strong autocorrelation, i.e. changes are not really discrete hinges but are smoothed over time. I guess one could fit plateaus & slopes, but there would still be no smoothness in the fit.

lindeloev commented 5 years ago

Thank you for the feedback, @behinger! I am working on the docs just now and your comments really help to figure out what to bring to the fore. I should definitely do a vignette introducing the "segments list" and what happens behind the scenes (see #35). A website is in the works and the "articles" menu may contain a few of the things you seek: https://lindeloev.github.io/mcp. I'll let you know when I've written it up.

Change point at "empty" times

Yes, change points can occur at "times" with no data because they are modeled as continuous on par_x. This means that if you have e.g. x = 0, 1, 2, 10, 11, 12 and the associated y = 0, 0, 0, 1, 2, 3 and you model it as a plateau followed by a joined slope (y ~ 1, 1 ~ x), the optimal guess would be that the change happened at x = 9, even though that is not observed. Do you think this is desirable, or is there a use case for assigning probability-of-change-point only to observed x-values? TBH, this was just the easiest solution to implement.
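The x = 9 example can be checked with a rough least-squares sketch in base R. This is not how mcp estimates change points (mcp samples them with JAGS); the grid search and the helper name `rss_at` are assumptions made purely for illustration, and the slope is forced to join the plateau.

```r
# Toy data from the comment above: nothing is observed between x = 2 and x = 10.
x <- c(0, 1, 2, 10, 11, 12)
y <- c(0, 0, 0, 1, 2, 3)

# Residual sum of squares for a plateau followed by a joined slope,
# given a candidate change point cp. pmax() keeps the slope term at zero
# before cp, so the two segments meet at cp.
rss_at <- function(cp) {
  fit <- lm(y ~ pmax(x - cp, 0))
  sum(resid(fit)^2)
}

# Candidate change points on a grid that includes unobserved x-values.
candidates <- seq(2, 10, by = 0.5)
rss <- sapply(candidates, rss_at)
best_cp <- candidates[which.min(rss)]
print(best_cp)
```

The best-fitting candidate is x = 9, a position where no data were observed, because a slope of 1 starting there passes exactly through the last three points.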

Identity between plateaus

For return-to-previous-plateau, simply do prior = list(int_3 = "int_1"). That is, you have a 100% prior belief that they are identical. If you want it to just be in the vicinity, do prior = list(int_3 = "dnorm(int_1, 0.001)"), i.e. heavy shrinkage towards the mean of int_1. In both cases, int_1 will be just as much affected by what happens in segment 3 as the other way around. Is this something like what you asked?
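As a minimal sketch, the two priors could be set up like this for a three-plateau model. The segment list, the data frame `df`, and the column names `x` and `y` are hypothetical; the mcp call is left commented out because it needs both mcp and JAGS installed.

```r
# Three plateaus; the intercepts are then named int_1, int_2 and int_3.
model <- list(
  y ~ 1,  # plateau 1
  1 ~ 1,  # plateau 2
  1 ~ 1   # plateau 3
)

# Option A: segment 3 returns to exactly the level of segment 1.
prior_identical <- list(int_3 = "int_1")

# Option B: segment 3 is strongly shrunk towards int_1 without being
# forced to equal it.
prior_shrunk <- list(int_3 = "dnorm(int_1, 0.001)")

# Fitting requires the mcp package and JAGS. `df` is a hypothetical data
# frame with columns x and y; par_x is given explicitly because no formula
# in this model mentions x.
# fit <- mcp::mcp(model, data = df, prior = prior_identical, par_x = "x")
```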

Autocorrelation

Yes, change point analysis is a big thing for (autocorrelated) time-series such as stocks, the stability of critical systems, etc. This is certainly possible to implement, though I have to get past a forest of low-hanging fruit before I get there ;-)

Thanks again!

lindeloev commented 5 years ago

@behinger Do you recall which packages you had to update? I should update the DESCRIPTION to require up-to-date packages.

If you are in RStudio, you can type install.packages in the console, press CTRL + UP, and see your recent command history.

behinger commented 5 years ago
lindeloev commented 5 years ago
behinger commented 5 years ago

Sure! An example:

image

Here you can see that the changepoints are discrete. Maybe to rephrase my question: what determines how far apart the changepoints are (i.e. the distance between the vertical lines)?

Smooth changepoint:

image

This time series has no simulated autocorrelation. The changepoint is a sigmoid with a width of 4 samples or so. I used the sigmoid because I can differentiate it and thus estimate it using Stan/NUTS.
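A smooth change of this kind can be sketched in base R as a logistic blend between two plateaus. The function name `smooth_step` and its `width` parameterization are assumptions for illustration, not necessarily what was used for the plot above.

```r
# Smooth change from one plateau to another via a logistic sigmoid.
# `width` sets how gradual the transition is (hypothetical parameterization).
smooth_step <- function(x, cp, width, before, after) {
  before + (after - before) * plogis((x - cp) / width)
}

x <- 0:40
mu <- smooth_step(x, cp = 20, width = 1, before = 0, after = 2)

# The curve passes through the midpoint of the two plateaus at the change
# point and flattens out on either side.
print(mu[x == 20])
```

Because the expected value is differentiable in cp, gradient-based samplers such as NUTS can handle it, which a hard step does not allow.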

lindeloev commented 5 years ago

Great catch, that is indeed confusing! This is just plot.mcpfit defaulting to evaluating 100 positions along x for computational reasons. Increasing to 1000 goes from this:

image

to this:

image

However, it comes at a computational cost (it's slow) and I don't like putting in an extra argument to plot. I will try and make a solution where it selectively increases the resolution around change points.
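The effect of the evaluation grid can be illustrated with a rough base R sketch. This is not the code in plot.mcpfit. The names `step_fun`, `first_jump` and `adaptive_grid` are hypothetical, and the adaptive grid is only one guess at what "selectively increasing the resolution" could look like.

```r
# A step from 0 to 1 at a change point that falls between grid positions.
step_fun <- function(x, cp) as.numeric(x >= cp)
cp <- 33.3

# Uniform grids: a drawn jump can only sit on a grid position, so nearby
# change points snap to the same few x-values when the grid is coarse.
grid_100  <- seq(0, 100, length.out = 100)
grid_1000 <- seq(0, 100, length.out = 1000)
spacing <- c(n100 = diff(grid_100)[1], n1000 = diff(grid_1000)[1])
print(spacing)

# First grid position at which the jump becomes visible.
first_jump <- function(grid) grid[which(step_fun(grid, cp) == 1)[1]]

# Hypothetical adaptive grid: coarse everywhere, dense within +/- 1 of cp.
adaptive_grid <- sort(unique(c(grid_100, seq(cp - 1, cp + 1, length.out = 100))))
print(c(coarse = first_jump(grid_100), adaptive = first_jump(adaptive_grid)))
```

The adaptive grid places the jump nearly as precisely as a much finer uniform grid would, while evaluating far fewer positions.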

lindeloev commented 5 years ago

OK, all documentation has now been updated. Based on your comments, I added an article about the formula syntax: https://lindeloev.github.io/mcp/articles/formulas.html.

Updated README, now that a lot has been separated out into vignettes/articles: https://github.com/lindeloev/mcp
Updated site (frontpage is just the README): https://lindeloev.github.io/mcp/

I also increased the general resolution of plot(fit) four-fold as a temporary fix. And added some demo datasets, so people can get up and running quicker.

Getting close to release of 0.1!

behinger commented 5 years ago

Wow very nice!

I found this syntax a bit confusing:

Segments:
   response ~ 1 
   response ~ 1 ~ 0 + time 
   response ~ 1 ~ 1 + time 

simply because it differs from the list you put in.

But besides this, it's a very cool package!! Congratulations and thanks a lot.

lindeloev commented 5 years ago

Thanks! OK, yes. If others raise this as well, I would not oppose changing it, since you could always derive one representation from the other. The current syntax is meant to enable multivariate change points and variance-change change points in the future (https://github.com/lindeloev/mcp/issues/23) and many other unforeseen things.

behinger commented 5 years ago

Cool! Now I am looking forward to getting some data with a changepoint ;-)