zhaokg / Rbeast

Bayesian Change-Point Detection and Time Series Decomposition

Credible Interval for changepoints #17

Open mdelasheras opened 1 year ago

mdelasheras commented 1 year ago

Hi Zhaokg,

I am working with a single time series of 1 period, and I am correctly detecting the changepoints and the associated changes in trend. However, I would like deeper insight into, and tools to assess, whether a changepoint probability of (let's say) 0.6 is enough to consider it a valid changepoint.

I've searched all of the articles you shared, but I cannot find an example of how to build a 95% credible interval in those cases. Do you have an example or any thoughts on how I could do it? Or how can I access the simulations to build a CI?

Thank you so much, your package is amazing!

zhaokg commented 1 year ago

Hi mdelasheras, thanks for giving the package a try. I'm not sure what you mean by a time series of 1 period: does the 1 period mean 1 year or 1 data point? Regardless, each detected changepoint (i.e., o$trend$cp) is associated with a probability and a 95% CI, given by o$trend$cpPr and o$trend$cpCI, respectively. Are these what you are looking for? Sorry about the possible confusion. If these are not what you want, please let me know.

mdelasheras commented 1 year ago

Thank you so much for your reply. I was referring to a time series of 80 years, with 1 data point per year describing lung capacity. Our goal is to detect the number of changepoints in this series and identify whether (and when) there is a decay or a plateau. I have seen that we can get the probability of having a given ncp. So when we select the most probable model, we have some changepoints whose probability of occurrence is, let's say, 0.4. I couldn't find any reference in the attached papers with an example saying, for instance: we take as true positives the changepoints whose cpPr (and credible interval) is > 0.5.

This also connects with the concept of the probability of a trend being positive, zero, or negative. The three probabilities have to add up to one, so we have a "dominant" one, but would you define the trend by selecting the one with the largest probability, or do you know of a recommended threshold?

I know these are questions of interpretation, but in order to draw deeper conclusions in our paper I wanted to get some ideas on those thresholds and see whether they are arbitrary or whether there are some criteria behind them.

Thank you so much again, Zhao, for your work.

linlin0026 commented 8 months ago

Hello, I've encountered a similar issue. For a time series, what threshold value of cpPr (the changepoint probability) do you consider sufficient for a reliable changepoint? I would like to ask how you have dealt with this issue. Thank you.

zhaokg commented 8 months ago

Dear linlin0026,

Happy 2024. Thanks for asking and giving BEAST a try, together with an apology for not getting back to you earlier. (It is the crazy start of the semester here, and I just got overwhelmed with too many things on my plate).

You are asking an excellent question about the cutoff probability. In my biased opinion, I don’t think there is any agreed standard for Bayesian inference. (In frequentist inference, by contrast, there is a common standard like p=0.05, although the use of such a standard has recently been questioned and there is an urge to abandon the p-value [https://www.bmj.com/content/364/bmj.l1374], to the degree that some journals forbid its use.) With that said, to me, it is OK to use whatever cutoff probability you deem appropriate for your application at hand. Your choice of 0.5 is definitely a reasonable start.
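As a rough sketch of applying such a cutoff in R: the helper below keeps only the changepoints whose cpPr meets a chosen threshold. The function name select_by_cutoff and the mock values are made up for illustration (they are not BEAST output), and the code assumes the trend list has cp and cpPr entries of equal length, possibly padded with NaN.

```r
# Keep changepoints whose posterior probability meets a chosen cutoff.
# `trend` is assumed to look like o$trend from beast(): it needs `cp` and `cpPr`.
# select_by_cutoff is a hypothetical helper, not part of Rbeast.
select_by_cutoff <- function(trend, cutoff = 0.5) {
  # Drop NaN/NA padding and anything below the cutoff
  keep <- !is.na(trend$cp) & !is.na(trend$cpPr) & trend$cpPr >= cutoff
  data.frame(cp = trend$cp[keep], cpPr = trend$cpPr[keep])
}

# Made-up values for illustration only (not real BEAST output)
mock_trend <- list(cp   = c(1898, 1965, 1920, NaN),
                   cpPr = c(0.98, 0.55, 0.31, NaN))

selected <- select_by_cutoff(mock_trend, cutoff = 0.5)
print(selected)
```

With a real fit, one would pass o$trend in place of mock_trend; changing the cutoff shows how sensitive the selected set is to that choice.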

As a side note, if you are not sure how many changepoints are in the time series, my first stop would be to use o$trend$ncp or o$trend$ncp_median (these are the mean/median number of changepoints) as a reasonable estimate. These ncp’s are derived from o$trend$ncpPr. Say, if ncp=3, I will just take the first three changepoints in o$trend$cp; that way, you can avoid choosing a specific cutoff probability.
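A minimal sketch of that idea in R, under some assumptions: top_changepoints is a hypothetical helper, the mock values are made up, and the trend list is assumed to carry cp, cpPr, and ncp_median. It sorts by cpPr explicitly rather than relying on the order in which cp is stored.

```r
# Take the k most probable changepoints, with k defaulting to the rounded
# median number of changepoints. top_changepoints is a hypothetical helper.
top_changepoints <- function(trend, k = round(trend$ncp_median)) {
  valid <- which(!is.na(trend$cp))                       # skip NaN padding
  ord <- valid[order(trend$cpPr[valid], decreasing = TRUE)]
  trend$cp[head(ord, k)]
}

# Made-up values for illustration only (not real BEAST output)
mock_trend <- list(cp         = c(1965, 1898, 1920, NaN),
                   cpPr       = c(0.55, 0.98, 0.31, NaN),
                   ncp_median = 2)

top2 <- top_changepoints(mock_trend)
print(top2)
```

With a real fit, top_changepoints(o$trend) would use the median count, and top_changepoints(o$trend, k = round(o$trend$ncp)) the mean.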

Alternatively, if you know there are just a given number of changepoints but are not sure where they are, you can specify the tcp.minmax argument. For example, if I know there are two changepoints in the Nile river time series, in R, I can run o=beast(Nile, tcp.minmax=c(2,2)); there will be exactly two changepoints. If you run ‘barplot(o$trend$ncpPr)’, you can see that the probability of having 2 changepoints is 1.0 (this is just our prior assumption).

Next, you can do something like this to determine what is a good prior assumption of the number of changepoints:

Suppose there is exactly 1 changepoint: run o=beast(Nile, tcp.minmax=c(1,1)) and check the value o$marg_lik
Suppose there are exactly 2 changepoints: run o=beast(Nile, tcp.minmax=c(2,2)) and check the value o$marg_lik
Suppose there are exactly 3 changepoints: run o=beast(Nile, tcp.minmax=c(3,3)) and check the value o$marg_lik
….

The larger the marginal likelihood, the better the fit. So, the one with the largest marg_lik should be the best, from which you can determine the “optimal” number of changepoints.
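The comparison above could be scripted roughly as follows. This is only a sketch: compare_marg_lik and fit_beast are hypothetical helper names, season = "none" is an assumption added because Nile is annual data with no periodic component, and the Rbeast part runs only if the package is installed.

```r
# Fit one model per candidate number of changepoints and collect marg_lik.
# `fit_fun(y, k)` must return a list with a `marg_lik` element.
compare_marg_lik <- function(y, candidate_ncp, fit_fun) {
  ml <- vapply(candidate_ncp,
               function(k) fit_fun(y, k)$marg_lik,
               numeric(1))
  names(ml) <- candidate_ncp
  ml
}

# Hypothetical wrapper: force exactly k trend changepoints via tcp.minmax.
fit_beast <- function(y, k) {
  Rbeast::beast(y, season = "none", tcp.minmax = c(k, k))
}

# Run only when Rbeast is available; Nile ships with base R (datasets).
if (requireNamespace("Rbeast", quietly = TRUE)) {
  ml <- compare_marg_lik(Nile, 1:3, fit_beast)
  print(ml)
  cat("ncp with the largest marg_lik:", names(ml)[which.max(ml)], "\n")
}
```

Because BEAST is MCMC-based, marg_lik can vary slightly between runs, so small differences between candidates are probably best not over-interpreted.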

Hopefully, this makes sense. If not, let me know.

Kaiguang