hildensia / bayesian_changepoint_detection

Methods to get the probability of a changepoint in a time series.
MIT License

How to adjust the sensitivity of the BOCD algorithm? #31

Open gqffqggqf opened 3 years ago

gqffqggqf commented 3 years ago

There is always a tradeoff between false alarms and missed alarms: a more sensitive algorithm should have a higher false-alarm rate and a lower missed-alarm rate. My question is, is it possible to adjust the sensitivity of this algorithm by changing the hyperparameters (e.g., alpha, beta, kappa, mu)? Thank you!

hildensia commented 3 years ago

Hi,

The algorithm outputs the true Bayesian probabilities. How you act on them is up to you (e.g. setting a probability threshold that is more or less sensitive).
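As a rough illustration of acting on the probabilities with a threshold, here is a minimal sketch. The helper name alarms_from_probabilities and the probability values are made up for illustration and are not part of the library; the array stands in for whatever per-step changepoint probabilities you extract from the detector's output.

```python
import numpy as np

def alarms_from_probabilities(cp_prob, threshold):
    """Return the time indices whose changepoint probability exceeds the threshold."""
    cp_prob = np.asarray(cp_prob, dtype=float)
    return np.flatnonzero(cp_prob > threshold)

# Hypothetical per-step changepoint probabilities (made-up numbers).
cp_prob = np.array([0.01, 0.02, 0.35, 0.05, 0.80, 0.10])

# A low threshold raises more alarms (more sensitive, more false alarms).
sensitive = alarms_from_probabilities(cp_prob, threshold=0.3)
# A high threshold raises fewer alarms (less sensitive, more missed alarms).
conservative = alarms_from_probabilities(cp_prob, threshold=0.7)

print("threshold 0.3:", sensitive.tolist())
print("threshold 0.7:", conservative.tolist())
```

Sweeping the threshold over a labelled validation set is one way to pick the false-alarm / missed-alarm tradeoff you want.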

And you can obviously choose priors, in good Bayesian tradition, that fit your sense of how likely an "alarm" is. This is the prior_func in the implementation. For example, a const_prior says that a change point is equally likely at every step.
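To see why the prior acts as a sensitivity knob, the sketch below applies Bayes' rule to a single step with two hypotheses ("change" vs. "no change"). This is a deliberate simplification, not the library's recursion: the function posterior_change_prob and the likelihood ratio of 20 are hypothetical, and p plays the role of the constant per-step prior probability of a change point.

```python
def posterior_change_prob(p, likelihood_ratio):
    """Posterior probability of a change given a prior p and a likelihood ratio.

    likelihood_ratio = P(data | change) / P(data | no change).
    Both arguments are illustrative; this is plain Bayes' rule for two hypotheses.
    """
    numerator = p * likelihood_ratio
    return numerator / (numerator + (1.0 - p))

# Same (hypothetical) evidence, three different constant priors.
evidence = 20.0
for p in (0.001, 0.01, 0.1):
    post = posterior_change_prob(p, evidence)
    print(f"prior p={p:<6} -> posterior change probability {post:.3f}")
```

With identical evidence, a larger prior probability yields a larger posterior, so a fixed alarm threshold is crossed more easily. In the full algorithm the effect is spread over all run lengths, but the direction of the influence should be similar.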

gqffqggqf commented 3 years ago

Hi, thanks for the quick reply! This makes sense.

A follow-up question: which would you expect to be more sensitive, the version with the univariate t distribution or the one with the multivariate t distribution (counting any alarm on a single channel in the univariate case as an alarm)? I applied both to the same dataset with the same rule for triggering an alarm (to rule out the influence of the sensitivity thresholds you mentioned), and the univariate version turned out to be more sensitive.

My intuition was the multivariate version should be more sensitive, since it considers the correlation between the channels. For example, if a weak change point triggers a slight increase on all channels, this signal should be more obvious when observing from a higher-dimensional space (multivariate) compared to observing from each dimension separately.

Do you have any thoughts regarding this, i.e., which one should be more sensitive between the univariate and multivariate version? Thanks.

hildensia commented 3 years ago

I don't see how you can run univariate and multivariate models on the same data. How did you combine the multiple dimensions to obtain a single random variable? Or did you run it on each dimension separately? But then how did you combine the outcomes? Max? Multiply them? 1 - prod_i(1 - p_i) (i.e., the probability that at least one random variable is at a change point; I think this is the correct thing to do)?
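For concreteness, the combination rules mentioned above can be sketched as follows. The function names and the probability values are made up for illustration; p has one row per channel and one column per time step, and the "at least one" rule assumes the channels are independent.

```python
import numpy as np

def combine_max(p):
    """Largest per-channel probability at each time step.

    Thresholding this is the same as raising an alarm when any single channel
    exceeds the threshold.
    """
    return np.max(p, axis=0)

def combine_product(p):
    """Probability that all channels change at once (assumes independence)."""
    return np.prod(p, axis=0)

def combine_at_least_one(p):
    """1 - prod_i(1 - p_i): probability that at least one channel changes
    (assumes independence)."""
    return 1.0 - np.prod(1.0 - p, axis=0)

# Hypothetical changepoint probabilities: 2 channels x 2 time steps.
p = np.array([[0.1, 0.5],
              [0.2, 0.5]])

print("max:         ", combine_max(p))
print("product:     ", combine_product(p))
print("at least one:", combine_at_least_one(p))
```

Note that the "at least one" value is never smaller than the max, which in turn is never smaller than the product, so under a fixed threshold the choice of combination rule alone changes how often an alarm fires.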

I haven't really played with the multivariate case myself, so I have limited information and intuition there. Generally, my intuition would be that higher dimensions lead to lower likelihoods, just because you multiply more often, but I could be wrong here.
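A small sketch of the "multiply more often" intuition, under the simplifying assumption of independent standard-normal channels (this is not the multivariate t model used in the library, and the helper names are made up): the joint log-likelihood is a sum of per-channel terms, so it drops as dimensions are added whenever each per-channel density is below 1.

```python
import numpy as np

def std_normal_logpdf(x):
    """Log-density of a standard normal distribution at x."""
    return -0.5 * (np.log(2.0 * np.pi) + np.square(x))

def joint_loglik(x):
    """Joint log-likelihood of independent standard-normal channels:
    multiplying densities corresponds to summing log-densities."""
    return float(np.sum(std_normal_logpdf(np.asarray(x, dtype=float))))

# Evaluate at the mode (all zeros), where each channel's density is highest.
for d in (1, 2, 5, 10):
    print(f"d={d:<2} joint log-likelihood at the mode: {joint_loglik(np.zeros(d)):.3f}")
```

Whether this translates into lower sensitivity is a separate question: the run-length posterior is normalized, so what matters is how the likelihoods of competing hypotheses compare, not their absolute size.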

haoyu112 commented 2 years ago

Can you provide a more detailed example of tuning the sensitivity of BOCD? I have a sample dataset in which the value drops by only 0.14 while the mean is about 114, a really small ratio compared to the example. Thanks in advance.