mongodb / signal-processing-algorithms

Apache License 2.0
39 stars 16 forks source link

Add E-Divisive Mean algorithm #22

Open sfc-gh-mfruth opened 1 year ago

sfc-gh-mfruth commented 1 year ago

This patch adds the E-Divisive Mean algorithm that was defined in this paper:

Matteson, D.S. and James, N.A., 2014. A nonparametric approach for multiple change point analysis of multivariate data. Journal of the American Statistical Association, 109(505), pp.334-345.

Additionally, the attribute min_cluster_size is added to the e_divisive call. This attribute allows to define the minimum amount of data points for each cluster. This attribute is also used in the reference implementation of the E-Divisive Mean algorithm: https://github.com/cran/ecp/blob/56d4d0ede1b361fd23010b75e5b0c5438de6cf7d/R/e_divisive.R#L42

To calculate a single change point, the current implementation has a runtime of O(T) (T = length of time series). The E-Divisive Mean algorithm has a runtime of O(T^2).

The following example should illustrate for which groups the Q-value is calculated:

time_series = [1, 2, 3, 4, 5, 6]
# X = 'left' cluster; Y = 'right' cluster
# Current implementation:
X = []
Y = [1, 2, 3, 4, 5, 6]

X = [1]
Y = [2, 3, 4, 5, 6]

X = [1, 2]
Y = [3, 4, 5, 6]

# ...

X = [1, 2, 3, 4, 5]
Y = [6]

####################################

# New/patched algorithm (E-Divisive Mean)
X = [1, 2]
Y = [3, 4]

X = [1, 2]
Y = [3, 4, 5]

X = [1, 2]
Y = [3, 4, 5, 6]

X = [1, 2, 3]
Y = [4, 5]

X = [1, 2, 3]
Y = [4, 5, 6]

X = [1, 2, 3, 4]
Y = [5, 6]
michaelfruth commented 11 months ago

I would be happy to revise the PR for a merge, e.g. add tests or outsource the new method, etc. Are there any changes you want?