lamtung16 / ML_ChangepointDetection


Summary of paper: https://academic.oup.com/jrsssb/article/86/2/273/7517020 #11

lamtung16 opened this issue 3 weeks ago

lamtung16 commented 3 weeks ago

@tdhock This paper on changepoint detection using deep learning is quite popular and relatively recent. The reviewer from NeurIPS also referenced this paper and suggested that I incorporate this method into my study. I’ve summarized it and have some thoughts on it. If you have time, please take a look and give me some comments.

Problem Setup

Given:

  1. A sequence of length $$n$$: $$[x_1, x_2, ..., x_n]$$.
  2. A classifier $$\phi$$ that takes as input a segment of length $$m$$ from the sequence and outputs $$\phi(x_i, \ldots, x_{i+m-1}) \in \{0, 1\}$$, where $$0$$ indicates "no change" and $$1$$ indicates "one change."

    This classifier is their novel contribution: it determines whether a segment contains a single changepoint or none. Its architecture is a multi-layer perceptron (input size $$m$$, binary output) with ReLU activation functions.

  3. A threshold parameter $$\gamma$$, where $$0 < \gamma < 1$$.
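To make item 2 concrete, here is a minimal NumPy sketch of such a window classifier. The paper only specifies an MLP with input size $$m$$, ReLU activations, and a binary output; the hidden-layer sizes, the sigmoid output, and the 0.5 decision threshold are illustrative assumptions, not details from the paper.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

class WindowClassifier:
    """Hypothetical MLP phi: length-m window -> {0, 1}.

    Input size m, ReLU hidden layers, binary output (per the summary).
    Hidden sizes and the 0.5 threshold are assumptions for illustration.
    """
    def __init__(self, m, hidden=(32, 32), rng=None):
        rng = np.random.default_rng(rng)
        sizes = [m, *hidden, 1]
        # He initialization for ReLU layers
        self.weights = [rng.normal(0.0, np.sqrt(2.0 / a), (a, b))
                        for a, b in zip(sizes[:-1], sizes[1:])]
        self.biases = [np.zeros(b) for b in sizes[1:]]

    def __call__(self, window):
        h = np.asarray(window, dtype=float)
        for W, b in zip(self.weights[:-1], self.biases[:-1]):
            h = relu(h @ W + b)
        logit = h @ self.weights[-1] + self.biases[-1]
        prob = 1.0 / (1.0 + np.exp(-logit))  # sigmoid on the final logit
        return int(prob[0] >= 0.5)           # 1 = "one change", 0 = "no change"
```

An untrained instance already has the right input/output shape; training it (on simulated windows with zero or one change, as the paper does) is a separate step not shown here.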

Algorithm:

  1. For each window $$[x_i, \ldots, x_{i+m-1}]$$, with $$i$$ ranging from $$1$$ to $$n - m + 1$$, calculate $$L_i = \phi(x_i, \ldots, x_{i+m-1})$$.
  2. For each potential changepoint index $$j$$, ranging from $$m$$ to $$n - m + 1$$, calculate the moving average: $$\hat L_j = \frac{1}{m} \sum_{k = j - m + 1}^{j} L_k$$
  3. Identify maximal-length segments $$[s_k, e_k]$$ where $$\hat L_j \geq \gamma$$ for all $$j$$ in $$[s_k, e_k]$$. For each such segment, define a changepoint at $$\arg\max_{j \in [s_k, e_k]} \hat L_j$$.
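The three steps above can be sketched as follows. This is a hedged illustration, not the authors' code; `toy_phi`, which simply flags any non-constant window, is a hypothetical stand-in for the trained classifier $$\phi$$.

```python
import numpy as np

def detect_changepoints(x, phi, m, gamma):
    """Sliding-window detection: classify windows, smooth, threshold."""
    n = len(x)
    # Step 1: classifier output for each length-m window (0-based start i).
    L = np.array([phi(x[i:i + m]) for i in range(n - m + 1)])
    # Step 2: moving average of the last m outputs.
    # Lhat[t] corresponds to the paper's 1-based index j = t + m.
    Lhat = np.array([L[t:t + m].mean() for t in range(len(L) - m + 1)])
    # Step 3: maximal runs with Lhat >= gamma; argmax (first on ties) per run.
    cps, t = [], 0
    while t < len(Lhat):
        if Lhat[t] >= gamma:
            s = t
            while t < len(Lhat) and Lhat[t] >= gamma:
                t += 1
            cps.append(s + int(np.argmax(Lhat[s:t])) + m)  # 1-based index j
        else:
            t += 1
    return L, Lhat, cps

# Demo with a toy classifier that flags any non-constant window
# (a stand-in for the trained MLP; illustration only).
toy_phi = lambda w: int(len(set(w)) > 1)
x = [1, 1, 1, 1, 2, 2, 2, 2, 1, 1, 1, 1, 1]
L, Lhat, cps = detect_changepoints(x, toy_phi, m=3, gamma=0.5)
print(cps)  # [4, 8]
```

Returned indices use the paper's 1-based convention, so the demo reproduces the worked example below.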

Example

Given:

  1. Sequence $$x = [1, 1, 1, 1, 2, 2, 2, 2, 1, 1, 1, 1, 1] \in \mathbb{R}^{13}$$ (we hope the algorithm detects changepoints at locations 4 and 8).
  2. Trained classifier $$\phi(z_1, z_2, z_3) \in \{0, 1\}$$ (window size $$m = 3$$).
  3. Threshold $$\gamma = 0.5$$.

Algorithm Steps:

  1. For each window of size 3 (i.e., $$(x_i, x_{i+1}, x_{i+2})$$ where $$i$$ ranges from 1 to 11), compute $$\phi(x_i, x_{i+1}, x_{i+2})$$. This results in:

$$L = [L_1, L_2, L_3, L_4, L_5, L_6, L_7, L_8, L_9, L_{10}, L_{11}] = [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0]$$

  2. For each possible changepoint $$j$$ from 3 to 11, compute the average $$\hat L_j$$ as:

$$\hat L = [\hat L_3, \hat L_4, \hat L_5, \hat L_6, \hat L_7, \hat L_8, \hat L_9, \hat L_{10}, \hat L_{11}] = [0.33, 0.67, 0.67, 0.33, 0.33, 0.67, 0.67, 0.33, 0]$$

  3. Identify segments $$[4, 5]$$ and $$[8, 9]$$ where $$\hat L_j \geq \gamma$$; taking the argmax within each segment (first index on ties) gives detected changepoints at indices 4 and 8.
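The numbers in this example can be checked directly. The snippet below starts from the classifier outputs $$L$$ listed in step 1 and reproduces steps 2 and 3:

```python
import numpy as np

# Classifier outputs L for the 11 windows, as listed in step 1.
L = np.array([0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0])
m, gamma = 3, 0.5

# Step 2: for 1-based j = 3..11, Lhat is the mean of L_{j-2}..L_j.
Lhat = np.array([L[j - m:j].mean() for j in range(m, len(L) + 1)])

# Step 3: maximal runs with Lhat >= gamma; one changepoint per run.
above = Lhat >= gamma
cps, t = [], 0
while t < len(above):
    if above[t]:
        s = t
        while t < len(above) and above[t]:
            t += 1
        cps.append(s + int(np.argmax(Lhat[s:t])) + m)  # back to 1-based j
    else:
        t += 1
print(cps)  # [4, 8]
```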

Future Plan

lamtung16 commented 3 weeks ago

@tdhock I've been considering the proposed methods, and I don't think Proposed Solution 2 is a good idea, as it is likely to detect two consecutive changepoints.

However, for Proposed Solution 1, in step 2 (prediction), I can select the window size based on sequence length (because the label lengths are related to sequence length; see https://github.com/lamtung16/ML_Changepoint_Detection_Auto/blob/382ff2ceea26b0f442e36e1a90c906e71d5b5ca1/data/cancer/figures/analyze/length_vs_label_length.png). After applying the classifier to a sliding fixed-size window at every point in the sequence, we will have labels that cover the entire sequence, with no overlap between labels, and the union of the labeled regions will equal the entire sequence. Finally, apply LOPART with penalty $$\lambda = 0$$ to detect changepoint locations.