lamtung16 opened 3 weeks ago
@tdhock I've been considering the proposed methods, and I don't think Proposed Solution 2 is a good idea, because it is likely to detect changepoints at two consecutive positions.
However, for Proposed Solution 1, in step 2 (prediction), I can select the window size based on sequence length, because label length is related to sequence length (see https://github.com/lamtung16/ML_Changepoint_Detection_Auto/blob/382ff2ceea26b0f442e36e1a90c906e71d5b5ca1/data/cancer/figures/analyze/length_vs_label_length.png). After applying the classifier to a fixed-size sliding window at every point of the sequence, we will have labels that cover the entire sequence without overlapping, so the union of the labeled regions equals the entire sequence. Finally, we apply LOPART with penalty $$\lambda=0$$ to detect the changepoint locations.
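A minimal sketch of step 2 in plain Python, under assumptions that are not stated in the thread: the windows tile the sequence (stride equal to the window size `w`, with the last window absorbing any remainder) so that the labels do not overlap; `w` is passed in directly rather than chosen from sequence length; `stand_in_classifier` is a hypothetical placeholder for the trained classifier; and the changepoint search is a small from-scratch dynamic program with square loss, not a call into the LOPART package. Because the labels cover the whole sequence, the number of changepoints is fixed by the positive labels, so the penalty does not change which segmentation is optimal, which is consistent with using $$\lambda=0$$.

```python
from itertools import accumulate


def tile_labels(x, w, classify):
    """Cut x into consecutive non-overlapping windows of size w (the last window
    absorbs any remainder) and label each window with the classifier output."""
    n = len(x)
    k = max(1, n // w)
    labels = []
    for j in range(k):
        start = j * w
        end = n - 1 if j == k - 1 else start + w - 1
        labels.append((start, end, classify(x[start:end + 1])))
    return labels


def stand_in_classifier(window):
    """Hypothetical stand-in for the trained classifier: True if the two halves
    of the window have means that differ by more than 1."""
    half = len(window) // 2
    left, right = window[:half], window[half:]
    return abs(sum(left) / len(left) - sum(right) / len(right)) > 1.0


def labeled_changepoints(x, labels):
    """Minimise total square loss subject to exactly one changepoint inside each
    positive label and none inside negative labels. A changepoint t means a
    segment ends at index t, with start <= t < end of its label (assumed convention)."""
    n = len(x)
    csum = [0.0] + list(accumulate(x))
    csum2 = [0.0] + list(accumulate(v * v for v in x))

    def cost(a, b):
        # Square loss of x[a..b] (inclusive) around its own mean.
        s = csum[b + 1] - csum[a]
        s2 = csum2[b + 1] - csum2[a]
        return s2 - s * s / (b - a + 1)

    positives = [(s, e) for s, e, is_pos in labels if is_pos]
    if not positives:
        return []  # no positive label: one segment, no changepoints
    if any(s == e for s, e in positives):
        raise ValueError("a positive label needs at least two points")
    # best[t] = (loss of x[0..t], changepoints so far), with the latest change at t
    s, e = positives[0]
    best = {t: (cost(0, t), [t]) for t in range(s, e)}
    for s, e in positives[1:]:
        best = {
            t: min((c + cost(prev + 1, t), cps + [t]) for prev, (c, cps) in best.items())
            for t in range(s, e)
        }
    # Close the last segment at the end of the sequence.
    _, changepoints = min(
        (c + cost(prev + 1, n - 1), cps) for prev, (c, cps) in best.items()
    )
    return changepoints


x = [0.0] * 4 + [5.0] * 6 + [1.0] * 8
labels = tile_labels(x, w=6, classify=stand_in_classifier)
print(labels)                           # [(0, 5, True), (6, 11, True), (12, 17, False)]
print(labeled_changepoints(x, labels))  # [3, 9]
```

In this usage example both changes fall strictly inside a window; under this tiling, a change that falls exactly on a window boundary would not be visible to either window.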
@tdhock This paper on changepoint detection using deep learning is quite popular and relatively recent. The reviewer from NeurIPS also referenced this paper and suggested that I incorporate this method into my study. I’ve summarized it and have some thoughts on it. If you have time, please take a look and give me some comments.
Problem Setup
Given:
Algorithm:
Example
Given:
Algorithm Steps:
$$L = [L_1, L_2, L_3, L_4, L_5, L_6, L_7, L_8, L_9, L_{10}, L_{11}] = [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0]$$
$$\hat L = [\hat L_3, \hat L_4, \hat L_5, \hat L_6, \hat L_7, \hat L_8, \hat L_9, \hat L_{10}, \hat L_{11}] = [0.33, 0.67, 0.67, 0.33, 0.33, 0.67, 0.67, 0.33, 0]$$
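The values of $$\hat L$$ in this example are consistent with a trailing average over the last three labels, $$\hat L_i = \frac{1}{3}\left(L_{i-2} + L_{i-1} + L_i\right), \quad i = 3, \dots, 11;$$ the window width of 3 is inferred from the numbers rather than stated explicitly. A short Python check, where `trailing_average` is a hypothetical helper name and the dictionary keys are the 1-based indices used in the example:

```python
def trailing_average(labels, width):
    """Return {i: mean(L_{i-width+1}, ..., L_i)} for 1-based indices i >= width."""
    return {
        i: sum(labels[i - width:i]) / width
        for i in range(width, len(labels) + 1)
    }


L = [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0]
L_hat = trailing_average(L, width=3)
print({i: round(v, 2) for i, v in L_hat.items()})
# {3: 0.33, 4: 0.67, 5: 0.67, 6: 0.33, 7: 0.33, 8: 0.67, 9: 0.67, 10: 0.33, 11: 0.0}
```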
Future Plan
Limitations:
When `left_mean` and `right_mean` are different, the cases `left_mean > right_mean` and `right_mean > left_mean` represent the same situation (a change in the mean), but the model treats them differently.
Proposed Solution 1:
Proposed Solution 2: We can keep using a fixed-size input model for classification, but instead of determining whether a segment contains one changepoint or none, we will treat the input as two consecutive non-overlapping segments of equal length $$[i-m:i]$$ and $$[i:i+m]$$ to classify whether $$i$$ is a changepoint.
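A minimal sketch of Proposed Solution 2 in Python. As an assumption, the absolute difference between the means of `x[i:i+m]` and `x[i-m:i]` stands in for the trained fixed-size classifier, and 2.0 is a hypothetical decision threshold. On a noise-free signal with a single change at index 10, the scores at neighboring positions are also high, which illustrates the concern about detecting changepoints at consecutive positions raised at the top of the thread.

```python
def two_window_scores(x, m):
    """Score each candidate i by |mean(x[i:i+m]) - mean(x[i-m:i])|
    (a stand-in for the classifier's output on the two windows)."""
    return {
        i: abs(sum(x[i:i + m]) / m - sum(x[i - m:i]) / m)
        for i in range(m, len(x) - m + 1)
    }


x = [0.0] * 10 + [4.0] * 10  # one change: x[10] is the first point of the new segment
scores = two_window_scores(x, m=3)
flagged = [i for i, s in scores.items() if s > 2.0]  # hypothetical threshold
print(flagged)                      # [9, 10, 11]
print(max(scores, key=scores.get))  # 10
```

Taking the argmax (or another form of non-maximum suppression) recovers the single change here, whereas thresholding each position independently flags three adjacent positions.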