Deep Learning Approach for Changepoint Detection: Penalty Parameter Optimization
To implement the algorithm, we need the definition of this loss function: $$L(x, t_1, t_2) = \min_{\mu} \sum_{i=t_1}^{t_2} (x_i - \mu)^2 = \sum_{i=t_1}^{t_2}\left(x_i - \frac{\sum_{j=t_1}^{t_2} x_j}{t_2 - t_1 + 1}\right)^2 = \sum_{i=t_1}^{t_2} x_i^2 - \frac{\left(\sum_{i=t_1}^{t_2} x_i\right)^2}{t_2 - t_1 + 1}$$ Pseudo code (last changepoint algorithm):
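The closed form above can be evaluated in O(1) per segment after precomputing cumulative sums, which is what the dynamic program relies on. A minimal sketch (the function names here are ours, not the repository's):

```python
import numpy as np

def make_segment_loss(x):
    """Precompute prefix sums so L(x, t1, t2) is O(1) per query,
    using the closed form sum(x_i^2) - (sum x_i)^2 / (t2 - t1 + 1)."""
    x = np.asarray(x, dtype=float)
    S = np.concatenate(([0.0], np.cumsum(x)))        # S[k] = sum of x[:k]
    S2 = np.concatenate(([0.0], np.cumsum(x ** 2)))  # S2[k] = sum of x[:k]**2

    def loss(t1, t2):  # inclusive indices, as in the formula above
        n = t2 - t1 + 1
        s = S[t2 + 1] - S[t1]
        return S2[t2 + 1] - S2[t1] - s * s / n

    return loss
```

For example, on `[1, 1, 5, 5]` the loss of the whole sequence is 16 (mean 3, four squared residuals of 4 each), while each constant half has loss 0.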
Similar to OPART, with only one difference: inside the for loop, instead of considering $\tau \in \{0, 1, \dots, t-1\}$, we consider only $\tau \in T$, the set of candidates consistent with the changepoint counts required by all of the train labels.
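The dynamic program described above can be sketched as follows. This is our own minimal implementation, not the repository's code: the `candidates(t)` argument is a hypothetical hook standing in for LOPART's label-constrained set $T$, and the default (all $\tau \in \{0, \dots, t-1\}$) reduces to plain OPART.

```python
import numpy as np

def opart(x, penalty, candidates=None):
    """Minimal OPART-style dynamic program (a sketch).
    candidates(t), if given, yields the allowed last changepoints tau for
    prefix length t (LOPART restricts this set using the train labels);
    by default every tau in {0, ..., t-1} is allowed, i.e. plain OPART."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    S = np.concatenate(([0.0], np.cumsum(x)))        # prefix sums
    S2 = np.concatenate(([0.0], np.cumsum(x ** 2)))  # prefix sums of squares

    def L(a, b):  # squared-error loss of segment x[a:b], half-open
        s = S[b] - S[a]
        return S2[b] - S2[a] - s * s / (b - a)

    C = np.full(n + 1, np.inf)   # C[t] = best penalized cost of x[:t]
    C[0] = -penalty              # cancels the penalty of the first segment
    last = np.zeros(n + 1, dtype=int)
    for t in range(1, n + 1):
        for tau in (range(t) if candidates is None else candidates(t)):
            cost = C[tau] + penalty + L(tau, t)
            if cost < C[t]:
                C[t], last[t] = cost, tau
    cps, t = [], n               # backtrack the optimal changepoints
    while last[t] > 0:
        cps.append(last[t])
        t = last[t]
    return sorted(cps)
```

With a small penalty, `opart([0, 0, 0, 10, 10, 10], penalty=1.0)` recovers the single changepoint at index 3; with a very large penalty no changepoints are returned.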
To choose the best value of $\lambda$ (for either OPART or LOPART), we use the train set to learn the best $\lambda$, then apply that $\lambda$ to the test set.
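That selection step can be sketched as a grid search. The `error_fn` callable here is hypothetical: it stands in for "run OPART/LOPART with this $\lambda$ and count label errors on the train set".

```python
def choose_penalty(lambdas, error_fn):
    """Pick the candidate lambda with the fewest train-set label errors.
    error_fn(lam) is a hypothetical callable returning the label-error
    count for lam on the train set. Ties are broken toward the larger
    lambda, which favors models with fewer changepoints."""
    best_lam, best_err = None, float("inf")
    for lam in sorted(lambdas, reverse=True):
        err = error_fn(lam)
        if err < best_err:
            best_lam, best_err = lam, err
    return best_lam
```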
Set $\lambda = \log(N)$, where $N$ is the length of the sequence and $\log$ is the natural logarithm. For example, with $N = 100$, $\lambda = \log(100) \approx 4.6$.
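In code, this baseline is a one-liner:

```python
import math

def bic_penalty(n):
    """BIC-style penalty for a sequence of length N: lambda = log(N), natural log."""
    return math.log(n)
```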
Similar to CART, MMIT uses a decision tree to predict the value of $\lambda$ from sequence features.
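To illustrate the tree idea, here is a toy depth-1 regression tree over sequence features. This is a simplification, not MMIT itself: real MMIT fits trees to target *intervals* with a hinge loss, whereas this sketch minimizes pointwise squared error, and all names here are ours.

```python
import numpy as np

def fit_stump(features, log_lambdas):
    """Toy depth-1 regression tree: find the single feature/threshold split
    minimizing squared error on log(lambda), and return a predict function.
    (Real MMIT uses an interval hinge loss, not squared error.)"""
    X = np.asarray(features, dtype=float)
    y = np.asarray(log_lambdas, dtype=float)
    best = (np.inf, None, None, y.mean(), y.mean())  # sse, feature, thr, means
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j])[:-1]:          # split strictly inside range
            left, right = y[X[:, j] <= thr], y[X[:, j] > thr]
            sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if sse < best[0]:
                best = (sse, j, thr, left.mean(), right.mean())
    _, j, thr, lmean, rmean = best

    def predict(row):
        if j is None:                # no useful split found
            return lmean
        return lmean if row[j] <= thr else rmean

    return predict
```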
Repository structure:
- `0.data_process`: code to process raw data from the epigenomic dataset (the dataset itself is not included).
- `0.lopart_code`: code to process raw data from the detailed and systematic datasets, plus the OPART/LOPART algorithm.
- `1.linear`: R code to run the linear_unreg and linear_L1reg algorithms.
- `1.MMIT`: R code to run the MMIT algorithm.
- `acc_rate_csvs`: CSV files detailing the accuracy rates for each implemented method.
- `figures`: generated figures and the code that generates them.
- `paper_figures`: other figures used for the paper.
- `training_data`: training data: error counts for each lambda, sequence features, and target intervals.
- `BIC.ipynb`: computes log_lambda using the Bayesian Information Criterion (BIC) approach.
- `get_table_chosen_mlp.ipynb`: gets a table of the chosen MLP configuration.
- `get_acc_from_R_predictions.ipynb`: updates `acc_rate_csvs` from R predictions (linear or MMIT).
- `linear.ipynb`: learns log_lambda from a set of sequence features using a linear approach.
- `MLP.ipynb`, `MLP_117.ipynb`: learn log_lambda from a set of sequence features using a Multi-Layer Perceptron (MLP) approach.
- `MLP_cv.ipynb`, `MLP_117_cv.ipynb`: cross-validation; writes a CSV file of configurations and validation accuracies.
- `MMIT.ipynb`: gets accuracy from the predicted log_lambda of MMIT.
- `utility_functions.py`: collection of utility functions.

To reproduce the results:
1. Run `BIC.ipynb`, `linear.ipynb`, `MLP.ipynb`, `MLP_117.ipynb`, `1.MMIT/MMIT.ipynb`, and `1.linear/linear_L1reg/linear_l1reg.ipynb` for each dataset (set the dataset to run at the beginning of each notebook file) to generate a CSV file containing accuracy rates for each method.
2. Run `MMIT.ipynb` and `get_acc_from_R_predictions.ipynb` to update accuracies from linear_l1 and MMIT.
3. Execute `figures/0.get_plot_acc.ipynb`, `1.get_plot_mlp.ipynb`, `2.get_plot_features_targets.ipynb`, and `figure_features_target.ipynb`. The resulting figures will be generated in the `figures` folder.
Unless otherwise stated, all content in this repository is licensed under the Creative Commons Attribution 4.0 International License. You are free to:
- Share: copy and redistribute the material in any medium or format.
- Adapt: remix, transform, and build upon the material for any purpose, even commercially.

Under the following terms:
- Attribution: you must give appropriate credit, provide a link to the license, and indicate if changes were made.
For any questions regarding licensing or the use of this repository, please contact Tung L Nguyen.