agitter commented 8 years ago

gwaybio commented 8 years ago

A deep learning model that uses stacked denoising autoencoders (sDA) to pretrain weights and optimize architecture, and a multilayer perceptron (MLP) to predict whether a given CpG is hypo- or hyper-methylated.

Biology

- Predicting the methylation status of CpG loci
  - Using cell lines GM12878 and K562
  - Converted to a binary classification task
  - CpG beta values are distributed mostly near 0 or 1
  - Cutoff defined to be 0.01 to ensure balanced classes
  - Gold standard: reduced representation bisulfite sequencing
- Uses sequence features
  - 500-1000 nucleotide windows describing ratio and "order"
- Topological features
  - Looks at Hi-C contacts and sequence features around Hi-C contacts
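
To make the binarization and window features above concrete, here is a minimal Python sketch. It is not the paper's code: the function names, the strict `>` comparison, and the choice of plain nucleotide fractions as the "ratio" feature are all my assumptions for illustration.

```python
from collections import Counter


def binarize_beta(beta, cutoff):
    """Map a methylation beta value in [0, 1] to a binary label.

    Assumption: values strictly above the cutoff count as
    hyper-methylated (1), everything else as hypo-methylated (0).
    """
    return 1 if beta > cutoff else 0


def window_composition(sequence, center, window=500):
    """Fraction of A, C, G and T in a window centered on a CpG position.

    Hypothetical stand-in for the "ratio" sequence features; the window
    is clipped at the sequence boundaries.
    """
    half = window // 2
    start = max(0, center - half)
    end = min(len(sequence), center + half)
    counts = Counter(sequence[start:end].upper())
    total = sum(counts[base] for base in "ACGT")
    if total == 0:
        # No unambiguous bases in the window (e.g. all N).
        return {base: 0.0 for base in "ACGT"}
    return {base: counts[base] / total for base in "ACGT"}
```

With the 0.01 cutoff noted above, `binarize_beta(0.005, 0.01)` would give the hypo-methylated label and `binarize_beta(0.9, 0.01)` the hyper-methylated one.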

Computation

- Unsupervised Pretraining
  - sDA with two hidden layers with 500 nodes each
  - sDA had an additional sigmoid output layer that predicted y_i
  - y_i is either -1 or 1 depending on argmax_i P(Y=i | x, W, b)
- Supervised Training
  - Weights and hyperparameters optimally defined by sDA carried over to initialize MLP
  - Same architecture
  - Trained to predict the output of the sDA (y_i)
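
As a rough illustration of the label rule above, the following Python sketch picks the class with the highest P(Y=i | x, W, b) and maps it to -1 or 1. It assumes a single linear layer followed by a softmax over two classes, which for two classes is equivalent to a sigmoid on the score difference; the function names and the toy weights are hypothetical and not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_probabilities(x, W, b):
    """P(Y=i | x, W, b) for each class i (assumed linear layer + softmax)."""
    scores = [
        sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
        for row, b_i in zip(W, b)
    ]
    return softmax(scores)


def predict_label(x, W, b, labels=(-1, 1)):
    """Return the label of the most probable class (the argmax over i)."""
    probs = class_probabilities(x, W, b)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best]


# Toy example: one weight row per class, two input features.
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
print(predict_label([2.0, 0.5], W, b))  # first class scores higher
print(predict_label([0.1, 3.0], W, b))  # second class scores higher
```

If the MLP is then trained on these predicted labels rather than on the measured methylation state, it can at best reproduce the sDA's decisions, which is the concern raised in the next section.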

Why should we include it in our review

I am concerned about several aspects of the study. First, the engineered features could probably be designed more carefully. Second, the MLP is trained to predict the output of the first algorithm, which leaves me wondering if the MLP is keying in on something biasing the sDA. I was also confused about exactly what the features were and how performance was evaluated.

While I think the paper fell short in these ways, it could be discussed as part of learning epigenomic features and integrating 3-dimensional genomic features. It could also come up when discussing unsupervised algorithms for automatic feature construction, but this was not really done here.

agitter commented 7 years ago

@gwaygenomics I am debating whether we need a separate section on methylation in the Study section. Based on your comments here, I'm thinking we do not. Do you agree?

gwaybio commented 7 years ago

@agitter yes, that sounds reasonable. Perhaps including methylation in the "related epigenomic tasks" section is sufficient.

I also think that, in a lot of ways, DNA methylation data is expanding faster than data from any other genomic platform. See Illumina's new EPIC BeadChip and other newer technologies, including hydroxymethylcytosine analyses. Could be a lot of room for deep learning applications! :smile_cat:

greenelab / deep-review

Predicting DNA Methylation State of CpG Dinucleotide Using Genome Topological Features and Deep Networks #68
