In the era of precision medicine and cancer genomics, data are being generated so quickly that it is difficult to appreciate the full extent of what is discoverable. DNA methylation, a chemical modification of DNA, has been shown to be a significant factor in many cancers and is a candidate data source with ample features for model training. However, the black-box nature of non-linear models, such as those used in deep learning, and a lack of accurately labeled ground-truth data have kept these models from being adopted in this space as rapidly as other methods. In this article, we discuss applications of unsupervised learning with variational autoencoders to DNA methylation data and motivate further work with initial results on breast cancer data provided by The Cancer Genome Atlas. We show that a logistic regression classifier trained on the learned latent methylome accurately classifies disease subtype.
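
For orientation, the two-stage approach summarized above can be written in the standard variational autoencoder notation. This is a sketch under assumptions, not the exact objective used in the article: the symbols $x$ (a sample's methylation profile), $z$ (its latent representation), $\theta$, $\phi$, $W$, and $b$ are introduced here for illustration, the prior is assumed to be a standard normal, and using the encoder mean $\mu_\phi(x)$ as the classifier input is likewise an assumption.

```latex
% Stage 1 (unsupervised): maximize the evidence lower bound over unlabeled profiles x
\mathcal{L}(\theta, \phi; x)
  = \mathbb{E}_{q_\phi(z \mid x)}\!\left[ \log p_\theta(x \mid z) \right]
  - D_{\mathrm{KL}}\!\left( q_\phi(z \mid x) \,\|\, p(z) \right),
  \qquad p(z) = \mathcal{N}(0, I)

% Stage 2 (supervised): logistic regression on the learned latent features
P(y = k \mid x)
  = \frac{\exp\!\left( w_k^{\top} \mu_\phi(x) + b_k \right)}
         {\sum_{j} \exp\!\left( w_j^{\top} \mu_\phi(x) + b_j \right)}
```

The first term of the bound rewards faithful reconstruction of the methylation profile, and the second keeps the approximate posterior close to the prior. With two subtypes, the softmax in the second stage reduces to the usual sigmoid form of binary logistic regression.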
https://doi.org/10.5220/0006636401400145