Artificial neural networks in the cancer genomics frontier

A very simplistic review that discusses some fairly outdated data-processing guidelines for microarray data. It also lightly covers artificial neural network basics.

It also discusses four examples of NNs applied to cancer data. I don't believe any of these examples address current issues, and most are supervised learning tasks. Three of them are questionable given the small amount of data provided to the MLP: they appear to be either very easy classification tasks or severely overfit. The last is an unsupervised feature-engineering example using pan-cancer data.
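
To make the overfitting concern concrete: when genes far outnumber samples, even a linear classifier can fit random labels perfectly. This is an illustration, not any of the reviewed methods. It assumes pure Gaussian noise in place of expression data, a hypothetical 2000-gene panel, and least-squares regression on one-hot labels instead of an MLP; only the 63/25 split and the four classes echo the Khan et al. SRBCT setup.

```python
import numpy as np

# Hypothetical setup: split sizes and class count echo the SRBCT example,
# but the gene count (2000) is an assumption and all values are pure noise.
rng = np.random.default_rng(0)
n_train, n_test, n_genes, n_classes = 63, 25, 2000, 4

X_train = rng.normal(size=(n_train, n_genes))
X_test = rng.normal(size=(n_test, n_genes))
# Labels are drawn at random, so there is no real signal to learn.
y_train = rng.integers(0, n_classes, size=n_train)
y_test = rng.integers(0, n_classes, size=n_test)

# One-hot targets for a linear least-squares classifier.
Y_train = np.eye(n_classes)[y_train]

# With more genes than samples, X_train has full row rank (almost surely),
# so the minimum-norm least-squares solution reproduces the targets exactly.
W, *_ = np.linalg.lstsq(X_train, Y_train, rcond=None)

train_acc = np.mean(np.argmax(X_train @ W, axis=1) == y_train)
test_acc = np.mean(np.argmax(X_test @ W, axis=1) == y_test)
print(f"train accuracy: {train_acc:.2f}")  # perfect fit to noise
print(f"test accuracy:  {test_acc:.2f}")   # expected to hover near chance (0.25)
```

Perfect training accuracy alongside roughly chance-level held-out accuracy is the signature of overfitting, and with only 25 held-out samples a single split gives a noisy estimate of generalization.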

Khan et al. 2001 - classify small, round blue cell tumors (SRBCTs) into four distinct childhood tumor classes (neuroblastoma, rhabdomyosarcoma, non-Hodgkin lymphoma, and the Ewing family of tumors). The training set is 63 samples and the validation set is 25. They build a two-layer MLP and achieve 100% accuracy using 96 genes.
Pal et al. 2007 - using the same SRBCT dataset and only 7 genes, build an MLP with one hidden layer of 150 nodes plus fuzzy clustering and achieve 100% accuracy on the holdout set.
Chang et al. 2011 - use an MLP to identify a set of 33 miRNAs important for distinguishing colorectal cancer from normal tissue.
Fakoor et al. 2013 - use unsupervised learning (an autoencoder) for feature construction, then apply the constructed features to various pan-cancer classification tasks (e.g. ER status in BRCA). The inputs to the AE are principal component scores and selected raw gene expression values, an interesting design decision that probably does not lead to great classifier interpretability.
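
The Fakoor et al. input design (PC scores concatenated with selected raw genes, fed to an autoencoder whose hidden layer supplies the features) can be sketched as follows. This is a minimal illustration under stated assumptions, not their implementation: the data are synthetic, and the number of PCs (20), the hypothetical gene subset (the 30 most variable genes), the 10-unit hidden layer, the plain (non-sparse) autoencoder, and full-batch gradient descent are all placeholder choices that may differ from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical expression matrix (synthetic noise): 200 samples x 500 genes.
n_samples, n_genes = 200, 500
X = rng.normal(size=(n_samples, n_genes))
Xc = X - X.mean(axis=0)  # center each gene

# Part 1 of the input: principal component scores from an SVD of the centered data.
n_pcs = 20  # illustrative choice
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
pc_scores = Xc @ Vt[:n_pcs].T  # identical to U[:, :n_pcs] * S[:n_pcs]

# Part 2 of the input: raw values for a hypothetical gene subset
# (here simply the 30 most variable genes).
selected = np.argsort(Xc.var(axis=0))[-30:]
raw = Xc[:, selected]

# Autoencoder input: both parts concatenated, then standardized per column.
Z = np.hstack([pc_scores, raw])
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)

# One-hidden-layer autoencoder: sigmoid encoder, linear decoder.
n_in, n_hidden = Z.shape[1], 10
W1 = rng.normal(scale=0.1, size=(n_in, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_hidden, n_in))
b2 = np.zeros(n_in)


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


lr, n_epochs = 0.5, 1000
losses = []
for _ in range(n_epochs):
    # Forward pass: encode, then reconstruct.
    H = sigmoid(Z @ W1 + b1)
    Z_hat = H @ W2 + b2
    err = Z_hat - Z
    losses.append(np.mean(err ** 2))

    # Backward pass for the mean squared reconstruction error.
    d_out = 2.0 * err / err.size
    dW2, db2 = H.T @ d_out, d_out.sum(axis=0)
    d_hid = (d_out @ W2.T) * H * (1.0 - H)
    dW1, db1 = Z.T @ d_hid, d_hid.sum(axis=0)

    # Full-batch gradient descent step.
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2

# Hidden activations serve as the constructed features for downstream classifiers.
features = sigmoid(Z @ W1 + b1)
```

Because each PC score is a linear combination of every gene, a hidden unit that mixes several PC scores is hard to trace back to individual genes, which is where the interpretability cost comes from.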
