greenelab / deep-review

A collaboratively written review paper on deep learning, genomics, and precision medicine
https://greenelab.github.io/deep-review/

Classification and mutation prediction from non–small cell lung cancer histopathology images using deep learning #917

Open gwaybio opened 5 years ago

gwaybio commented 5 years ago

https://doi.org/10.1038/s41591-018-0177-5

Visual inspection of histopathology slides is one of the main methods used by pathologists to assess the stage, type and subtype of lung tumors. Adenocarcinoma (LUAD) and squamous cell carcinoma (LUSC) are the most prevalent subtypes of lung cancer, and their distinction requires visual inspection by an experienced pathologist. In this study, we trained a deep convolutional neural network (inception v3) on whole-slide images obtained from The Cancer Genome Atlas to accurately and automatically classify them into LUAD, LUSC or normal lung tissue. The performance of our method is comparable to that of pathologists, with an average area under the curve (AUC) of 0.97. Our model was validated on independent datasets of frozen tissues, formalin-fixed paraffin-embedded tissues and biopsies. Furthermore, we trained the network to predict the ten most commonly mutated genes in LUAD. We found that six of them—STK11, EGFR, FAT1, SETBP1, KRAS and TP53—can be predicted from pathology images, with AUCs from 0.733 to 0.856 as measured on a held-out population. These findings suggest that deep-learning models can assist pathologists in the detection of cancer subtype or gene mutations. Our approach can be applied to any cancer type, and the code is available at https://github.com/ncoudray/DeepPATH.

gwaybio commented 5 years ago

Summary

The authors train an Inception v3 CNN on TCGA lung cancer pathology images. The paper presents two goals: 1) a CNN to classify normal lung vs. lung adenocarcinoma (LUAD) vs. lung squamous cell carcinoma (LUSC); 2) a CNN to predict the mutation status of the top 10 mutated genes in LUAD. The model is also nicely validated on an independent set and on both FFPE and fresh-frozen samples.
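For a concrete picture of the first task, here is a minimal PyTorch/torchvision sketch of fine-tuning Inception v3 for three-class patch classification. This is not the authors' DeepPATH pipeline (which is TensorFlow-based); the directory layout, hyperparameters, and the assumption that slides have already been tiled into 299×299 patches are all illustrative.

```python
# Hypothetical sketch: fine-tune Inception v3 for 3-class patch classification
# (normal vs. LUAD vs. LUSC). Assumes whole-slide images were already tiled into
# RGB patches stored under train_patches/<class_name>/*.png.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

transform = transforms.Compose([
    transforms.Resize(299),            # Inception v3 expects 299x299 inputs
    transforms.CenterCrop(299),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

train_set = datasets.ImageFolder("train_patches", transform=transform)
loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

model = models.inception_v3(weights="IMAGENET1K_V1")        # transfer learning
model.fc = nn.Linear(model.fc.in_features, 3)               # normal / LUAD / LUSC
model.AuxLogits.fc = nn.Linear(model.AuxLogits.fc.in_features, 3)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

model.train()
for patches, labels in loader:
    optimizer.zero_grad()
    outputs, aux_outputs = model(patches)   # Inception returns auxiliary logits in train mode
    loss = criterion(outputs, labels) + 0.4 * criterion(aux_outputs, labels)
    loss.backward()
    optimizer.step()
```

Slide-level calls are then obtained by aggregating patch-level predictions across each slide (e.g. averaging per-class probabilities over a slide's tiles).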

There are some nice discussions of other CNNs applied to pathology images, and of other instances of the Inception network applied to biomedical images (#207, #151).

Overall, the performance for the first task is quite good, reaching pathologist-level prediction strength. Interestingly, the model and the pathologists incorrectly classified many of the same samples. The second task is very interesting: the image data themselves carry information about the mutation status of a given pixel patch. Classification performance varied by mutation; for example, STK11 mutations were predicted strongly, while ALK mutations could not be detected.
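A minimal sketch of how the second task's evaluation could be set up: treat each gene as an independent binary label, aggregate patch-level probabilities to the slide level, and report one ROC AUC per gene with scikit-learn. The gene list and the mean-over-patches aggregation are assumptions for illustration, not necessarily the paper's exact protocol.

```python
# Hypothetical per-gene evaluation: slide-level AUCs from patch-level sigmoid outputs.
import numpy as np
from sklearn.metrics import roc_auc_score

genes = ["STK11", "EGFR", "FAT1", "SETBP1", "KRAS", "TP53"]

def slide_level_auc(patch_probs, patch_slide_ids, slide_labels):
    """patch_probs: (n_patches, n_genes) sigmoid outputs per patch
       patch_slide_ids: (n_patches,) array of slide IDs, one per patch
       slide_labels: dict mapping slide ID -> (n_genes,) 0/1 mutation calls"""
    slide_ids = sorted(slide_labels)
    # Aggregate patches to slides by averaging predicted probabilities
    probs = np.stack([patch_probs[patch_slide_ids == s].mean(axis=0) for s in slide_ids])
    truth = np.stack([slide_labels[s] for s in slide_ids])
    return {g: roc_auc_score(truth[:, i], probs[:, i]) for i, g in enumerate(genes)}
```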

Computational Aspects

Biological Aspects

Interesting application of predicting mutation status from pixels. It also looks like different patches within the same sample were often associated with different mutational states and lung cancer subtypes, so this approach could be one nice way of assessing mutational heterogeneity with spatial resolution. It would also be interesting to know whether the samples consistently misclassified by both pathologists and the model were more heterogeneous, e.g. with patches split between LUAD and LUSC calls and higher mutational heterogeneity.
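One simple way to quantify the heterogeneity idea above (a hypothetical score, not something from the paper): rate each slide by the entropy of its patch-level class calls, so that a slide whose patches split between LUAD and LUSC scores higher than a uniformly called slide, and then compare scores between correctly classified and consistently misclassified slides.

```python
# Hypothetical heterogeneity score: entropy of the per-slide distribution of
# patch-level class calls (normal / LUAD / LUSC).
import numpy as np

def patch_call_entropy(patch_classes, n_classes=3):
    """patch_classes: 1-D array of argmax class indices for every patch in one slide."""
    counts = np.bincount(patch_classes, minlength=n_classes)
    p = counts / counts.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())   # 0 = homogeneous, log2(3) ~ 1.58 = maximally mixed
```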