The diagnosis of multidrug-resistant and extensively drug-resistant tuberculosis is a global health priority. Whole-genome sequencing of clinical Mycobacterium tuberculosis isolates promises to circumvent the long wait times and limited scope of conventional phenotypic drug susceptibility testing, but gaps remain in accurately predicting phenotype from genotypic data. Using targeted or whole-genome sequencing and conventional drug resistance phenotyping data from 3,601 Mycobacterium tuberculosis strains, 1,228 of which were multidrug resistant, we implemented the first multitask deep learning framework to predict phenotypic resistance to 10 anti-tubercular drugs. The proposed wide and deep neural network (WDNN) outperformed regularized logistic regression and random forest: during cross-validation, the average sensitivities and specificities, respectively, were 92.7% and 92.7% for first-line drugs and 82.0% and 92.8% for second-line drugs. On an independent validation set, the multitask WDNN showed significant performance gains over baseline models, with average sensitivities and specificities, respectively, of 84.5% and 93.6% for first-line drugs and 64.0% and 95.7% for second-line drugs. In addition to learning from samples that have been only partially phenotyped, our proposed multitask architecture shares information across different anti-tubercular drugs and genes to provide more accurate phenotypic predictions. We use t-distributed Stochastic Neighbor Embedding (t-SNE) visualization and feature importance analyses to examine inter-drug similarities. Deep learning has a clear role in improving drug resistance prediction over traditional methods and holds promise for bringing sequencing technologies closer to the bedside.
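One common way for a multitask model to learn from partially phenotyped samples is to mask missing labels out of the loss, so that each isolate contributes only for the drugs it was actually tested against. The abstract does not state the loss used, so the following is a minimal sketch under that assumption rather than the authors' implementation. The function names `masked_multitask_bce` and `sensitivity_specificity` are hypothetical, and `None` is an assumed encoding for "not phenotyped".

```python
import math


def masked_multitask_bce(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy over observed labels only.

    y_true: rows of 0 (susceptible), 1 (resistant), or None (drug not
            phenotyped for this isolate; assumed encoding).
    y_prob: rows of predicted resistance probabilities, same shape.
    """
    total, n_observed = 0.0, 0
    for labels, probs in zip(y_true, y_prob):
        for label, p in zip(labels, probs):
            if label is None:
                continue  # missing phenotype: contributes nothing to the loss
            p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
            total -= label * math.log(p) + (1 - label) * math.log(1.0 - p)
            n_observed += 1
    return total / n_observed if n_observed else 0.0


def sensitivity_specificity(labels, preds):
    """Sensitivity and specificity for one drug, skipping missing labels.

    Returns None for a metric whose denominator is zero.
    """
    tp = fn = tn = fp = 0
    for label, pred in zip(labels, preds):
        if label is None:
            continue
        if label == 1:
            tp += pred == 1
            fn += pred == 0
        else:
            tn += pred == 0
            fp += pred == 1
    sens = tp / (tp + fn) if (tp + fn) else None
    spec = tn / (tn + fp) if (tn + fp) else None
    return sens, spec
```

In this sketch, an isolate phenotyped for only one drug still adds a term to the loss for that drug, which is how partially labeled data can be used without imputing the missing phenotypes. Sensitivity here is the fraction of resistant isolates predicted resistant, and specificity is the fraction of susceptible isolates predicted susceptible, computed per drug.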
https://www.biorxiv.org/content/early/2018/03/03/275628