Deep Motif Dashboard: Visualizing and Understanding Genomic Sequences Using Deep Neural Networks

agitter commented 7 years ago

Deep neural network (DNN) models have recently obtained state-of-the-art prediction accuracy for the transcription factor binding site (TFBS) classification task. However, it remains unclear how these approaches identify meaningful DNA sequence signals and give insights as to why TFs bind to certain locations. In this paper, we propose a toolkit called the Deep Genomic Dashboard (Deep GDashboard), which provides a suite of visualization strategies to extract motifs, or sequence patterns, from deep neural network models for TFBS classification. We demonstrate how to visualize and understand three important DNN models: convolutional, recurrent, and convolutional-recurrent networks. Our first visualization method finds a test sequence's saliency map, which uses first-order derivatives to describe the importance of each nucleotide in making the final prediction. Second, because recurrent models make predictions in a temporal manner (from one end of a TFBS sequence to the other), we introduce temporal output values, which indicate the prediction score of a model over time for a sequential input. Lastly, a class-specific visualization strategy finds the optimal input sequence for a given TFBS positive class via stochastic gradient optimization. Our experimental results indicate that a convolutional-recurrent architecture performs the best among the three architectures. The visualization techniques indicate that CNN-RNN makes predictions by modeling both motifs and the dependencies among them.
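To make the saliency-map idea in the abstract concrete, here is a minimal sketch in plain Python. It is not the paper's code: the model is a hypothetical toy (one hand-written convolutional filter for the 3-mer `TGA`, global max pooling, and a sigmoid), and names such as `FILTER`, `BIAS`, and `saliency` are assumptions made for illustration. The sketch computes the first-order derivative of the prediction with respect to the one-hot input and reads off the value at each observed nucleotide.

```python
import math

BASES = "ACGT"

# Hypothetical toy filter that scores the 3-mer "TGA" (rows = filter positions,
# columns = A, C, G, T). A trained model would learn these weights.
FILTER = [
    [-1.0, -1.0, -1.0, 2.0],  # position 0 prefers T
    [-1.0, -1.0, 2.0, -1.0],  # position 1 prefers G
    [2.0, -1.0, -1.0, -1.0],  # position 2 prefers A
]
BIAS = -4.0


def one_hot(seq):
    """Encode a DNA string as a list of 4-element one-hot rows."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq]


def forward(x):
    """Return (prediction, index of the max-scoring window)."""
    k = len(FILTER)
    scores = [
        sum(FILTER[j][c] * x[i + j][c] for j in range(k) for c in range(4))
        for i in range(len(x) - k + 1)
    ]
    best = max(range(len(scores)), key=scores.__getitem__)
    z = scores[best] + BIAS
    return 1.0 / (1.0 + math.exp(-z)), best


def saliency(seq):
    """Per-nucleotide saliency: |d prediction / d input| at the observed base."""
    x = one_hot(seq)
    p, best = forward(x)
    dp_dz = p * (1.0 - p)  # derivative of the sigmoid
    grad = [[0.0] * 4 for _ in x]
    # Max pooling passes the gradient only to the winning window.
    for j, row in enumerate(FILTER):
        for c in range(4):
            grad[best + j][c] = dp_dz * row[c]
    # Keep the gradient entry that corresponds to the actual nucleotide.
    scores = [abs(sum(g * v for g, v in zip(grad[i], x[i]))) for i in range(len(x))]
    return p, scores


prediction, sal = saliency("CCTGACC")
for base, s in zip("CCTGACC", sal):
    print(base, round(s, 3))
```

In this toy setting only the three positions covered by the `TGA` match receive nonzero saliency; a real network would need automatic differentiation instead of the hand-derived gradient used here.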

@akundaje commented that using gradients in this manner will miss important patterns relative to #50.
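The class-specific visualization from the abstract is also gradient-based, so a small sketch may help here. This is only an illustration under stated assumptions: it uses a hypothetical one-filter toy model (`FILTER` scores the 3-mer `TGA`), plain deterministic gradient ascent rather than the stochastic gradient optimization the abstract mentions, and an L2 penalty (`l2`) chosen for this example. It optimizes a continuous input matrix to maximize the model's score and then decodes the strongest base at each position.

```python
BASES = "ACGT"

# Hypothetical toy filter for the 3-mer "TGA" (columns = A, C, G, T).
FILTER = [
    [-1.0, -1.0, -1.0, 2.0],
    [-1.0, -1.0, 2.0, -1.0],
    [2.0, -1.0, -1.0, -1.0],
]


def window_scores(x):
    """Filter response at every window of the (continuous) input matrix."""
    k = len(FILTER)
    return [
        sum(FILTER[j][c] * x[i + j][c] for j in range(k) for c in range(4))
        for i in range(len(x) - k + 1)
    ]


def optimize_input(length=5, steps=50, lr=0.1, l2=0.1):
    """Gradient ascent on: max window score - l2 * ||x||^2."""
    x = [[0.0] * 4 for _ in range(length)]
    for _ in range(steps):
        scores = window_scores(x)
        best = max(range(len(scores)), key=scores.__getitem__)
        for i in range(length):
            j = i - best  # offset of position i inside the winning window
            for c in range(4):
                g = FILTER[j][c] if 0 <= j < len(FILTER) else 0.0
                x[i][c] += lr * (g - 2.0 * l2 * x[i][c])
    return x


def decode(x):
    """Strongest base per position; 'N' where the input stayed uninformative."""
    return "".join(
        BASES[max(range(4), key=row.__getitem__)] if max(row) > 0 else "N"
        for row in x
    )


optimal = decode(optimize_input())
print(optimal)
```

For this toy filter the optimized input decodes to the motif the filter was written to detect, with the remaining positions left undetermined.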

akundaje commented 7 years ago

They are using saliency maps (gradients) from Simonyan et al. with some simple but interesting extensions for RNNs and CNN-RNNs. Nice paper. But the biggest drawback, as we show in the DeepLIFT paper, is that using gradients can miss very clear predictive patterns in the sequences.
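A small sketch of how a gradient can miss a predictive input, assuming a toy saturating function chosen for illustration (it is not taken from either paper's code). The output saturates once `i1 + i2` reaches 1, so at the point `(1, 1)` the gradient with respect to both inputs is zero even though switching the inputs off changes the output. The comparison against an all-zero reference is a simple difference-from-reference check, not DeepLIFT itself.

```python
def relu(z):
    return max(0.0, z)


def model(i1, i2):
    """Toy saturating function: output is min(1, i1 + i2) for non-negative inputs."""
    return 1.0 - relu(1.0 - (i1 + i2))


def numeric_gradient(f, point, eps=1e-6):
    """Central finite-difference gradient of f at point."""
    grads = []
    for k in range(len(point)):
        up, down = list(point), list(point)
        up[k] += eps
        down[k] -= eps
        grads.append((f(*up) - f(*down)) / (2.0 * eps))
    return grads


point = (1.0, 1.0)
reference = (0.0, 0.0)  # assumed "inputs off" baseline

grads = numeric_gradient(model, point)
delta = model(*point) - model(*reference)

print("gradient at", point, "=", grads)
print("output change vs reference =", delta)
```

Both gradient entries come out as zero while the output differs from the reference by 1, which is the kind of clearly predictive signal a pure gradient saliency map would not show.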

On Aug 20, 2016 4:46 AM, "Anthony Gitter" notifications@github.com wrote:

http://arxiv.org/abs/1608.03644

agitter commented 7 years ago

Published at PSB 2017 http://doi.org/10.1142/9789813207813_0025

greenelab / deep-review

Deep Motif Dashboard: Visualizing and Understanding Genomic Sequences Using Deep Neural Networks #85