greenelab / deep-review

A collaboratively written review paper on deep learning, genomics, and precision medicine
https://greenelab.github.io/deep-review/

Introduction section and code climate issues #128

Closed traversc closed 7 years ago

traversc commented 7 years ago

Hi everyone, I filled in the sub-section in the introduction titled "Types of biological problems". I briefly describe the three classes of topics (categorize, study, treat) and discuss the typical questions and approaches with a few examples from the literature.

Secondly, when I attempt to create a pull request, the Code Climate review keeps failing for various reasons. Is there a way to preview the Code Climate check locally? I don't want to spam the pull request system just to make sure it is in the correct format.

Thank you.

Types of biological problems

In this review, we are interested in applications of deep learning to topics of biomedical importance. These span a large range of topics, which we divide into three broad classes based on their area of application. For each class, we briefly introduce the types of questions, approaches, and data that are typical of deep learning applications.

Disease and Patient Categorization

One important topic in the biomedical field is the accurate classification of diseases and disease subtypes. In oncology, current "gold standard" approaches are limited to either histological approaches, which require manual human expertise, or shallow molecular markers, such as cell surface receptors or small panels of genes. One example is the current PAM50 approach to classifying breast cancer, which uses the expression of 50 marker genes to divide breast cancer patients into six subtypes. Significant heterogeneity still remains within these six subtypes (Mayer et al. 2014)[@doi:10.1158/1078-0432.CCR-13-0583]. Given the increasing wealth of available molecular data, more comprehensive subtyping appears to be possible.

Several studies have used deep learning methods to better categorize breast cancer patients. For example, Tan et al. applied denoising autoencoders (DAs), an unsupervised approach, to cluster breast cancer patients (Tan et al. 2014)[@doi:10.1142/9789814644730_0014]. Ciresan et al. used convolutional neural networks (CNNs) to count mitotic divisions in histological images, a feature that is highly correlated with disease outcome (Ciresan et al. 2013)[@doi:10.1007/978-3-642-40763-5_51]. Despite these recent advances, a number of challenges remain in this area of research, such as the integration of disparate types of data, including electronic health records (EHRs), imaging and histology data, and molecular omics data.

Fundamental Biological Study

Broadly speaking, topics in this class aim to answer more fundamental biological questions. Deep learning is especially suited to leveraging the large amounts of data produced by high-throughput omics studies. The development of deep learning techniques and complex network architectures allows researchers to address such questions with improved accuracy. One classic biological problem to which machine learning has been extensively applied is the prediction of molecular targets, and recent deep learning approaches have shown higher accuracy on this task. For example, Lee et al. used deep recurrent neural networks (RNNs) to predict gene targets of microRNAs (Lee et al. 2016)[@doi:10.1109/icnn.1994.374637]. Wang et al. used a residual CNN to predict protein residue-residue contact maps (Wang et al. 2016)[@doi:10.1101/073239]. Other biological questions that have been investigated include the prediction of protein secondary structure from sequence data (Spencer et al. 2015, Lin et al. 2016)[@doi:10.1109/tcbb.2014.2343960,@doi:10.1038/srep18962], the recognition of functional genomic elements such as enhancers and promoters (Liu et al. 2016, Li et al. 2015, Kleftogiannis et al. 2014)[@doi:10.1101/036129,@doi:10.1007/978-3-319-16706-0_20,@doi:10.1093/nar/gku1058], and the prediction of the deleterious effects of nucleotide polymorphisms (Quang et al. 2014)[@doi:10.1093/bioinformatics/btu703].

Treatment Selection

Studies in this category aim to recommend patient treatments or predict treatment outcomes. Much of the effort in this area aims to identify drug targets, identify drug interactions, or predict drug response. One recent approach applies a CNN to protein structure to predict drug interactions and drug bioactivity (Wallach et al. 2015)[@arXiv:1510.02855]. Because CNNs exploit spatial relationships within the data, this particular deep learning framework is well suited to the problem. Drug discovery and drug "repurposing" are other active areas of research. Aliper et al. used transcriptomic data and deep, fully connected neural networks to predict which drugs might be repurposed for other diseases. In a similar vein, Wang et al. used restricted Boltzmann machines (RBMs) to predict drug molecular targets (Wang et al. 2013)[@doi:10.1093/bioinformatics/btt234].

cgreene commented 7 years ago

With regard to Code Climate, you can install it locally if you want to; the configuration in this repository should return the same results. We are using it as a spot check more than a rule. Code Climate can fail and that's OK, as long as what it's suggesting isn't helpful.

cgreene commented 7 years ago

Also, if you commit to the same branch, it will update the existing pull request instead of creating many new ones.

traversc commented 7 years ago

OK. Then please check out the latest pull request and see if it's what you guys had in mind. Thanks!

agitter commented 7 years ago

I'm closing this issue and #124, #125, and #126. We can take the discussion to #127.