greenelab / deep-review

A collaboratively written review paper on deep learning, genomics, and precision medicine
https://greenelab.github.io/deep-review/

The role of different sampling methods in improving biological activity prediction using deep belief network #179

Closed agitter closed 7 years ago

agitter commented 7 years ago

http://doi.org/10.1002/jcc.24671

Thousands of molecules and descriptors are available for a medicinal chemist thanks to the technological advancements in different branches of chemistry. This fact as well as the correlation between them has raised new problems in quantitative structure activity relationship studies. Proper parameter initialization in statistical modeling has emerged as another challenge in recent years. Random selection of parameters leads to poor performance of deep neural network (DNN). In this research, deep belief network (DBN) was applied to initialize DNNs. DBN is composed of some stacks of restricted Boltzmann machine, an energy-based method that requires computing log likelihood gradient for all samples. Three different sampling approaches were suggested to solve this gradient. In this respect, the impact of DBN was applied based on the different sampling approaches mentioned above to initialize the DNN architecture in predicting biological activity of all fifteen Kaggle targets that contain more than 70k molecules. The same as other fields of processing research, the outputs of these models demonstrated significant superiority to that of DNN with random parameters.

Virtual screening (#45) on the Merck Kaggle dataset

agitter commented 7 years ago

This paper studies the Merck Kaggle dataset, as in #54 and #57. The main contribution is a comparison of a neural network initialized with Gaussian weights versus networks initialized with a deep belief network (DBN). They assess different sampling strategies for the DBN. DBN initialization does improve predictive performance over their baseline neural network. However, I don't think this adds anything relevant for the objectives of our review if we already discuss #54 and #57.
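
For context on what "sampling strategies for the DBN" means in practice, here is a minimal sketch of one common approach, one-step contrastive divergence (CD-1), for a single Bernoulli restricted Boltzmann machine. This is an assumption for illustration: the thread does not say which three sampling approaches the paper compares, and the function names, hyperparameters, and binary inputs below are hypothetical rather than taken from the paper.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def cd1_gradient(v0, W, b_vis, b_hid, rng):
    """Approximate the RBM log-likelihood gradient with one Gibbs step (CD-1)."""
    # Positive phase: hidden activations driven by the data.
    h0_prob = sigmoid(v0 @ W + b_hid)
    h0 = (rng.random(h0_prob.shape) < h0_prob).astype(v0.dtype)

    # Negative phase: one Gibbs step to sample a reconstruction.
    v1_prob = sigmoid(h0 @ W.T + b_vis)
    v1 = (rng.random(v1_prob.shape) < v1_prob).astype(v0.dtype)
    h1_prob = sigmoid(v1 @ W + b_hid)

    n = v0.shape[0]
    dW = (v0.T @ h0_prob - v1.T @ h1_prob) / n
    db_vis = (v0 - v1).mean(axis=0)
    db_hid = (h0_prob - h1_prob).mean(axis=0)
    return dW, db_vis, db_hid


def pretrain_rbm(data, n_hidden, epochs=10, lr=0.1, seed=0):
    """Hypothetical helper: full-batch CD-1 pretraining of one RBM layer."""
    rng = np.random.default_rng(seed)
    n_visible = data.shape[1]
    # Small Gaussian weights are only the starting point for pretraining.
    W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
    b_vis = np.zeros(n_visible)
    b_hid = np.zeros(n_hidden)
    for _ in range(epochs):
        dW, db_vis, db_hid = cd1_gradient(data, W, b_vis, b_hid, rng)
        # Gradient ascent on the approximate log likelihood.
        W += lr * dW
        b_vis += lr * db_vis
        b_hid += lr * db_hid
    return W, b_hid
```

The returned `W` and `b_hid` would replace the Gaussian draw as the initial parameters of the first DNN layer. Stacking further layers would mean training the next RBM on `sigmoid(data @ W + b_hid)`. Other sampling schemes differ mainly in how the negative-phase samples are produced.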