greenelab / deep-review

A collaboratively written review paper on deep learning, genomics, and precision medicine
https://greenelab.github.io/deep-review/

Deep Neural Nets as a Method for Quantitative Structure–Activity Relationships #54

Closed: agitter closed this issue 6 years ago

agitter commented 8 years ago

http://doi.org/10.1021/ci500747n

Related to virtual screening #45.

kumardeep27 commented 8 years ago

Deep neural networks (DNNs) are compared with random forest (RF) on QSAR data sets from Merck's drug discovery efforts. The paper is largely methodological: it explores how well DNNs handle large data sets and how their performance compares with RF. The authors note that DNNs have a very large number of adjustable parameters, but that exhaustive optimization is not necessarily required; they ultimately arrive at a single set of parameters that beats RF on most of the data sets. QSAR work in cheminformatics involves many compounds, many (and sparse) descriptors, and many models to build, so it is computationally intensive; the computational load of DNNs can be offset by using GPUs.

Methods: 15 data sets from Merck's repository (containing 2k-50k molecules each) form the first collection, and each data set was divided into a training and a test set. The 15 Kaggle data sets, covering on-target and ADME activities, were used to build models, while 15 additional data sets were used to validate them. The data sets are realistically large and are evaluated with time-split validation, although the full identities of the Kaggle compounds are not available. In the time-split validation, for each data set the first 75% of the molecules assayed for the particular activity form the training set, while the remaining 25%, assayed later, form the test set (a rough code sketch of this evaluation is included at the end of this summary).

Descriptors: two kinds of descriptors/features were used, atom pairs (AP) and donor-acceptor pairs (DP/BP).

The main aim is to compare the highly successful RF method with DNNs. The authors used parallelized RF code with fixed parameters chosen according to their selection criteria. Overfitting during the back-propagation step of DNN training is controlled with dropout. They also train joint DNNs on all 15 data sets at once, so that the hidden layers can learn QSAR features shared across data sets while the output layer models all 15 QSAR tasks simultaneously, and they compare this joint DNN with single-data-set DNNs. They further investigate the impact of tuning DNN parameters: after trying at least 71 parameter settings for each data set, they conclude that it is wise to keep the number of adjustable parameters small and to use one setting that works consistently well on all 15 data sets. The DNNs were implemented in Python and run on GPU hardware. The evaluation metric is R^2 on the test set.

Results: a range of DNN parameters was tried in the comparison with RF. DNNs outperformed RF on 11 of the 15 data sets, with an average improvement in R^2 of 10% over RF; Table 2 shows the comparison. The paper also discusses empirical guidelines for selecting DNN parameters and the corresponding performance. In most cases performance was better with ReLU as the activation function than with sigmoid. Whether the joint DNN beats the single-data-set DNN is less clear-cut and is a matter for further investigation, as it depends on data set size. Pretraining, on average, lowered performance on these QSAR data sets.

Brief conclusions from the different parameter combinations: apply a logarithmic transformation, use at least two hidden layers with at least 250 nodes each, use ReLU as the activation function, make the earlier hidden layers larger than the later ones, apply no dropout at the input layer, and use lower dropout rates in the later layers. Finally, the paper offers suggested guidelines on DNN architectures for QSAR studies to get the maximum benefit; a minimal sketch following these guidelines is given below.
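To make the architecture guidelines concrete, here is a minimal sketch (not the paper's actual implementation) of a joint multitask DNN in PyTorch that follows them: two ReLU hidden layers with the first larger than the second, no dropout on the inputs, modest dropout in the hidden layers, and one output per data set so that all 15 QSAR tasks are modelled jointly. The layer sizes, dropout rates, and the masked loss are illustrative assumptions rather than the settings reported in the paper.

```python
import torch
import torch.nn as nn

# Illustrative sizes; the paper's recommended settings differ in detail.
N_DESCRIPTORS = 4000   # assumed AP + DP descriptor count
N_TASKS = 15           # one output per Kaggle QSAR data set

joint_dnn = nn.Sequential(
    # No dropout applied to the input layer, per the paper's guideline.
    nn.Linear(N_DESCRIPTORS, 1000),  # first hidden layer larger ...
    nn.ReLU(),
    nn.Dropout(0.25),                # modest dropout in hidden layers
    nn.Linear(1000, 500),            # ... than the later hidden layer
    nn.ReLU(),
    nn.Dropout(0.10),                # lower dropout in later layers
    nn.Linear(500, N_TASKS),         # 15 activities predicted jointly
)

# Joint training would minimise a masked regression loss, since each
# molecule typically has a measured activity for only some of the tasks.
def masked_mse(pred, target, mask):
    diff = (pred - target) * mask
    return (diff ** 2).sum() / mask.sum()
```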
Table 3 shows the results on the additional/new data sets using the best set of parameters selected on the first pool of Kaggle data sets. It shows that this parameter set carries over well and still outperforms RF. Overall, the paper helps identify the key parameters needed to get the best performance.
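As a companion to the summary above, here is a rough sketch of the time-split validation and the R^2 metric, assuming a precomputed descriptor matrix and per-molecule assay dates. The random forest below uses generic scikit-learn defaults rather than the fixed parameters used in the paper, and treating R^2 as the squared Pearson correlation between observed and predicted activities is my reading of the metric, so the details are assumptions.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.ensemble import RandomForestRegressor

def time_split(X, y, assay_dates, train_fraction=0.75):
    """Time-split validation: the first 75% of molecules assayed for an
    activity form the training set, the later 25% form the test set."""
    order = np.argsort(assay_dates)
    n_train = int(len(order) * train_fraction)
    train_idx, test_idx = order[:n_train], order[n_train:]
    return X[train_idx], y[train_idx], X[test_idx], y[test_idx]

def squared_correlation(y_true, y_pred):
    """R^2 as the squared Pearson correlation between observed and
    predicted activities (assumed reading of the paper's metric)."""
    r, _ = pearsonr(y_true, y_pred)
    return r ** 2

# Hypothetical data loading; X, y, and assay_dates are not provided here.
# X, y, assay_dates = load_one_qsar_dataset(...)
# X_tr, y_tr, X_te, y_te = time_split(X, y, assay_dates)
# rf = RandomForestRegressor(n_estimators=100, n_jobs=-1).fit(X_tr, y_tr)
# print("RF test R^2:", squared_correlation(y_te, rf.predict(X_te)))
```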

agitter commented 8 years ago

@kumardeep27 thanks a lot for the summary.

This seems to be similar to #57 in that it is a follow-up to the Kaggle competition, except this is from the Merck perspective instead of the competitors' perspective. It is very good work, but I don't see anything else we need to discuss here that hasn't already been covered in the other multitask virtual screening discussions. There is a nice presentation of the impact of hyperparameters, including the assessment of whether the optimal hyperparameters for the 15 Kaggle targets still work well on the set of 15 new targets.

The time-split validation approach is interesting. I've only seen this discussed in the virtual screening domain (e.g. the ADMET paper I linked in #55), which might make sense because the screening is done in batches.

There is also this quote showing that Merck thought the ensemble results, which included the neural network and other methods, were impressive:

The winning entry (submitted by one of the authors, George Dahl) improved the mean R2 averaged over the 15 data sets from 0.42 (for RF) to 0.49. While the improvement might not seem large, we have seldom seen any method in the past 10 years that could consistently outperform RF by such a margin, so we felt this was an interesting result.