Updated with published version. Code is here: https://github.com/isayev/ReLeaSE
Not sure I'm doing this correctly, but reading through this paper now and wanted to try contributing by sharing my summary and thoughts.
Popova et al. describe a strategy for in silico design and optimization of drugs toward arbitrary molecular properties that connects two deep neural networks with reinforcement learning (RL). Both of the deep neural networks are trained only on SMILES molecular representations, which is critical for integrating the de novo structure generation and property prediction into a single RL system.
Two deep neural networks are used: one generative network (G) and one predictive network (P).
G is a specific type of generative RNN, a stack-RNN [@arxiv:1503.01007], which was chosen because generating valid SMILES requires the ability to count rings and valence electrons and to match bracket openings and closings. Standard LSTM and GRU architectures struggle with this kind of counting and are therefore less suited to generating SMILES. Although the stack-RNN is a logically sound choice for this task, Neural Turing Machines would also be a theoretically sound choice.
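To make the stack mechanism concrete, here is a minimal PyTorch sketch of a differentiable stack-augmented recurrent cell in the spirit of the stack-RNN paper. The class name, the use of a GRU cell, and the soft push/pop/no-op update are my own illustrative assumptions, not the authors' ReLeaSE implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StackAugmentedCell(nn.Module):
    """Illustrative stack-augmented recurrent cell (not the authors' code)."""

    def __init__(self, input_size, hidden_size, stack_width):
        super().__init__()
        self.cell = nn.GRUCell(input_size + stack_width, hidden_size)
        self.stack_controls = nn.Linear(hidden_size, 3)      # push / pop / no-op
        self.stack_input = nn.Linear(hidden_size, stack_width)

    def forward(self, x, hidden, stack):
        # stack: (batch, stack_depth, stack_width); the cell reads the stack top
        top = stack[:, 0, :]
        hidden = self.cell(torch.cat([x, top], dim=1), hidden)

        # soft (differentiable) choice among push, pop, and no-op
        action = F.softmax(self.stack_controls(hidden), dim=1).unsqueeze(-1)
        push, pop, noop = action[:, 0:1], action[:, 1:2], action[:, 2:3]
        new_top = torch.tanh(self.stack_input(hidden)).unsqueeze(1)

        # pushing shifts the stack down; popping shifts it back up
        pushed = torch.cat([new_top, stack[:, :-1, :]], dim=1)
        popped = torch.cat([stack[:, 1:, :], torch.zeros_like(stack[:, :1, :])], dim=1)
        stack = push * pushed + pop * popped + noop * stack
        return hidden, stack
```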
P is a deep neural network with (1) one embedding layer, (2) one LSTM layer, and (3) two dense layers. P is designed to compute arbitrary molecular properties, i.e. it can be trained to map SMILES to any molecular property. P differs from other quantitative structure-activity relationship (QSAR) methods in that it requires no hand-crafted numerical descriptors of the input molecule. Here, P was trained to predict three different properties: Tm (melting temperature), logP, and pIC50.
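As a rough sketch of the described architecture (layer sizes here are placeholders, not the paper's actual hyperparameters), P might look like:

```python
import torch
import torch.nn as nn

class PropertyPredictor(nn.Module):
    """Embedding -> LSTM -> two dense layers, mapping SMILES tokens to one property."""

    def __init__(self, vocab_size, embed_dim=128, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.dense1 = nn.Linear(hidden_dim, hidden_dim)
        self.dense2 = nn.Linear(hidden_dim, 1)    # scalar output: Tm, logP, or pIC50

    def forward(self, tokens):                    # tokens: (batch, seq_len) SMILES token ids
        _, (h_n, _) = self.lstm(self.embed(tokens))
        return self.dense2(torch.relu(self.dense1(h_n[-1]))).squeeze(-1)
```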
RL formulation: the problem of generating chemical compounds with desired properties can be framed as finding the parameter vector of a policy network (here, G) that maximizes the expected reward of the generated SMILES.
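In code, a single REINFORCE-style update might look like the sketch below; `generator.sample()` (returning a SMILES string and its summed token log-probability) and `reward_fn` (which wraps P's prediction) are assumed interfaces, not the authors' actual API:

```python
def policy_gradient_step(generator, reward_fn, optimizer, batch_size=16):
    # Sample SMILES from G, score them via P's property prediction, and nudge
    # G's parameters toward higher expected reward (gradient ascent on E[reward]).
    optimizer.zero_grad()
    loss = 0.0
    for _ in range(batch_size):
        smiles, log_prob = generator.sample()       # log_prob: sum of token log-probabilities
        loss = loss - reward_fn(smiles) * log_prob  # REINFORCE objective, negated for a minimizer
    (loss / batch_size).backward()
    optimizer.step()
```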
G was trained on 1.5 million molecules from ChEMBL21 and then used to generate a set of 1 million molecules de novo. Of the 1M generated SMILES, 95% were valid structures according to ChemAxon. Less than 0.1% of the generated SMILES were present in the training set, and, strikingly, only 3% were found in the ZINC database. The authors demonstrate the value of the stack augmentation by training another RNN without it and showing that the stack-less RNN produced a less diverse distribution of molecules, including repeats of identical structures. Finally, they determine that although the molecules generated by their stack-RNN were largely absent from other databases, they were synthetically accessible according to the synthetic accessibility score (SAS) [@doi:10.1186/1758-2946-1-8]. Taken together, it seems that this model certainly does not simply memorize the input data, and it can be used to produce novel chemical diversity. However, producing an invalid structure 1 time in 20 still seems high, and there is room to explore how invalid structures might be eliminated or disfavored during training.
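For anyone wanting to reproduce the validity statistic without a ChemAxon license, a rough check with RDKit (a stand-in for the paper's ChemAxon validation, so the numbers may differ slightly) could be:

```python
from rdkit import Chem

def fraction_valid(smiles_list):
    # A SMILES counts as valid if RDKit can parse and sanitize it.
    valid = sum(Chem.MolFromSmiles(s) is not None for s in smiles_list)
    return valid / len(smiles_list)
```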
P was trained and benchmarked against a random forest baseline, predicting logP with excellent accuracy. It also predicted Tm with excellent accuracy.
Three case studies using the complete RL system were presented to simulate real-world drug design needs: optimization of (1) physical properties, (2) biological activity, and (3) chemical complexity. Compared to the training set, the RL system was able to decrease the Tm of the library by 44 degrees Celsius or increase it by 20-200 degrees Celsius, using minimization or maximization, respectively. Unfortunately, minimizing or maximizing Tm drastically dropped the percentage of valid molecules to 31% and 53%, respectively. The RL system's ability to range-optimize logP was also quite effective, with 88% of the SMILES produced by G falling within the desired logP range; however, the proportion of valid SMILES fell again (70% valid versus the 95% baseline). Finally, the results from learning to maximize inhibition of JAK2 were perhaps the most remarkable: the model generated compounds previously annotated as tyrosine kinase inhibitors.
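For the logP case, the range optimization presumably amounts to a piecewise reward that favors predictions inside the target window; the thresholds and reward values below are placeholders for illustration, not the paper's:

```python
def logp_range_reward(predicted_logp, low=1.0, high=4.0, inside=10.0, outside=1.0):
    # Reward generated structures whose predicted logP falls inside the target window.
    return inside if low <= predicted_logp <= high else outside
```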
A common criticism of deep neural networks is the inability to interpret their parameters or understand what they've learned. This paper does a good job of assessing and visualizing how specific GRU neuron gates contribute to the SMILES generation.
The unique system of a generative and a predictive neural network linked within a reinforcement learning cycle is a substantial innovation toward the goal of automating drug design and testing.
One issue with the paper is the lack of experimental validation of the predicted properties. Although the authors claim to have optimized these properties, without wet-lab measurements we cannot be sure that P predicted them accurately.
Another shortcoming of this work, as with most other deep learning models for chemical structure generation or encoding, is the imperfect production of valid chemical structures. One possible solution within this RL system is to penalize the generation of invalid SMILES (see the sketch below). Perhaps it is impossible to achieve 100% valid structures because the latent space is continuous while molecules are discrete?
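A minimal sketch of that idea, assuming the property-based reward is computed from P's prediction and using RDKit parsing as the validity test (the penalty value is arbitrary):

```python
from rdkit import Chem

def shaped_reward(smiles, property_reward, invalid_penalty=-1.0):
    # Penalize SMILES that fail to parse; otherwise use the property-based reward.
    mol = Chem.MolFromSmiles(smiles)
    return invalid_penalty if mol is None else property_reward(smiles)
```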
Published version: http://advances.sciencemag.org/content/4/7/eaap7885 Pre-print: https://arxiv.org/abs/1711.10907