greenelab / deep-review

A collaboratively written review paper on deep learning, genomics, and precision medicine
https://greenelab.github.io/deep-review/

Deep Reinforcement Learning for De-Novo Drug Design #731

Open agitter opened 6 years ago

agitter commented 6 years ago

Published version: http://advances.sciencemag.org/content/4/7/eaap7885 Pre-print: https://arxiv.org/abs/1711.10907

We propose a novel computational strategy based on deep and reinforcement learning techniques for de-novo design of molecules with desired properties. This strategy integrates two deep neural networks, generative and predictive, that are trained separately but employed jointly to generate novel chemical structures with the desired properties. Generative models are trained to produce chemically feasible SMILES, and predictive models are derived to forecast the desired compound properties. In the first phase of the method, generative and predictive models are separately trained with supervised learning algorithms. In the second phase, both models are trained jointly with a reinforcement learning approach to bias newly generated chemical structures towards those with desired physical and biological properties. In this proof-of-concept study, we have employed this integrative strategy to design chemical libraries biased toward compounds with either maximal, minimal, or a specific range of physical properties, such as melting point and hydrophobicity, as well as to develop novel putative inhibitors of JAK2. This new approach can find a general use for generating targeted chemical libraries optimized for a single desired property or multiple properties.

stephenra commented 5 years ago

Updated with published version. Code is here: https://github.com/isayev/ReLeaSE

jgmeyerucsd commented 5 years ago

Not sure I'm doing this correctly, but I'm reading through this paper now and wanted to try contributing by sharing my summary and thoughts.

Overview

Popova et al. describe a strategy for in silico design and optimization of drugs toward arbitrary molecular properties that connects two deep neural networks with reinforcement learning (RL). Both of the deep neural networks are trained only on SMILES molecular representations, which is critical for integrating the de novo structure generation and property prediction into a single RL system.

Computational Methods

Two deep neural networks are used: one generative network (G) and one predictive network (P).

  1. G is a specific type of generative RNN, a stack-RNN [@arxiv:1503.01007], chosen because generating valid SMILES requires the ability to count rings and valence electrons and to match multiple bracket openings and closings. Plain LSTM and GRU architectures cannot count in this way and are therefore ill-suited to generating SMILES. Although the stack-RNN is a logically sound choice for this task, Neural Turing Machines would also be a theoretically sound alternative.

    • G was trained on 1.5 million drug-like molecules from ChEMBL21
    • Training molecules were required to have SMILES length <100 characters
    • The network had 1500 units in a GRU layer and 512 units in a stack augmentation layer.
    • The model was trained over 10,000 epochs on a GPU
  2. P is a deep neural network with (1) an embedding layer, (2) an LSTM layer, and (3) two dense layers. P is designed to compute arbitrary molecular properties, i.e., it can be trained to map SMILES to any molecular property. P is unique compared to other quantitative structure-activity relationship (QSAR) methods in that it requires no numerical descriptors of the input molecule. P was trained here to predict three different properties: Tm (melting temperature), logP, and pIC50.

    • The embedding layer converted each SMILES symbol into a vector of 100 continuous numbers.
    • The LSTM layer had 100 units and a tanh activation.
    • The first dense layer had 100 units and a ReLU activation.
    • The second dense layer had one unit and an identity activation.
    • P was trained using 5-fold cross-validation, but it is unclear how many examples were present in the training sets.
  3. RL formulation - Generating chemical compounds with desired properties is framed as the task of finding the parameters of a policy network that maximize the expected reward.

    • After training G and P separately and independently, they are combined into a single RL system.
    • Training occurs as a series of actions that build the SMILES string one symbol at a time. At each step:
      1. G estimates probabilities for the next letter in the SMILES alphabet using the previous state as input.
      2. The next action is sampled from that probability distribution.
      3. Reward is computed as a function of the property predicted by P.
    • G is trained during RL to maximize the expected reward, with the reward function chosen according to the desired property (a minimal sketch of this loop follows below).
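
To make the two-phase setup concrete, here is a minimal PyTorch sketch of P's described architecture and a REINFORCE-style update coupling a generator to it. This is my own simplification, not the authors' code (their implementation is at https://github.com/isayev/ReLeaSE): the stack augmentation of G is omitted, the reward shaping is illustrative, and `Generator`, `reinforce_step`, `start_idx`, and `end_idx` are hypothetical names.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Generator(nn.Module):
    """Stand-in for the paper's stack-augmented RNN: a plain GRU character
    model over the SMILES alphabet (the stack memory is omitted here)."""
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def step(self, token, hidden=None):
        # token: (1, 1) tensor holding the previous symbol index
        x = self.embed(token)
        out, hidden = self.gru(x, hidden)
        return self.out(out), hidden  # logits over the next symbol

class Predictor(nn.Module):
    """P as described above: embedding (100-d) -> LSTM (100 units; tanh is
    the LSTM default) -> dense (100, ReLU) -> dense (1, identity)."""
    def __init__(self, vocab_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, 100)
        self.lstm = nn.LSTM(100, 100, batch_first=True)
        self.dense1 = nn.Linear(100, 100)
        self.dense2 = nn.Linear(100, 1)

    def forward(self, tokens):
        x = self.embed(tokens)
        _, (h, _) = self.lstm(x)  # final hidden state summarizes the string
        return self.dense2(F.relu(self.dense1(h[-1]))).squeeze(-1)

def reinforce_step(G, P, optimizer, reward_fn, start_idx, end_idx, max_len=100):
    """One RL update: sample a SMILES string from G one symbol at a time,
    score it with P, and increase the log-probability of high-reward strings."""
    token = torch.tensor([[start_idx]])
    hidden, log_probs, symbols = None, [], []
    for _ in range(max_len):
        logits, hidden = G.step(token, hidden)
        dist = torch.distributions.Categorical(logits=logits.squeeze(1))
        action = dist.sample()                 # next SMILES symbol
        log_probs.append(dist.log_prob(action))
        if action.item() == end_idx:
            break
        symbols.append(action.item())
        token = action.unsqueeze(1)
    seq = torch.tensor([symbols or [start_idx]])
    with torch.no_grad():
        reward = reward_fn(P(seq).item())      # shape reward from P's prediction
    loss = -reward * torch.stack(log_probs).sum()  # maximize expected reward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return reward
```

In practice the optimizer would be built over G's parameters only (e.g. `torch.optim.Adam(G.parameters())`), since P stays frozen during the RL phase and only supplies the reward signal.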

Results

Generation of Molecules

G was trained on 1.5 million molecules from ChEMBL21, and then used to generate a set of 1 million molecules de-novo. Of the 1M generated SMILES, 95% were valid structures according to ChemAxon. Less than 0.1% of the generated SMILES were present in the training set. Strikingly, only 3% of the 1 million SMILES generated by G were found in the ZINC database. The authors demonstrate the value of the stack augmentation by training another RNN without it, and show that the unaugmented RNN produced a distribution of more similar molecules, including repeats of identical structures. Finally, they determine that although the molecules generated by their stack-RNN were highly novel relative to existing databases, they were synthetically approachable according to the synthetic accessibility score (SAS) [@doi:10.1186/1758-2946-1-8]. Taken together, the model certainly does not simply memorize its input data and can be used to produce novel chemical diversity. However, producing an invalid structure 1 time in 20 still seems high, and there is room to explore how invalid structures might be eliminated or disfavored during training.
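
The validity percentages above were computed with ChemAxon; as a rough illustration, the same kind of check can be run with the open-source RDKit (my substitution, not the tool the authors used, so the exact fraction may differ slightly):

```python
from rdkit import Chem

def fraction_valid(smiles_list):
    """Fraction of generated SMILES that parse into valid molecules;
    RDKit's MolFromSmiles returns None for invalid strings."""
    valid = sum(Chem.MolFromSmiles(s) is not None for s in smiles_list)
    return valid / len(smiles_list)
```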

Property Prediction

P predicted logP with excellent accuracy compared to a random forest baseline, and also performed very well at predicting Tm.

RL for maximization or minimization of desired properties

Three case studies using the complete RL system were presented to simulate real-world drug design needs: optimization of (1) physical properties, (2) biological activity, and (3) chemical complexity. Compared to the training set, the RL system was able to decrease the Tm of the library by 44 degrees Celsius with minimization, or increase it by 20-200 degrees Celsius with maximization. Unfortunately, minimizing or maximizing Tm drastically dropped the percentage of valid molecules, to 31% and 53%, respectively. The RL system's ability to range-optimize logP was also quite effective, with 88% of the SMILES produced by G falling within the desired logP range; however, the proportion of valid SMILES fell again (70% valid versus the 95% baseline). Finally, the results from learning to maximize JAK2 inhibition were the most remarkable: the model generated compounds previously annotated as tyrosine kinase inhibitors.
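
The paper's reward functions are task-specific and their exact forms are given in the supplement; as a purely illustrative sketch (my own shaping, not the authors'), a range-optimization reward for logP might look like:

```python
def logp_range_reward(predicted_logp, low=1.0, high=4.0):
    """Illustrative (not the paper's exact) reward: constant inside the
    target logP window, decaying exponentially with distance outside it."""
    if low <= predicted_logp <= high:
        return 10.0
    distance = min(abs(predicted_logp - low), abs(predicted_logp - high))
    return 10.0 * 0.5 ** distance
```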

Model Analysis

A common criticism of deep neural networks is the inability to interpret their parameters or understand what they've learned. This paper does a good job of assessing and visualizing how specific GRU neuron gates contribute to the SMILES generation.

Summary

The unique system of a generative and a predictive neural network linked within a reinforcement learning cycle is a substantial innovation toward the goal of automating drug design and testing.

Cons

One issue with the paper is the lack of experimental validation of the predicted properties. Although the authors claim to have optimized these properties, we cannot be sure that P predicted them accurately for the newly generated molecules.

Another shortcoming of this work, as with most other deep learning models for chemical structure generation or encoding, is the imperfect production of valid chemical structures. One possible solution within this RL system is to penalize the generation of invalid SMILES (a hypothetical sketch follows). Perhaps it is impossible to achieve 100% valid structures because the latent space is continuous and molecules are not?
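
For instance, the suggested penalty could be wired directly into the reward, sketched here with RDKit as a hypothetical modification (not something the paper evaluates):

```python
from rdkit import Chem

def reward_with_validity_penalty(smiles, property_reward, penalty=-1.0):
    """Hypothetical wrapper: invalid SMILES receive a fixed negative reward,
    so RL actively disfavors unparsable strings instead of ignoring them."""
    if Chem.MolFromSmiles(smiles) is None:
        return penalty
    return property_reward(smiles)
```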