robinlingwood / BIMODAL

Reinforcement learning #2

Closed albertma-evotec closed 4 years ago

albertma-evotec commented 4 years ago

First of all, I am sorry that the following question is somewhat unrelated to this project, but I am looking for some guidance and a jump start.

I am trying to add some code that applies reinforcement learning to the pre-trained BIMODAL model for multi-parameter optimization. I am using the pre-trained model under the /evaluation/BIMODAL_random_512_FineTuning_template folder.

I am basically trying to do something similar to the published REINVENT network, https://arxiv.org/abs/1704.07555 (Olivecrona et al., 2017), https://github.com/MarcusOlivecrona/REINVENT,

but I am not sure where to start when defining and updating a proper loss function based on the reward/scoring function of the generated structures (e.g. maximising a penalised logP, QED, etc.). I know my reward function; for simplicity, let's say it is QED, and I am trying to generate compounds that maximise this score.

For the moment, I just define my loss as the MSE loss. (I doubt this is the correct approach, though, and I am not sure what else I could do; I am not even sure I should use the MSE loss at all. Maybe I need to work with the logits and likelihoods instead?)
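For reference, my understanding of the actual REINVENT objective is the sketch below; prior_loglik, agent_loglik, scores and sigma are placeholder names for the prior and agent per-sequence log-likelihoods (PyTorch tensors still attached to the computation graph), the rewards as a tensor, and the reward scaling factor from the paper:

# Sketch of the REINVENT-style loss; all names are placeholders.
# augmented log-likelihood = prior log-likelihood + sigma * reward
augmented_loglik = prior_loglik + sigma * scores
# squared difference between augmented and agent log-likelihoods, averaged over the batch
loss = torch.pow(augmented_loglik - agent_loglik, 2).mean()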

I re-trained the model for 200 steps, but the score/reward did not improve.

Would you mind giving me some guidance, please? I know it is a long question, but thank you very much in advance.

This is my code to re-train the pre-trained model with reinforcement learning:

import numpy as np
import pandas as pd
from bimodal import BIMODAL
from one_hot_encoder import SMILESEncoder
from sklearn.utils import shuffle
import os
from helper import clean_molecule, check_model, check_molecules
import torch
import torch.nn as nn
from bidir_lstm import BiDirLSTM
import rdkit
from rdkit import Chem
from rdkit.Chem import QED

class Trainer_rl:

    def __init__(self):
        self._encoder = SMILESEncoder()
        self._model_type = 'BIMODAL'
        self._model = BIMODAL(molecule_size=151, encoding_dim=55,
                          lr=0.001, hidden_units=128)
        self._start_model = "../evaluation/BIMODAL_random_512_FineTuning_template/pretrained_model"
        self._starting_token = self._encoder.encode('G')
        self._T = 0.7

    def score(self, mol):
        return Chem.QED.default(mol)

    def loss(self, values):
        ones = torch.ones(len(values), dtype=torch.float32)
        values = torch.tensor(values, requires_grad=True, dtype=torch.float32)
        # The maximum of the reward function is one, so I compare the scores
        # against a tensor of ones with the MSE loss
        criterion = nn.MSELoss()
        return criterion(values, ones)

    def hyperparameter_update(self, decrease_by=0.1):
        for param_group in self._model._optimizer.param_groups:
            param_group["lr"] *= (1 - decrease_by)

    def train_agent(self, num_steps=1000, batch_size=64):

        # Load pre-trained model
        self._model.build(self._start_model)

        # Training loop
        for i in range(num_steps):
            self._model._optimizer.zero_grad()

            # Generate a batch of SMILES and score each molecule.
            # The score (QED) ranges from 0 to 1; invalid molecules get a score of -1.
            gen_SMILESs = [self._encoder.decode(self._model.sample(self._starting_token, self._T))
                           for _ in range(batch_size)]
            clean_gen_SMILESs = [clean_molecule(s[0], self._model_type) for s in gen_SMILESs]
            mols = [Chem.MolFromSmiles(smi) for smi in clean_gen_SMILESs]
            scores = [self.score(mol) if mol else -1 for mol in mols]
            print(f"mean reward = {sum(scores) / len(scores)}")

            loss = self.loss(scores)
            print(f"Current loss: {loss}")

            # Decrease the learning rate by 10% every 10 steps (just for testing)
            if i % 10 == 0 and i != 0:
                self.hyperparameter_update()
            for param_group in self._model._optimizer.param_groups:
                print(f"Current learning rate: {param_group['lr']}")

            loss.backward()
            self._model._optimizer.step()

if __name__ == "__main__":
    s = Trainer_rl()
    s.train_agent(num_steps=30, batch_size=10)
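(For what it's worth, I think the manual 10% decay in hyperparameter_update could also be expressed with PyTorch's built-in scheduler; a sketch, assuming the optimizer is exposed as self._model._optimizer:)

from torch.optim.lr_scheduler import StepLR

# Multiply the learning rate by 0.9 every 10 steps
scheduler = StepLR(self._model._optimizer, step_size=10, gamma=0.9)
# then call scheduler.step() once per training iteration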
robinlingwood commented 4 years ago

Hi,

PyTorch keeps track of the gradients during training, which is what lets the backpropagation algorithm optimize the weights (https://pytorch.org/docs/stable/optim.html). The model.sample method converts the PyTorch tensor to a NumPy array (model/bimodal.py, line 271), which detaches the output from the computation graph. To re-train the model, you have to work directly with the model._lstm output (see also model/bimodal.py, line 215) and define your loss function using PyTorch tensors.
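As a rough sketch of what I mean (all names here are placeholders, and I am assuming the model._lstm output has shape (sequence length, batch, vocabulary size)):

import torch.nn.functional as F

def policy_gradient_loss(logits, tokens, rewards):
    # logits: raw model._lstm output, assumed shape (seq_len, batch, vocab)
    # tokens: indices of the sampled tokens, shape (seq_len, batch)
    # rewards: per-sequence rewards as a torch tensor, shape (batch,)
    log_probs = F.log_softmax(logits, dim=-1)  # stays on the computation graph
    # log-likelihood of each sampled sequence
    seq_loglik = log_probs.gather(-1, tokens.unsqueeze(-1)).squeeze(-1).sum(dim=0)
    # REINFORCE-style objective: increase the likelihood of high-reward sequences
    return -(rewards * seq_loglik).mean()

Because seq_loglik is never converted to NumPy, calling .backward() on this loss propagates gradients back into the network weights.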

I am not sure if this solves the problem entirely, but it might give you some hints.

albertma-evotec commented 4 years ago

Thanks, I will try.