topazape / LSTM_Chem

Implementation of the paper - Generative Recurrent Networks for De Novo Drug Design.
The Unlicense
116 stars, 55 forks

[Errno 2] No such file or directory: './datasets/fine-tune.smi' #2

Closed by soulbyfeng 5 years ago

soulbyfeng commented 5 years ago

[Errno 2] No such file or directory: './datasets/fine-tune.smi'

topazape commented 5 years ago

Sorry for the late reply.

For the fine-tuning step, you need to put your own SMILES set into the ./datasets/ directory. (Here, assume the file is named your-favorite-smiles.smi.) Then edit the corresponding line in ./configs/LSTMChem_config.json so that it reads: "finetune_data_filename": "./datasets/your-favorite-smiles.smi", Finally, run finetune.py; the newly generated SMILES are written to WantsChems.smi.
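The config edit described above can also be scripted. This is only a minimal sketch, not part of the repo: the helper name set_finetune_file is hypothetical, and it assumes that ./configs/LSTMChem_config.json is plain JSON with a top-level "finetune_data_filename" key, as the quoted line suggests. It checks that the SMILES file exists before touching the config, which is the check that would have caught the [Errno 2] error up front.

```python
import json
from pathlib import Path


def set_finetune_file(config_path, smiles_path):
    """Point "finetune_data_filename" in the config at smiles_path.

    Hypothetical helper, not part of LSTM_Chem. Assumes the config is a
    plain JSON object with a top-level "finetune_data_filename" key.
    """
    # Fail early if the fine-tune file is missing, instead of failing
    # later inside finetune.py. Relative paths resolve against the
    # current working directory.
    if not Path(smiles_path).is_file():
        raise FileNotFoundError(f"SMILES file not found: {smiles_path}")

    config_path = Path(config_path)
    config = json.loads(config_path.read_text())

    # Only this one key is changed; all other settings are kept.
    config["finetune_data_filename"] = str(smiles_path)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Run from the repository root, a call such as set_finetune_file("./configs/LSTMChem_config.json", "./datasets/your-favorite-smiles.smi") would update the config, after which finetune.py can be run as described.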

If you don't have a relevant SMILES set for fine-tuning, use wet-lab data from your organization. Alternatively, rebuild ./datasets/datasets.smi yourself: the training compounds in datasets.smi were extracted from the whole ChEMBL database, keeping only compounds with a reported IC50, EC50, or Ki < nM.

For example, if you want to generate new kinase inhibitors, take care to exclude kinase inhibitors when you extract compounds from the ChEMBL database. Use the extracted SMILES set as the training dataset, and use the excluded kinase inhibitor SMILES set as the test set (fine-tune set). You may not need these steps, though: you can train the network on ./datasets/datasets.smi from this repo and then fine-tune it on kinase inhibitor SMILES extracted from the ChEMBL DB.
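The train / fine-tune split described above could look roughly like the sketch below. The function names are hypothetical and not part of the repo. It also assumes that both SMILES lists were written in the same canonical form, because the comparison is a plain string match; the same molecule written in two different ways would not be recognized as a duplicate.

```python
from pathlib import Path


def read_smiles(path):
    """Read a .smi file with one SMILES per line, skipping blank lines."""
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def split_holdout(all_smiles, holdout_smiles):
    """Split all_smiles into (train, finetune) lists.

    Entries that also appear in holdout_smiles (for example, kinase
    inhibitors) go to the fine-tune list; everything else goes to the
    training list. Matching is by exact string, so both inputs are
    assumed to use the same canonical SMILES form.
    """
    holdout = set(holdout_smiles)
    train = [s for s in all_smiles if s not in holdout]
    finetune = [s for s in all_smiles if s in holdout]
    return train, finetune
```

The two returned lists can then be written out, one SMILES per line, as the training file and the fine-tune file referenced in the config.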

If you are not familiar with handling ChEMBL DB, this article is useful.