topazape / LSTM_Chem

Implementation of the paper - Generative Recurrent Networks for De Novo Drug Design.
The Unlicense

Where can I get the fine-tuning corpus? #1

Closed: soulbyfeng closed this issue 5 years ago

soulbyfeng commented 5 years ago

In your project, I can't find a file to fine-tune on. How can I get one? If you have one, could you send me the file or tell me how to get it? Thanks.

topazape commented 5 years ago

I'm sorry to reply late.

For fine-tuning, you need to put your favorite SMILES set into the ./datasets/ directory (in this example, the file is named your-favorite-smiles.smi). Then, in ./configs/LSTMChem_config.json, edit the corresponding line to read: "finetune_data_filename": "./datasets/your-favorite-smiles.smi",. Finally, run finetune.py; the newly generated SMILES are written to WantsChems.smi.
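A minimal Python sketch of the two preparation steps above. It assumes the paths given in the reply (./datasets/ and ./configs/LSTMChem_config.json) and that a .smi file holds one SMILES string per line. The helper names `write_smiles` and `set_finetune_data` are hypothetical and are not part of LSTM_Chem.

```python
import json
from pathlib import Path


def write_smiles(smiles, path):
    """Write one SMILES string per line (assumed .smi layout)."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)  # create ./datasets/ if missing
    path.write_text("\n".join(smiles) + "\n")
    return path


def set_finetune_data(config_path, smiles_path):
    """Point "finetune_data_filename" in the JSON config at smiles_path.

    Every other key in the config is loaded and written back unchanged.
    """
    config_path = Path(config_path)
    config = json.loads(config_path.read_text())
    config["finetune_data_filename"] = str(smiles_path)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

From the repository root you would call `write_smiles(my_smiles, "./datasets/your-favorite-smiles.smi")`, then `set_finetune_data("./configs/LSTMChem_config.json", "./datasets/your-favorite-smiles.smi")`, and then run finetune.py. Rewriting the whole JSON document instead of patching one line keeps the file valid, although key order and indentation may differ from the original.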

If you don't have a relevant SMILES set for fine-tuning, please obtain wet-lab data from your organization. Alternatively, rebuild ./datasets/datasets.smi yourself. The training chemicals in datasets.smi were extracted from the whole ChEMBL database, keeping compounds with IC50, EC50, or Ki < nM, so the set may already contain the compounds you want to fine-tune on.
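A hedged sketch of the activity filter described above, for anyone rebuilding datasets.smi. The `Activity` record is hypothetical and only loosely modeled on ChEMBL activity fields (standard_type, standard_value, standard_units), so check the real column names in your ChEMBL release. The reply does not give the exact numeric cutoff, so `max_nm` is left to the caller.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Activity types named in the reply; the numeric cutoff is a parameter.
ACTIVITY_TYPES = {"IC50", "EC50", "Ki"}


@dataclass(frozen=True)
class Activity:
    """One bioactivity record (hypothetical structure)."""
    smiles: str
    standard_type: str
    standard_value: float
    standard_units: str


def potent_smiles(records: Iterable[Activity], max_nm: float) -> List[str]:
    """Return unique SMILES with an IC50, EC50 or Ki below max_nm (in nM).

    Output keeps first-seen order so the result is deterministic.
    """
    seen = set()
    kept = []
    for r in records:
        if r.standard_type not in ACTIVITY_TYPES or r.standard_units != "nM":
            continue  # skip other assay types and values not reported in nM
        if r.standard_value < max_nm and r.smiles not in seen:
            seen.add(r.smiles)
            kept.append(r.smiles)
    return kept
```

Records in other units are skipped rather than converted. This keeps the sketch simple, but it drops data that a real pipeline would probably normalize first.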

For example, if you want to generate new kinase inhibitors, take care to exclude kinase inhibitors when you extract compounds from the ChEMBL database. Use the extracted SMILES set as the training set, and use the excluded kinase inhibitor SMILES as the test set (fine-tuning set). You may not need these steps, though. You can also train the network on ./datasets/datasets.smi from this repo and then fine-tune it on kinase inhibitor SMILES extracted from ChEMBL.
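The train/fine-tune split described above can be sketched as follows. This assumes you already have (SMILES, target ID) pairs and some way to decide whether a target is a kinase, for instance from the target classification in your ChEMBL release. Both the pair format and the `is_kinase_target` predicate are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple


def split_train_finetune(
    records: Iterable[Tuple[str, str]],
    is_kinase_target: Callable[[str], bool],
) -> Tuple[List[str], List[str]]:
    """Split (smiles, target_id) pairs into (train, finetune) SMILES lists.

    Any compound active against a kinase target goes to the fine-tuning
    set and is kept out of the training set entirely, so the network does
    not see those inhibitors before fine-tuning.
    """
    records = list(records)  # iterate twice
    finetune, finetune_seen = [], set()
    for smiles, target in records:
        if is_kinase_target(target) and smiles not in finetune_seen:
            finetune_seen.add(smiles)
            finetune.append(smiles)
    train, train_seen = [], set()
    for smiles, _ in records:
        if smiles not in finetune_seen and smiles not in train_seen:
            train_seen.add(smiles)
            train.append(smiles)
    return train, finetune
```

A compound that hits both a kinase and another target is treated as a kinase inhibitor. This is the stricter choice if the goal is to keep every kinase inhibitor out of the training data.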

If you are not familiar with handling the ChEMBL database, this article is useful.

Dharmogata commented 5 years ago

Hello, topazape.

Could you please share the file named 'LSTM_Chem-22-0.42.hdf5'?

Thank you in advance!