simonfqy / PADME

This is the repository containing the source code for my Master's thesis research, about predicting drug-target interaction using deep learning.
MIT License

Training data creation #14

Open abhisekbakshi opened 4 years ago

abhisekbakshi commented 4 years ago

Dear Sir, I could not understand how the SMILES format is converted to ECFP to feed into the input layer of the model. I also could not understand how you calculated the known binding affinity scores for the training samples. Could you please suggest a way to understand these?

simonfqy commented 4 years ago

The SMILES string is converted to an RDKit Mol object and then to ECFP (in this case, a Morgan fingerprint, which is nearly identical to ECFP) in this line: https://github.com/simonfqy/PADME/blob/e01c592cc06c4de04b3ed6db35da5af5ff7f863f/dcCustom/feat/fingerprints.py#L23. As for the binding affinity scores, I obtained them from publicly available datasets. They are then processed in the preprocess.py file in each dataset folder, like here: https://github.com/simonfqy/PADME/blob/e01c592cc06c4de04b3ed6db35da5af5ff7f863f/davis_data/preprocess.py#L35. The log transformation is done in the same file.
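
For readers who want to see the two steps in isolation, here is a minimal sketch. It is not the repository's code. The function names smiles_to_ecfp and kd_nm_to_pkd are hypothetical, the radius and bit-vector size are illustrative defaults, and the formula pKd = -log10(Kd in mol/L) is a common convention for Kd values reported in nM that is assumed here rather than taken from preprocess.py.

```python
import math


def smiles_to_ecfp(smiles, radius=2, n_bits=1024):
    """Return a Morgan (ECFP-like) fingerprint for a SMILES string as a list of 0/1 ints.

    radius and n_bits are illustrative defaults, not values taken from PADME.
    A radius of 2 corresponds to what is usually called ECFP4.
    """
    # RDKit is imported lazily so the log-transform helper below works without it.
    from rdkit import Chem
    from rdkit.Chem import AllChem

    mol = Chem.MolFromSmiles(smiles)  # returns None if the SMILES cannot be parsed
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits)
    return [int(bit) for bit in fp.ToBitString()]


def kd_nm_to_pkd(kd_nm):
    """Log-transform a dissociation constant given in nM: pKd = -log10(Kd in mol/L).

    Assumption: a common convention for Kd data, not confirmed to be the
    exact formula used in preprocess.py.
    """
    if kd_nm <= 0:
        raise ValueError("Kd must be positive")
    return -math.log10(kd_nm / 1e9)
```

With RDKit installed, smiles_to_ecfp("CCO") returns a list of 1024 zeros and ones that can be fed to a dense input layer, and kd_nm_to_pkd(10000.0) is about 5.0, since 10000 nM equals 1e-5 mol/L.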