kexinhuang12345 / DeepPurpose

A Deep Learning Toolkit for DTI, Drug Property, PPI, DDI, Protein Function Prediction (Bioinformatics)
https://doi.org/10.1093/bioinformatics/btaa1005
BSD 3-Clause "New" or "Revised" License
959 stars 273 forks source link

Query Regarding MPNN Drug Encoder #163

Open SudhirGhandikota opened 1 year ago

SudhirGhandikota commented 1 year ago

Hello,

Before I ask my query, I would like to congratulate everybody involved in building this great framework. I was trying to train an MPNN-CNN model on the latest BindingDB data and while exploring the existing MPNN Drug encoder implementation, I ran into a couple of doubts/queries. I was hoping to get some clarification from you guys.

From what I understand, the agraph property associated with a drug is basically an adjacency list where the row_indexrepresents an atom and each column value is the in_bond number/index. I deduced this from the following code snippet in the smiles2mpnnfeature method.

for a in range(n_atoms):
            for i,b in enumerate(in_bonds[a]):
                agraph[a,i] = b

Then, during the MPNN model training, I can see that the agraphs associated with all drugs in a given batch are combined into one combined agraph_1st variable. In this consolidation step, you used two variables N_a and N_b to maintain the running sum of atoms and bonds from each drug. Also, to re-index this combined adjacency list, I can see that you are adding the N_a value at each step (code snippet below).

for i in range(N_atoms_bond.shape[0]):
            ....
            agraph_lst.append(agraph[i,:atom_num,:] + N_a)

However, since the values in these adjacency lists are bond numbers or indices, shouldn't we be adding the N_b variable value instead? Could you please clarify this doubt of mine?

Thanks in advance!

futianfan commented 1 year ago

agraph_lst save the index of atom. N_a is the number of total atoms before the current molecule. so we need to add the value. it is not related to the bond number.

SudhirGhandikota commented 1 year ago

Hello,

Thanks for your response. However, the agraph_1st variable is populated using the agraph value, which seems to contain the bond number right (based on the first code snippet)??

futianfan commented 1 year ago

not bond number. but atom number. thanks