mmagithub opened 7 months ago
It seems you may have used the classification subset of BindingDB curated by Gao K, et al. (2018), "Interpretable drug target prediction using deep neural representation", pp. 3371–3377, https://doi.org/10.24963/ijcai.2018/468. Please confirm whether this is true, and whether you have also tested the method on regression problems.
Dear Marawan:
Thank you for your interest in the manuscript "A bidirectional interpretable compound-protein interaction prediction framework based on cross attention". The CmhAttCPI model, pre-trained on the dataset we curated from the BindingDB database, is available at https://github.com/wangmeng-code/wangmeng, and you can use it directly for predictions on your dataset.
Regarding dataset collection: indeed, the original data is larger than the set we curated, because we organized and cleaned the raw data. Specifically, we kept only human compound-protein interactions (CPIs) and labeled records with Kd values greater than 10,000 nM as negative samples and those less than 10,000 nM as positive samples. Because of the large volume of data, we processed it on our workstation; however, the workstation is currently undergoing maintenance, so the relevant processing records are unavailable.
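The curation rule described above (human targets only, Kd below 10,000 nM labeled positive, above 10,000 nM labeled negative) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the column names are hypothetical and should be checked against the header of the BindingDB file you download, the organism filter assumes the field reads exactly "Homo sapiens", and the handling of censored values such as ">10000" and of values exactly at 10,000 nM is a choice made for this sketch only.

```python
import csv
from typing import Iterable, List, Optional, Tuple

KD_THRESHOLD_NM = 10_000.0  # cut-off stated in the thread (10,000 nM)

# Hypothetical column names: verify them against the header of the
# BindingDB TSV you actually download.
SMILES_COL = "Ligand SMILES"
SEQ_COL = "BindingDB Target Chain Sequence"
KD_COL = "Kd (nM)"
ORGANISM_COL = "Target Source Organism According to Curator or DataSource"


def kd_label(raw: str) -> Optional[int]:
    """Return 1 (positive), 0 (negative) or None (unusable) for a Kd string in nM."""
    s = raw.strip()
    if not s:
        return None
    qualifier = s[0] if s[0] in "<>" else ""
    try:
        value = float(s.lstrip("<>= "))
    except ValueError:
        return None
    if qualifier == ">":  # only a lower bound on Kd is known
        return 0 if value >= KD_THRESHOLD_NM else None
    if qualifier == "<":  # only an upper bound on Kd is known
        return 1 if value <= KD_THRESHOLD_NM else None
    if value < KD_THRESHOLD_NM:
        return 1
    if value > KD_THRESHOLD_NM:
        return 0
    return None  # exactly 10,000 nM: the stated rule does not cover this case


def build_dataset(rows: Iterable[dict]) -> List[Tuple[str, str, int]]:
    """Keep human targets with a usable Kd; return (SMILES, sequence, label) tuples."""
    dataset = []
    for row in rows:
        if (row.get(ORGANISM_COL) or "").strip() != "Homo sapiens":
            continue
        label = kd_label(row.get(KD_COL) or "")
        if label is not None:
            dataset.append((row[SMILES_COL], row[SEQ_COL], label))
    return dataset


def load_bindingdb_tsv(path: str) -> List[Tuple[str, str, int]]:
    """Read a tab-separated BindingDB export and apply build_dataset to it."""
    with open(path, newline="", encoding="utf-8") as handle:
        return build_dataset(csv.DictReader(handle, delimiter="\t"))
```

Deduplication of identical compound-protein pairs, which any real curation would also need, is left out to keep the sketch short.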
Best Regards
(In reply to mmagithub's message of 2024-04-08 23:17:30.)
Dear Marawan:
The BindingDB dataset used in "A bidirectional interpretable compound-protein interaction prediction framework based on cross attention" was curated by us from the BindingDB database. Specifically, we kept only human compound-protein interactions (CPIs) and labeled records with Kd values greater than 10,000 nM as negative samples and those less than 10,000 nM as positive samples. We did not use the dataset curated by Gao K, et al. (2018), "Interpretable drug target prediction using deep neural representation", pp. 3371–3377, https://doi.org/10.24963/ijcai.2018/468.
Regards
(In reply to mmagithub's message of 2024-04-08 23:44:33.)
Thanks so much for the answer and for providing the model.
A few more questions if you do not mind:
1 - When training the model, did you perform any preprocessing on the starting SMILES strings, such as uncharging, standardization, or canonicalization, or did you use the SMILES fields from BindingDB as-is?
2 - For the target protein sequences, did you use the raw sequences from UniProt, or did you select specific amino acid sequences corresponding to particular domains of interest, such as the catalytic domain of a kinase?
3 - Is it possible to fine-tune CmhAttCPI on other datasets?
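As a point of reference for question 1, the preprocessing steps it lists can be sketched with RDKit's MolStandardize module. This is not a description of what the authors did; it is one common sequence (cleanup, keep the largest fragment to strip counter-ions, neutralize charges, then write RDKit's canonical SMILES), and it assumes RDKit is installed.

```python
from typing import Optional

from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize


def standardize_smiles(smiles: str) -> Optional[str]:
    """Clean up, strip salts, neutralize and canonicalize a SMILES string.

    Returns None if RDKit cannot parse the input.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    mol = rdMolStandardize.Cleanup(mol)               # remove Hs, disconnect metals, normalize, reionize
    mol = rdMolStandardize.FragmentParent(mol)        # keep the largest fragment (drops counter-ions)
    mol = rdMolStandardize.Uncharger().uncharge(mol)  # neutralize charges where possible
    return Chem.MolToSmiles(mol)                      # RDKit canonical SMILES by default
```

For example, sodium acetate written as `CC(=O)[O-].[Na+]` comes out as acetic acid, `CC(=O)O`. Whichever variant is used, the same function has to be applied at training and at prediction time, otherwise equivalent molecules are tokenized differently.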
Dear Marawan:
Here are our answers to your questions:
Regards,
(In reply to mmagithub's message of 2024-04-10 02:37:28.)
Thanks for your answers. I tried the model you shared, and I needed to change this line in the model.py script from `self.pro_embed = nn.Embedding(self.N_word, self.embed_dim)` to `self.pro_embed = nn.Embedding(num_embeddings=8313, embedding_dim=128)`.
Possibly the settings in the GitHub repo differ from those you used for training.
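Rather than hard-coding 8313 and 128, one option is to read the embedding size from the checkpoint itself. The sketch below assumes the checkpoint is a plain PyTorch `state_dict` and that the protein embedding is stored under the hypothetical key `pro_embed.weight` (inferred from the `self.pro_embed` attribute name); neither is confirmed in the thread. If the checkpoint was saved as a whole pickled model instead, recent PyTorch versions will not load it with default settings and the sketch would need adjusting.

```python
import torch
import torch.nn as nn


def embedding_shape(checkpoint, key: str = "pro_embed.weight") -> tuple:
    """Return (num_embeddings, embedding_dim) stored in a checkpoint.

    `checkpoint` may be a path or a file-like object. The key name is an
    assumption based on the `self.pro_embed` attribute in model.py.
    """
    state = torch.load(checkpoint, map_location="cpu")
    # Some training scripts wrap the weights as {"state_dict": ...}.
    if isinstance(state, dict) and "state_dict" in state:
        state = state["state_dict"]
    num_embeddings, embedding_dim = state[key].shape
    return int(num_embeddings), int(embedding_dim)


def make_protein_embedding(checkpoint) -> nn.Embedding:
    """Build an nn.Embedding whose size matches the checkpoint's weights."""
    num_embeddings, embedding_dim = embedding_shape(checkpoint)
    return nn.Embedding(num_embeddings=num_embeddings, embedding_dim=embedding_dim)
```

Sizing the layer from the file keeps model.py in sync with whichever checkpoint is loaded, instead of depending on a dictionary rebuilt from local data.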
`self.N_word` does need to be modified: the embedding layer requires the total size of the "word dictionary" built from the training dataset and your dataset. For details, please refer to `preprocessing_data.py`.
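The dependency between the word dictionary and `N_word` can be sketched as follows. `preprocessing_data.py` is not reproduced here, so this assumes, hypothetically, that protein sequences are split into overlapping 3-mer "words" (a scheme used by earlier CPI models) and that each unseen word receives the next free index. Under that assumption `N_word` is simply `len(word_dict)`, and the dictionary must be extended rather than rebuilt so that indices used during training keep their meaning.

```python
from typing import Dict, Iterable, List


def split_ngrams(sequence: str, n: int = 3) -> List[str]:
    """Split a protein sequence into overlapping n-gram 'words'."""
    return [sequence[i:i + n] for i in range(len(sequence) - n + 1)]


def update_word_dict(word_dict: Dict[str, int], sequences: Iterable[str], n: int = 3) -> Dict[str, int]:
    """Add unseen n-grams to word_dict in place, each getting the next free index.

    Existing entries are never renumbered, so ids used in training stay valid.
    """
    for seq in sequences:
        for word in split_ngrams(seq, n):
            if word not in word_dict:
                word_dict[word] = len(word_dict)
    return word_dict


def encode(sequence: str, word_dict: Dict[str, int], n: int = 3) -> List[int]:
    """Map a sequence to the integer ids fed to nn.Embedding."""
    return [word_dict[w] for w in split_ngrams(sequence, n)]


if __name__ == "__main__":
    word_dict = update_word_dict({}, ["MKVL"])          # hypothetical training sequences
    word_dict = update_word_dict(word_dict, ["KVLA"])   # hypothetical new sequences
    print(len(word_dict))                # N_word -> 3
    print(encode("MKVLA", word_dict))    # -> [0, 1, 2]
```

Note that a model built with an `N_word` different from the number of rows in the pre-trained embedding cannot load those weights with `load_state_dict`; PyTorch reports a size mismatch for the embedding parameter.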
(In reply to mmagithub's message of 2024-04-10 10:53:57.)
Hello, nice repo! I am wondering whether you could share pre-trained models trained on the full BindingDB that we can use directly for prediction. I am also wondering whether you did any curation or exclusion of the BindingDB records: the original number of records in BindingDB (2.8M) is significantly larger than the number of records reported in the manuscript (~62k).
https://www.bindingdb.org/rwd/bind/index.jsp
Looking forward to your reply,
Thanks, Marawan