ghost closed this issue 1 year ago
Thanks for checking the code. The implementation in the paper is a bit different from the repo, as we mentioned:
> Note that we previously used ./data/get_SMILES.py to get SMILES strings from drugbank. However, due to a change in drugbank's web structure, that crawler is not used in the current pipeline. Now we use drugbank_drugs_info.csv to obtain the SMILES strings for each ATC3 code; thus, the data statistics differ a bit from the paper. The current statistics are shown below:
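For readers unfamiliar with the new pipeline, a minimal sketch of the idea of grouping drugbank SMILES under ATC3 codes is shown below. This is a hypothetical illustration, not the repo's actual code: the inline rows stand in for drugbank_drugs_info.csv, and the real file's column layout may differ.

```python
# Hypothetical sketch: map ATC3 codes to sets of SMILES strings,
# the way drugbank_drugs_info.csv could be used in the new pipeline.
# Each row below is (ATC level-5 code, SMILES); these example rows
# stand in for the real CSV.
rows = [
    ("N02BE01", "CC(=O)Nc1ccc(O)cc1"),         # paracetamol
    ("N02BA01", "CC(=O)Oc1ccccc1C(=O)O"),      # aspirin
    ("M01AE01", "CC(C)Cc1ccc(cc1)C(C)C(=O)O"), # ibuprofen
]

atc3_to_smiles = {}
for atc5, smiles in rows:
    atc3 = atc5[:4]  # ATC level 3 = first 4 characters, e.g. "N02B"
    atc3_to_smiles.setdefault(atc3, set()).add(smiles)
```

With this mapping, each ATC3 code (the prediction unit) carries the molecule set of all drugs beneath it, which is why a more complete SMILES source changes the data statistics.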
The main difference is in how we get the drug SMILES strings (the paper's crawler misses a lot of molecules, while the current drugbank method gives a more comprehensive set). I am sure that if you tune the hyperparameters (especially the coefficient for the DDI loss), the performance will become closer.
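To make the tuning suggestion concrete, here is a hedged sketch of how a DDI coefficient typically enters the training objective: a pairwise penalty over known interacting drug pairs is added to the prediction loss, scaled by the coefficient. The function names and the exact normalization are assumptions for illustration; the repo's actual formulation may differ.

```python
def ddi_loss(probs, ddi_adj):
    """Expected DDI rate under independent drug selection:
    sum over pairs (i, j) of ddi_adj[i][j] * p_i * p_j,
    normalized by the number of interacting entries (assumed form)."""
    n = len(probs)
    total = sum(ddi_adj[i][j] * probs[i] * probs[j]
                for i in range(n) for j in range(n))
    pairs = sum(sum(row) for row in ddi_adj)
    return total / max(pairs, 1)

def total_loss(pred_loss, probs, ddi_adj, ddi_coef=0.1):
    # ddi_coef is the coefficient referred to above: larger values push
    # the model toward a lower DDI rate, usually at some cost in Ja/F1.
    return pred_loss + ddi_coef * ddi_loss(probs, ddi_adj)

probs = [0.9, 0.8, 0.1]   # predicted probability of recommending each drug
adj = [[0, 1, 0],
       [1, 0, 0],
       [0, 0, 0]]         # drugs 0 and 1 are a known interacting pair
```

Sweeping `ddi_coef` trades DDI rate against accuracy metrics, which is why retuning it matters when the underlying molecule set changes.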
I will close this issue. Please reopen it if you have further questions @J-Zhangg
Hi @J-Zhangg, the exact reproduction code can be found in the archived branch.
Thanks for your reply! I will reproduce it. :)
This is excellent work, and thanks a lot for your open-source code. However, when I reproduced your work, I could not achieve the results reported in the article. I followed exactly the steps in the readme.md and used the same hardware. Besides, the article reports a learning rate of 2e-4, whereas the default learning rate in the code is 5e-4. In my runs, the results were better with the learning rate set to 5e-4. I don't know what's wrong, and I hope I can get your help. Thank you very much. The reproduced results are as follows.
| Setting | DDI | Ja | F1 | PRAUC |
| --- | --- | --- | --- | --- |
| Reproduced, lr = 5e-4 | 0.0632 (0.0003) | 0.5114 (0.0026) | 0.6676 (0.0023) | 0.7649 (0.0028) |
| Reproduced, lr = 2e-4 | 0.0607 (0.0005) | 0.5089 (0.0022) | 0.6659 (0.0019) | 0.7632 (0.0022) |
| Reported in the article | 0.0589 (0.0005) | 0.5213 (0.0030) | 0.6768 (0.0027) | 0.7647 (0.0025) |
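For readers comparing these numbers: the "mean (std)" format presumably comes from repeating the evaluation over several random rounds or seeds (an assumption; the paper's exact protocol may differ). A minimal sketch of producing that format:

```python
from statistics import mean, pstdev

def summarize(values):
    """Format a list of per-round metric values as 'mean (std)'.
    Using population std here is an assumption for illustration."""
    return f"{mean(values):.4f} ({pstdev(values):.4f})"

ja_runs = [0.5090, 0.5140, 0.5112]  # made-up Jaccard scores for illustration
print("Ja:", summarize(ja_runs))
```

Small std values like those above mean the gap between the reproduced Ja/F1 and the reported ones is larger than run-to-run noise, which supports retuning hyperparameters rather than just rerunning.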