Closed Jonas-Verhellen closed 2 days ago
Dear Marinka and Maintainers of the TDC Project,
I hope this email finds you well. I am reaching out to you regarding some issues I have encountered while attempting to reproduce the results obtained in the DRD3 docking group benchmark. As I hope to utilize your benchmark as the conclusion of an upcoming paper introducing a novel and significantly more effective generative model, I am keen to resolve these issues.
More specifically, for the best-performing model in the benchmark (GB-GA), I am having trouble locating the files behind the currently reported performance. The GitHub repository linked to the benchmark appears to be missing the majority of these files. Is it possible to obtain or publicly release all the molecules in SMILES format along with their corresponding docking scores as they were submitted to the benchmark?
In addition, I have encountered discrepancies in the docking values for several individual molecules when compared to the values reported. For instance, these SMILES, from the smiles_lstm_2_5000.txt file, have markedly different reported docking scores than the ones I currently obtain from the oracle (installed according to the instructions on the TDC website):
O=C(CCOc1ccccc1F)Oc1ccccc1C(=O)CCCCc1ccc(C(F)(F)F)cc1: -15 vs -9.2
O=C(CCOc1ccccc1)Oc1ccccc1C(=O)CCCc1ccccc1F: -15 vs -9.2
O=C(CCOc1ccccc1F)Oc1ccccc1C(=O)CCCOc1ccccc1C(F)(F)F: -15.0 vs -10.3
O=C(Nc1ccccc1F)Oc1ccccc1C(=O)CCc1ccccc1C(F)(F)F: -14.6 vs -9.0
O=C(CCOc1ccccc1F)Oc1ccccc1C(=O)CCCOc1ccccc1Cl: -14.5 vs -9.1
O=C(CCOc1ccccc1F)Oc1ccccc1C(=O)CCCOc1ccccc1F: -14.4 vs -8.9
O=C(CCOc1ccccc1F)Oc1ccccc1C(=O)CCCOCc1ccccc1: -14.4 vs -9.0
I am uncertain whether these discrepancies stem from specific settings, something simple I've missed, or a change in the backend. Would it be possible to please provide any clarification on this matter?
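To make the size of the gap easier to see, the pairs listed above can be tabulated with a short script. This is only a sketch: it assumes the first number in each pair is the reported score and the second is the re-computed one, and the function name `score_gaps` is made up for illustration. It does not call the oracle itself.

```python
# (SMILES, first value, second value) copied from the list in this issue.
# Assumption: first value = reported score, second value = re-computed score.
scores = [
    ("O=C(CCOc1ccccc1F)Oc1ccccc1C(=O)CCCCc1ccc(C(F)(F)F)cc1", -15.0, -9.2),
    ("O=C(CCOc1ccccc1)Oc1ccccc1C(=O)CCCc1ccccc1F", -15.0, -9.2),
    ("O=C(CCOc1ccccc1F)Oc1ccccc1C(=O)CCCOc1ccccc1C(F)(F)F", -15.0, -10.3),
    ("O=C(Nc1ccccc1F)Oc1ccccc1C(=O)CCc1ccccc1C(F)(F)F", -14.6, -9.0),
    ("O=C(CCOc1ccccc1F)Oc1ccccc1C(=O)CCCOc1ccccc1Cl", -14.5, -9.1),
    ("O=C(CCOc1ccccc1F)Oc1ccccc1C(=O)CCCOc1ccccc1F", -14.4, -8.9),
    ("O=C(CCOc1ccccc1F)Oc1ccccc1C(=O)CCCOCc1ccccc1", -14.4, -9.0),
]


def score_gaps(rows):
    """Return (smiles, reported - recomputed) for each row, rounded to 0.1."""
    return [(smi, round(reported - recomputed, 1)) for smi, reported, recomputed in rows]


gaps = score_gaps(scores)
mean_gap = round(sum(gap for _, gap in gaps) / len(gaps), 2)

for smi, gap in gaps:
    print(f"{gap:+.1f}  {smi}")
print(f"mean gap: {mean_gap:+.2f}")
```

A consistent offset of this size across all molecules, rather than scattered noise, would point toward a change in settings or backend rather than run-to-run variation.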
Thank you in advance.
@Jonas-Verhellen will have a look
@kexinhuang12345
@futianfan @wenhao-gao are you able to help with this?
Hi @Jonas-Verhellen , what version of scikit are you using? What version of TDC? I'm fairly sure the cause is the same as this issue. Checking how to resolve.
Hi @amva13, thanks for looking into this! I am using scikit-learn 1.3.0 and pytdc 0.4.1 with python 3.10.12. Let me know if you need any more information.
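For anyone else hitting this, the version information requested above can be collected in one go. The snippet below is a minimal sketch using only the standard library; it assumes the packages are installed under the PyPI distribution names `PyTDC` and `scikit-learn`, and the helper name `installed_versions` is made up for illustration.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Assumption: these are the distribution names used at install time.
report = installed_versions(["PyTDC", "scikit-learn"])
report["python"] = platform.python_version()

for name, ver in report.items():
    print(f"{name}: {ver or 'not installed'}")
```

Pasting this output into a report makes it easier to compare environments when scores differ.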
Hi @amva13,
I'm checking in. How are things on this front? Any more clarity?
Kind regards, Jonas
Hi @Jonas-Verhellen , I expect to be able to dive into this after June 21st. At the moment, there are conferences in the way. Sorry for the inconvenience!
Hi @amva13,
No problem at all. Thanks for looking into this. Have a good time at the conferences!
Kind regards, Jonas
making note here. this and many similar issues probably due to
1 repository in your mims-harvard organization might be affected by a security vulnerability in nltk
ntlk unsafe deserialization vulnerability | High severity | nltk | CVE-2024-39705
mims-harvard/TDC
[examples/generation/docking_generation/guacamol_tdc/guacamol_baselines/dockers/requirements.txt](https://github.com/mims-harvard/TDC/security/dependabot/799)
[examples/generation/docking_generation/guacamol_tdc/guacamol_baselines/requirements.txt](https://github.com/mims-harvard/TDC/security/dependabot/800)
https://github.com/mims-harvard/TDC/issues/291 <-- solution being worked on in this ticket. follow this one.
We have updated the oracles for JNK3, GSK3B, and DRD2 and reproduced results perfectly.
Unfortunately, the oracles in this ticket depend on the Coley Group's software, and we cannot guarantee cross-compatibility at all times. In the future we will look to update our documentation to distinguish pure TDC oracles from community oracles. Given the inherent stochasticity in the models, changes can be expected even for slight version changes.
Our new package has been updated to 1.0.0 to indicate that identical backwards compatibility is not guaranteed. At the end of the day, despite the numbers being different, we still believe they're reliable, but whether there are serious dependency concerns can only be settled by the Coley group @wenhao-gao
One example, though there are many: https://github.com/coleygroup/pyscreener (which is not installed with TDC by default)
reproducibility for TDC-maintained oracles is proved as of this update https://github.com/mims-harvard/TDC/pull/293 in pytdc version 1.0.0
we will be closing this and the associated tickets accordingly.
In addition, please note the benchmark results in the user group meetup are for a particular example and not for any given model. If you believe there are serious issues there, please provide the code and training you're using to evaluate. If you're just running the same oracle, please refer to the above.
see https://github.com/mims-harvard/TDC/issues/245 for full description
Dear maintainers of the TDC project,
I'm trying to reproduce the results obtained in the DRD3 docking group benchmark for the GB-GA model. I am however having a few issues.
Unfortunately, I also cannot locate all the pickle files for the currently claimed performance in the benchmark. The GitHub repo linked to the benchmark is missing the majority of these files. I have noticed the website does have a visualization of the molecules. Is it possible to find (or publicly release) all the molecules in SMILES format with their docking scores as they were submitted to the benchmark?
It is not entirely clear to me which dataset is used to seed the algorithms. Is it Zinc 250k or guacamol_v1_all.smiles?
Kind regards, Jonas