Closed: a19s closed this issue 1 year ago
@a19s I checked the dataset and, apart from the few data points you mentioned, there are also some other data points with the same issue: they have identical 'exp_mean' values but different 'y' values.
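For anyone who wants to reproduce this check, here is a minimal sketch of how such rows could be flagged. It assumes the rows were read with something like `csv.DictReader` and that the columns are named "exp_mean [nM]" and "y" as in the benchmark CSVs; the function name `find_conflicting_groups` and the toy rows are made up for illustration and are not part of MoleculeACE.

```python
from collections import defaultdict


def find_conflicting_groups(rows, key="exp_mean [nM]", target="y"):
    """Map each exp_mean value to its distinct y values, keeping only
    the groups where more than one distinct y value occurs."""
    groups = defaultdict(set)
    for row in rows:
        # Round y so that tiny floating-point differences are not reported.
        groups[float(row[key])].add(round(float(row[target]), 6))
    return {k: sorted(v) for k, v in groups.items() if len(v) > 1}


# Toy rows (hypothetical, NOT taken from the dataset).
toy_rows = [
    {"smiles": "CCO", "exp_mean [nM]": "100", "y": "2"},
    {"smiles": "CCN", "exp_mean [nM]": "100", "y": "-2"},
    {"smiles": "CCC", "exp_mean [nM]": "10", "y": "-1"},
]

conflicts = find_conflicting_groups(toy_rows)
print(conflicts)
```

On the toy rows, only the 100 nM group is reported, because it is the only one with two different 'y' values.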
Yup @githubXin123, this was just an example that was easy to reproduce, but there are more values with the same issue.
@derekvantilborg any idea of why this is the case?
Hi, thanks for pointing this out. It gave me a proper scare. This is a post-mortem of what happened with the data:
Luckily, for all model training, evaluation, etc., I only use the -log10 values from the 'y' column. This means that the results of the study should stay the same.
I will update the CSVs with their correctly transformed 'exp_mean' values and fix this bug in the code.
Thank you very much @derekvantilborg for following this up - I am also very happy to hear about your findings :) Keep up the great work!
@derekvantilborg Thank you for your response. I would also appreciate it if you could carefully double-check the SMILES strings corresponding to these data points.
@githubXin123 I'm on it
@derekvantilborg Hi Tilborg, the raw data does not seem to have been fixed yet: https://github.com/molML/MoleculeACE/blob/main/MoleculeACE/Data/benchmark_data/raw/
Hi all. I'm aware that the data has not been fixed yet. I'm working on a revision with the corrected code, data, and results. Recomputing the results takes a while, so I expect to update the repo sometime next week.
Thank you all for being so patient. I released a new version (V3) of the benchmark with corrected code, data, and results. We also submitted a correction to the paper. Luckily the findings from the corrected results match the findings in the original paper. I'm very sorry for the inconvenience this bug may have caused some of you.
cheers, Derek
Thank you very much Derek!
Hey, I really appreciate your work - thank you very much for sharing the code and the data.
I found an inconsistency that I couldn't wrap my head around, and would like to ask you to clarify directly:
When looking at the data here: https://github.com/molML/MoleculeACE/blob/main/MoleculeACE/Data/benchmark_data/CHEMBL2147_Ki.csv
the file has a column called "exp_mean [nM]" and a "y" column, which should be -log10(exp_mean), judging by visual inspection and by what you wrote in the paper: "The mean Ki or EC50 value for each molecule was computed and subsequently converted into pEC50/pKi values (as the negative logarithm of molar concentrations)"
However, there is an issue: SMILES strings with the same "exp_mean" value (e.g. 100 nM) have "y" values that are either positive or negative (e.g. 2 or -2 in the example below), and I haven't found any way to make sense of this!
Could you please clarify what is the origin of this inconsistency?
Thank you!
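To make the arithmetic behind the question concrete, here is a small sketch of what different readings of "-log10(exp_mean)" would give for 100 nM. It only illustrates the numbers and does not say which convention the CSV actually uses; the variable names are made up for illustration.

```python
import math

exp_mean_nM = 100.0  # the example value from the question, in nM

# -log10 applied directly to the nM value
y_from_nM = -math.log10(exp_mean_nM)

# same value with the sign flipped (plain log10 of the nM value)
y_sign_flipped = math.log10(exp_mean_nM)

# -log10 of the molar concentration (1 nM = 1e-9 M), as described in the paper
y_from_molar = -math.log10(exp_mean_nM * 1e-9)

print(y_from_nM, y_sign_flipped, round(y_from_molar, 6))
```

The first two values differ only in sign (-2 and 2), which matches the two "y" values seen for the same "exp_mean", while the molar reading gives roughly 7.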