molML / MoleculeACE

A tool for evaluating the predictive performance of machine learning models on activity cliff compounds
MIT License

Question regarding the DATASET #4

Closed by smiles724 1 year ago

smiles724 commented 1 year ago

Hi, it is very helpful that you provided relevant datasets.

However, there is one thing I am concerned about. Does your benchmark dataset explicitly record the relations between cliff molecules? In other words, can we know exactly which pairs of molecules have similar graph structures but significantly different properties?

Thanks,

derekvantilborg commented 1 year ago

Hi, we do not analyze individual activity cliff pairs in the paper because we were especially interested in dataset-level effects of the presence of activity cliffs. We do calculate them, however, to find all molecules that form activity cliff pairs, as we discuss in the paper. You can calculate and backtrack all individual activity cliff pairs using the functions found in MoleculeACE/benchmark/cliffs.py. We simply represent all pairs in a square matrix.
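To illustrate the square-matrix idea, here is a minimal sketch of how individual pairs could be backtracked from such a matrix. It does not call the MoleculeACE API: the function name `cliff_pairs_from_matrix`, the toy SMILES list, and the toy matrix are all hypothetical, and it assumes a symmetric binary matrix where entry (i, j) is 1 when molecules i and j form an activity cliff pair.

```python
import numpy as np


def cliff_pairs_from_matrix(cliff_matrix, smiles):
    """Return the (smiles_i, smiles_j) pairs flagged in a square cliff matrix.

    Hypothetical helper, not part of MoleculeACE. Assumes cliff_matrix[i, j]
    is nonzero when molecules i and j form an activity cliff pair.
    """
    m = np.asarray(cliff_matrix)
    if m.ndim != 2 or m.shape[0] != m.shape[1]:
        raise ValueError("cliff_matrix must be square")
    if m.shape[0] != len(smiles):
        raise ValueError("need exactly one SMILES string per matrix row")

    # The matrix is symmetric, so keep only the strict upper triangle:
    # this reports each pair once and ignores the diagonal.
    rows, cols = np.nonzero(np.triu(m, k=1))
    return [(smiles[i], smiles[j]) for i, j in zip(rows, cols)]


# Toy data (made up for illustration, not taken from the benchmark).
smiles = ["CCO", "CCN", "CCC", "c1ccccc1"]
cliff_matrix = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
])

pairs = cliff_pairs_from_matrix(cliff_matrix, smiles)

# A molecule is a "cliff molecule" if it takes part in at least one pair.
is_cliff_molecule = np.any(cliff_matrix, axis=1)
```

With this toy input, `pairs` holds the two flagged pairs and `is_cliff_molecule` marks the first three molecules. The same row-wise reduction is one way to go from pair-level information to the molecule-level labels used for dataset-level analysis.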

smiles724 commented 1 year ago

Thanks, I noticed that in the code!