molML / MoleculeACE

A tool for evaluating the predictive performance of machine learning models on activity cliff compounds
MIT License

Levenshtein similarity #10

Closed: githubXin123 closed this issue 1 year ago

githubXin123 commented 1 year ago

https://github.com/molML/MoleculeACE/blob/024ef21d4e6e266037779cc1b133f4b210fb0464/MoleculeACE/benchmark/cliffs.py#L120

The code used to calculate the Levenshtein similarity appears to be incorrect. It should be:

m[i, j] = 1 - (levenshtein(smiles[i], smiles[j]) / max(len(smiles[i]), len(smiles[j])))

derekvantilborg commented 1 year ago

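On line 127 we convert the distance to a similarity: m = 1 - m. The value stored on line 120 is therefore a distance, and the subtraction is applied to the whole matrix afterwards.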
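The equivalence described in the reply can be checked with a small sketch. This is not the repository's code: `levenshtein` is a plain dynamic-programming stand-in for whatever implementation the project imports, the helper names `normalized_distance`, `similarity_two_step` and `similarity_one_step` are hypothetical, and nested lists replace the array that the `m[i, j]` indexing in the issue suggests. It assumes non-empty SMILES strings, since the normalization divides by the longer length.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (two-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance divided by the longer length (assumes non-empty input)."""
    return levenshtein(a, b) / max(len(a), len(b))


def similarity_two_step(smiles):
    """Fill the matrix with distances first, then flip it with 1 - m."""
    n = len(smiles)
    m = [[normalized_distance(smiles[i], smiles[j]) for j in range(n)]
         for i in range(n)]
    return [[1 - d for d in row] for row in m]


def similarity_one_step(smiles):
    """Compute 1 - distance directly per entry, as proposed in the issue."""
    n = len(smiles)
    return [[1 - normalized_distance(smiles[i], smiles[j]) for j in range(n)]
            for i in range(n)]


smiles = ["CCO", "CCN", "c1ccccc1"]
two_step = similarity_two_step(smiles)
one_step = similarity_one_step(smiles)

# Both routes apply the same subtraction to the same distance values.
assert two_step == one_step
```

Under these assumptions the two approaches give identical matrices, so where the subtraction happens is a matter of code layout rather than correctness. Any further processing the repository applies to the matrix is not covered by this sketch.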