Open eahenle opened 1 year ago
Thank you for the catch. The SMARTS implementation may still not handle some advanced queries. I'm working on SMARTS in the dev branch, and I will check the current state later.
I'm very happy to hear that!
Here is the complete list of MACCS rules that return false positives for the molecule shown above.
Each rule is a tuple that gives the SMARTS query and the count of matches that must be exceeded to turn the bit "on".
Tuple{String, Int64}[
("[#6]=[#6](~[!#6;!#1])~[!#6;!#1]", 0),
("[!#6;!#1]~[CH2]~[!#6;!#1]", 0),
("[!#6;!#1;!H0]~*~[!#6;!#1;!H0]", 0),
("[!#1;!#6;!#7;!#8;!#9;!#14;!#15;!#16;!#17;!#35;!#53]", 0),
("[#6]=[#6]~[#7]", 0),
("[!#6;!#1;!H0]~*~*~*~[!#6;!#1;!H0]", 0),
("[!#6;!#1;!H0]~*~*~[!#6;!#1;!H0]", 0),
("[!#6;!#1;!H0]~[!#6;!#1;!H0]", 0),
("[!#6;!#1]~[!#6;!#1;!H0]", 0),
("[!#6;!#1]~[#7]~[!#6;!#1]", 0),
("[#6]=[#6](~*)~*", 0),
("[#6]=[#7]", 0),
("*~[CH2]~[!#6;!#1;!H0]", 0),
("[C;H2,H3][!#6;!#1][C;H2,H3]", 0),
("[\$([!#6;!#1;!H0]~*~*~[CH2]~*),\$([!#6;!#1;!H0;R]1@[R]@[R]@[CH2;R]1),\$([!#6;!#1;!H0]~[R]1@[R]@[CH2;R]1)]", 0),
("[\$([!#6;!#1;!H0]~*~*~*~[CH2]~*),\$([!#6;!#1;!H0;R]1@[R]@[R]@[R]@[CH2;R]1),\$([!#6;!#1;!H0]~[R]1@[R]@[R]@[CH2;R]1),\$([!#6;!#1;!H0]~*~[R]1@[R]@[CH2;R]1)]", 0),
("[!#6;!#1]~[CH3]", 0),
("[!#6;!#1]~[#7]", 0),
("[#6]=[#6]", 0),
("[!#6;!#1;!H0]~*~[CH2]~*", 0),
("[#7]=*", 0),
("[!#6;!#1;!H0]", 1),
("*1~*~*~*~*~*~1", 1),
("[#6]-[#7]", 0)
]
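As a minimal sketch of how a rule of this shape might be evaluated: the bit is "on" only when the match count is strictly greater than the threshold. The names `Rule`, `maccs_bits`, `count_matches`, and `fake_counts` are hypothetical and used only for illustration. The match counter is passed in as a function so that no particular library API is assumed, and the counts below are stand-in values, not results for the molecule in this issue.

```julia
# A MACCS-style rule: (SMARTS query, threshold).
const Rule = Tuple{String, Int}

# Hypothetical helper: `count_matches(smarts)` must return how many times the
# query matches the molecule. It is supplied by the caller, so no specific
# substructure-matching API is assumed here.
function maccs_bits(rules::Vector{Rule}, count_matches::Function)
    # A bit is "on" only when the count strictly exceeds the threshold.
    return [count_matches(smarts) > threshold for (smarts, threshold) in rules]
end

# Stand-in match counts for illustration only (not real results).
fake_counts = Dict("[#6]=[#6]" => 0, "[!#6;!#1;!H0]" => 1, "[#6]-[#7]" => 2)

rules = Rule[("[#6]=[#6]", 0), ("[!#6;!#1;!H0]", 1), ("[#6]-[#7]", 0)]
bits = maccs_bits(rules, smarts -> fake_counts[smarts])
```

With a threshold of 1, a single match does not set the bit, which is why rules such as `("[!#6;!#1;!H0]", 1)` need at least two matches. A false positive in this scheme means the counter reported more matches than the molecule actually contains.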
@eahenle The queries you listed seem to return false in the new version (I checked with v0.14.2).
Given an input SMILES string and a SMARTS query from the MACCS fingerprinting scheme (we are trying to implement this fingerprinting for the package), we found the following issue:
Looking at the substructure match in Pluto, we see this:
This shows, I think, two problems:
This is one example, but for this single structure, there are many MACCS keys that return false positives.