rdkit / rdkit

The official sources for the RDKit library
BSD 3-Clause "New" or "Revised" License

Discriminating 'symmetric' molecular morgan fingerprints #1682

Closed by spadavec 6 years ago

spadavec commented 6 years ago
rdkit version: 2015.09.1
platform: Ubuntu 16.04

I'm fairly new to this and still trying to wrap my head around circular fingerprints (so my apologies for the inevitable mistakes). It appears that 'symmetrical' molecules (using the term loosely) give the same fingerprint bit vectors. For example, the two molecules below have exactly the same set of bits switched 'on' in their bit vectors.

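[image: marvin4js-outputpng 4] https://user-images.githubusercontent.com/3453650/33800027-d7044ce8-dcec-11e7-90bf-b02f7dbf2d65.png
[image: marvin4js-outputpng 3] https://user-images.githubusercontent.com/3453650/33800028-d8b7fe2c-dcec-11e7-9178-bf9291c90d97.png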

The pictures might be a bit fuzzy, but the bond/atom paths are the same in both (one is just 2x the size of the other). Is there a way to discriminate between these two molecules using the Morgan bit vector generator? I'm interested in using the bit vectors for neural network regression, and because the fingerprints are identical, these molecules are predicted to be equally active/inactive.

Note: this is using depth-2 Morgan fingerprints.

proteneer commented 6 years ago

You’d need to increase the fingerprint radius (significantly) to capture the NCCSCCSCCN moiety.


-- Yutong Zhao
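The radius effect described in the reply above can be illustrated with a toy, pure-Python sketch. This is not RDKit's implementation: `environment_ids` is a hypothetical helper that mimics the Morgan idea (each atom's identifier is repeatedly combined with its neighbours' identifiers), and the two unbranched chains are stand-ins chosen for illustration, not the molecules in the images. The set of identifiers plays the role of the 'on' bits before any folding into a fixed-length vector.

```python
from typing import List, Set


def environment_ids(symbols: str, radius: int) -> Set[tuple]:
    """Collect Morgan-style environment identifiers for an unbranched toy chain.

    `symbols` holds one character per atom; atom i is bonded to i-1 and i+1.
    This is an illustrative assumption, not how RDKit parses or hashes molecules.
    """
    n = len(symbols)
    nbrs: List[List[int]] = [
        [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)
    ]
    # Layer 0: describe each atom by its element and its number of neighbours.
    ids = [(symbols[i], len(nbrs[i])) for i in range(n)]
    seen = set(ids)
    for layer in range(1, radius + 1):
        # Layer r: combine the atom's previous identifier with its neighbours'
        # identifiers, sorted so the result does not depend on atom order.
        ids = [
            (layer, ids[i], tuple(sorted(ids[j] for j in nbrs[i])))
            for i in range(n)
        ]
        seen.update(ids)
    return seen


# Hypothetical stand-ins: the second chain has one extra S-C-C repeat unit.
short_chain = "NCCSCCSCCN"
long_chain = "NCCSCCSCCSCCN"

print(environment_ids(short_chain, 2) == environment_ids(long_chain, 2))  # True
print(environment_ids(short_chain, 3) == environment_ids(long_chain, 3))  # False
```

At radius 2 every environment in the longer chain also occurs in the shorter one, so the two sets match. At radius 3 the central sulfur of the longer chain sees only S-C-C repeats on both sides, an environment the shorter chain does not contain, and the sets diverge.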

spadavec commented 6 years ago

Was curious if there was a depth-independent route, but it appears not! Thanks for the input.
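One option the thread does not discuss is counting how often each environment occurs instead of only recording whether it is present; RDKit exposes count-based Morgan fingerprints as well as bit vectors. The toy sketch below assumes the same kind of hypothetical unbranched chains as stand-ins (not the molecules in the images) and uses a made-up helper, `environment_counts`, to show the idea. Whether counts would separate the two pictured molecules is not confirmed in the thread.

```python
from collections import Counter


def environment_counts(symbols: str, radius: int) -> Counter:
    """Count Morgan-style environment identifiers in an unbranched toy chain.

    Illustrative only: atom i is bonded to i-1 and i+1, and identifiers are
    kept as nested tuples instead of being hashed as a real toolkit would.
    """
    n = len(symbols)
    nbrs = [[j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)]
    # Layer 0: element symbol plus number of neighbours.
    ids = [(symbols[i], len(nbrs[i])) for i in range(n)]
    counts = Counter(ids)
    for layer in range(1, radius + 1):
        # Each new identifier folds in the sorted identifiers of the neighbours.
        ids = [
            (layer, ids[i], tuple(sorted(ids[j] for j in nbrs[i])))
            for i in range(n)
        ]
        counts.update(ids)
    return counts


short_counts = environment_counts("NCCSCCSCCN", 2)
long_counts = environment_counts("NCCSCCSCCSCCN", 2)

print(set(short_counts) == set(long_counts))            # True: same environments present
print(short_counts == long_counts)                      # False: multiplicities differ
print(short_counts[("S", 2)], long_counts[("S", 2)])    # 2 3
```

The trade-off is that a count vector is no longer a plain on/off bit vector, so a downstream model has to accept integer features.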