rdkit / rdkit

The official sources for the RDKit library
BSD 3-Clause "New" or "Revised" License
2.66k stars 881 forks source link

aromatic bonds in partial rings have inconsistent bond depictions depending on presence of labels on atoms #6582

Closed oleksii-dukhno-bayer closed 1 week ago

oleksii-dukhno-bayer commented 1 year ago

Describe the bug When drawing query molecules constructed from SMARTS that have partial, unclosed aromatic rings, some aromatic bonds are drawn as a dashed line overlaying a solid line instead of a dashed line side-by-side with a solid line. The behavior does not happen with full rings. It seems that this behavior also depends on presence of labels on atoms, as turning them off fully fixes the issue.

To Reproduce In a Jupyter notebook:

import rdkit
from rdkit import Chem
smarts_list = [
    '[1#0]-[*](:[*]:[*]):[*](-[#8]):[*]',
    '[1#0]-[c](:[c]:[c]):[c](-[#8]):[*]',
    '[1#0]-[c](:[n]:[c]):[c](-[#8]):[*]',
    '[1#0]-[c](:[n]:[n]):[c](-[#8]):[*]',
    '[1#0]-[n](:[n]:[n]):[n](-[#8]):[*]',
    '[1#0]-c(:c:c):c(-[#8]):[*]',
    '[1#0]-*(:*:*):*(-[#8]):[*]',
]
for smarts in smarts_list:
    display(Chem.Draw.MolToImage(Chem.MolFromSmarts(smarts)))

Expected behavior All aromatic bonds are drawn as a dashed line side-by-side with a solid line.

Screenshots Executing above code results in these depictions: image

Things tried that don't affect bond depictions:

Things tried that affect the bond depictions:

One of the offenders at higher resolution (line overlay clearly visible): image image

Configuration (please complete the following information):

Thanks for keeping RDKit alive!

DavidACosgrove commented 1 year ago

Hi @oleksii-dukhno-bayer,

Many thanks for the detailed report and admirable persistence in trying to solve the problem. There are two separate issues here. The overlay of the two lines of the bond is very likely a different manifestation of a bug fixed by PR6571. I will check that tomorrow. The other one is harder. The dashed lines are all inside the ring for a full ring because the drawing code sees that it's a ring and makes it so. It does the same with double bonds in the kekule representation as well. When it doesn't have a ring to hang its hat on, it falls back on the rules for drawing non-ring double bonds. Those attempt to have it draw the dashed bonds inside the angles of the bonds coming off it. On the whole that works ok, but you can see in your examples that when there isn't a single inside option, it is pretty arbitrary which way it is drawn. I don't think there's any easy fix for that that doesn't involve looking outside the immediate environment of the bond itself which will be very complicated for an unusual case. I will have a look and see if there's anything obvious that can be done, but I think the answer might be that you have to live with it.

If anyone can come up with a convenient algorithm for making it look nicer, I'll be happy to have a crack at implementing it, because it is a bit unsightly.

Dave

oleksii-dukhno-bayer commented 1 year ago

Thanks for the response 😊 The issue indeed looks similar to what 6571 fixes, if that's the case I'll wait for the next release and live with it for now.

For the bond depictions flipping side to side (thankfully not a big deal for my use case): the only thing that comes to my mind would be something along these lines (as a postprocessing pass after atom coordinates have been decided):

  1. get all aromatic bonds and store them in a list.
  2. for ring aromatic bonds, decide bond direction with the usual approach. remove them from the list.
  3. from the list, find uninterrupted connected systems of non-ring aromatic bonds, for the largest system:
    • find the furthest atoms in the chain (maximum topological distance), this gives the longest chain within the system
    • find the average coordinates ("center of mass") of each k=4 consecutive atoms in this chain (at the very chain edges, each 3 or each 2), essentially a rolling coordinate average; these serve as preliminary points of reference;
    • for each bond, find the average center of mass of preliminary points to which it's atoms participated (e.g. for bond 1 those would be average of reference points 1,2,3 , for bond 2 those would be average of reference points 2,3,4, etc). these serve as true points of reference;
    • for each bond, find the normal direction (sign of cosine? not sure) of the angle between start-end bond vector and bond center - true point of reference. this tells in which direction the bond should have its minor component drawn (dotted line or solid for kekulized representations). for tiebreakers (when the final reference point sits exactly on the bond), choose randomly - it can happen only in the cases where there's indeed no clear answer
    • remove the bonds from the list of bonds-to-decide.
  4. from the remaining list of bonds-to-decide, recalculate the uninterrupted connected systems of non-ring aromatic bonds.
  5. repeat steps 3-4 until decided for all bonds Here's an example of a pass of step 3 for one of the sample cases above: image I bet there's some weird edge cases that could happen (e.g. parts of macrocycles, multi-ring systems like pyrene with a few atoms sliced off, etc) but might be worth a crack. The value of k might need to be adjusted too. A potentially cleaner but more expensive solution would involve comparing the total amount of changes in "sidedness" of consecutive conjugated bonds, flipping the bonds appropriately, and then doing a pass of mild 2D geometry relaxation in case there's some atom overlap. Not sure if this would conflict with the existing drawing code, however.

Hope this helps!

DavidACosgrove commented 1 year ago

Just to confirm, this is how the master branch renders these molecules: image

DavidACosgrove commented 1 year ago

@oleksii-dukhno-bayer Thanks for the algorithm suggestion. I will have a think about it, but I suspect the answer might be that it's a lot of work for a low-incidence situation.

github-actions[bot] commented 3 weeks ago

This issue was marked as stale because it has been open for 90 days with no activity.

github-actions[bot] commented 1 week ago

This issue was closed because it has been inactive for 14 days since being marked as stale.