pckroon / pysmiles

A lightweight python-only library for reading and writing SMILES strings
Apache License 2.0

Question on rings #30

Closed fgrunewald closed 1 year ago

fgrunewald commented 1 year ago

The bug / question

Most likely this question comes from my incomplete understanding of SMILES syntax, but I encountered odd behavior with rings. The SMILES string [CH3](c1ccccc1)[CH2] correctly generates a graph of ethylbenzene, whereas [CH3]c1ccccc1[CH2] generates a graph of dimethylbenzene in which one of the methyl groups lacks a hydrogen. My understanding is that the two SMILES strings are equivalent, the second just being sloppier because it omits the parentheses. Should this perhaps raise an error?

Code to reproduce this behavior

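import sys

import matplotlib.pyplot as plt
import networkx as nx
import pysmiles

# Read the SMILES string given on the command line; hydrogens become explicit nodes.
mol = pysmiles.read_smiles(sys.argv[1], explicit_hydrogen=True)

# Label every node with its element symbol.
labeldict = nx.get_node_attributes(mol, "element")

nx.draw(mol, labels=labeldict, with_labels=True, pos=nx.kamada_kawai_layout(mol))
plt.show()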

I tested with networkx version 2.8.1 and 3.1. The behavior is the same.

pckroon commented 1 year ago

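These SMILES strings do not describe the same molecule. In the first, the parentheses open a branch: the ring is a side branch on the CH3 carbon, so the CH2 is bonded to the CH3 rather than to the ring. Do note, however, that this is not ethylbenzene! Instead it makes CH2-CH3-Ph, with the phenyl group attached to the CH3 carbon. The second SMILES string does indeed make ortho-dimethylbenzene (minus one hydrogen), because the CH2 is bonded to the ring atom that closes the ring, which is adjacent to the atom carrying the CH3.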
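
To make the difference concrete, here is a minimal sketch of how branch parentheses and ring-closure digits decide which atoms get bonded. It is not pysmiles' parser: the `connectivity` and `neighbours` helpers are hypothetical names introduced only for this illustration, and the sketch assumes a tiny SMILES subset (bracket atoms, bare organic-subset atoms, parentheses, single-digit ring closures, no bond symbols).

```python
import re

# Tokens for a deliberately tiny SMILES subset: bracket atoms, bare
# organic-subset atoms, branch parentheses and single-digit ring closures.
TOKEN = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFIbcnops]|[()]|\d")


def connectivity(smiles):
    """Return (atoms, bonds) for the SMILES subset described above."""
    atoms = []          # atom token per index, e.g. "[CH3]" or "c"
    bonds = set()       # one frozenset({i, j}) per bond
    previous = None     # index of the atom the next atom bonds to
    branch_stack = []   # atoms to return to when a branch closes
    open_rings = {}     # ring digit -> index of the atom that opened it
    for token in TOKEN.findall(smiles):
        if token == "(":
            # Remember the branch point; the branch hangs off this atom.
            branch_stack.append(previous)
        elif token == ")":
            # Branch finished: whatever comes next bonds to the branch point.
            previous = branch_stack.pop()
        elif token.isdigit():
            if token in open_rings:
                # Second occurrence of the digit closes the ring.
                bonds.add(frozenset((open_rings.pop(token), previous)))
            else:
                open_rings[token] = previous
        else:
            atoms.append(token)
            index = len(atoms) - 1
            if previous is not None:
                bonds.add(frozenset((previous, index)))
            previous = index
    return atoms, bonds


def neighbours(atoms, bonds, index):
    """Sorted atom tokens bonded to the atom at `index`."""
    return sorted(
        atoms[j] for bond in bonds if index in bond for j in bond if j != index
    )


for smiles in ("[CH3](c1ccccc1)[CH2]", "[CH3]c1ccccc1[CH2]"):
    atoms, bonds = connectivity(smiles)
    ch2 = atoms.index("[CH2]")
    print(smiles, "-> [CH2] is bonded to", neighbours(atoms, bonds, ch2))
# [CH3](c1ccccc1)[CH2] -> [CH2] is bonded to ['[CH3]']
# [CH3]c1ccccc1[CH2] -> [CH2] is bonded to ['c']
```

With the parentheses, the [CH2] attaches to the [CH3] carbon. Without them, it attaches to the last ring atom, which is the one that closes the ring onto the atom carrying the [CH3]. For comparison, ethylbenzene itself can be written as CCc1ccccc1.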