pckroon / pysmiles

A lightweight python-only library for reading and writing SMILES strings
Apache License 2.0

thiophene + adding hydrogen leads to incorrect count #37

Closed fgrunewald closed 6 months ago

fgrunewald commented 6 months ago

Hi @pckroon,

When parsing the SMILES string for thiophene (c1ccsc1), an incorrect hcount is assigned to the sulfur. In thiophene no hydrogen is attached to the sulfur atom, but its hcount is 1, so one hydrogen is added. I've traced the accounting problem back to the fill_valence function in the pysmiles.smiles_helper module. In this case the bond orders on sulfur sum to 3 (i.e. 1.5 x 2), so according to the listed valences sulfur gets a valence of 4, which leaves room for one hydrogen. That is the problem: technically speaking, I think the code works as written.

I guess for sulfur we need to check whether the atom is aromatic or not?
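To make the arithmetic concrete, here is a minimal standalone sketch of the accounting described above. It does not call pysmiles and is not the code of fill_valence. The valence table is an assumption taken from the usual OpenSMILES values (2, 4, 6 for sulfur), and naive_hcount and aromatic_aware_hcount are hypothetical names. The aromatic check is just one possible reading of the idea, not necessarily how the library handles it.

```python
# Assumed valence table (common OpenSMILES values), not copied from pysmiles.
VALENCES = {"C": (4,), "S": (2, 4, 6)}


def naive_hcount(element, bond_orders):
    """Hypothetical helper: pick the smallest listed valence that fits the
    summed bond orders and return the remainder as the implicit H count."""
    used = sum(bond_orders)
    for valence in VALENCES[element]:
        if valence >= used:
            return int(valence - used)
    return 0  # bond orders already exceed every listed valence


def aromatic_aware_hcount(element, bond_orders, aromatic):
    """Hypothetical variant: for aromatic sulfur, count each aromatic (1.5)
    bond as a single bond, since the ring electrons come from a lone pair."""
    if aromatic and element == "S":
        bond_orders = [1 if order == 1.5 else order for order in bond_orders]
    return naive_hcount(element, bond_orders)


# Thiophene sulfur: two aromatic bonds, each counted as 1.5.
print(naive_hcount("S", [1.5, 1.5]))                 # 1, the spurious hydrogen
print(aromatic_aware_hcount("S", [1.5, 1.5], True))  # 0, as expected for thiophene
# Aromatic carbon with two ring bonds is unaffected by the special case.
print(aromatic_aware_hcount("C", [1.5, 1.5], True))  # 1
```

The special case is limited to sulfur on purpose: applying the same rule to aromatic carbon would give it two hydrogens instead of one.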

fgrunewald commented 6 months ago

This has been addressed in #38.