Closed Boxylmer closed 1 year ago
As you might expect, the indices follow the order in which the characters appear in the SMILES string. This is the correct behavior of the SMILES parser.
This is incredibly helpful. Thank you for confirming, and thanks for maintaining this incredible project!
Our group is trying to augment SMILES to work with polymers in MolecularGraph.
To do this, we're considering adding an '&' followed by some context information to atoms that are part of the repeating unit connections, which we would handle separately after doing what we need with MolecularGraph.jl. This context info and '&' would, of course, be removed prior to being fed into the `smilestomol` function (as that would break it), but we'd like to keep track of the index of the resulting atom in the `GraphMol` object which the '&' was next to. I can't seem to find a good way to do this.
(*) Right now it looks like the indices follow the order of atoms presented in the SMILES, so until I find this to not be the case, I'll assume it's true.
Example: pre-treated input "&C(CC)C&CC" -> we have repeating connections at (*) index 1 and index 5 -> snip out this context for use in `smilestomol` -> "C(CC)CCC" -> `GraphMol`
Any ideas on how I could guarantee I know what indices these atoms would have?
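For what it's worth, the bookkeeping described above can be done entirely on the string before it reaches the parser. Below is a rough sketch in Python (the logic is plain string handling and should port to Julia directly). It rests on several assumptions that are not from the library: the marker is a bare '&' placed immediately before the atom it tags, as in the example; any extra context info would need its own delimiter and is not handled; and atom indices follow the order of atoms in the SMILES string, as confirmed in this thread. The names `ATOM` and `strip_markers` are hypothetical helpers, not part of MolecularGraph.jl.

```python
import re

# One SMILES atom token: a bracket atom, a two-letter organic-subset halogen,
# a one-letter organic-subset atom (aliphatic or aromatic), or the wildcard '*'.
ATOM = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]|\*")


def strip_markers(marked, marker="&"):
    """Return (clean_smiles, tagged).

    `tagged` lists the 1-based positions (counted in order of appearance)
    of the atoms that follow a `marker`. Hypothetical helper for illustration.
    """
    clean = []
    tagged = []
    atom_count = 0
    pending = False  # True while a marker is waiting for the next atom token
    i = 0
    while i < len(marked):
        if marked[i] == marker:
            pending = True
            i += 1
            continue
        m = ATOM.match(marked, i)
        if m:
            atom_count += 1
            if pending:
                tagged.append(atom_count)
                pending = False
            clean.append(m.group())
            i = m.end()
        else:
            # Bonds, branch parentheses, ring-closure digits, '%', '.' pass through.
            clean.append(marked[i])
            i += 1
    if pending:
        raise ValueError("marker not followed by an atom")
    return "".join(clean), tagged


clean, tagged = strip_markers("&C(CC)C&CC")
print(clean, tagged)
```

The cleaned string is what would be passed to `smilestomol`, and `tagged` holds the positions to look up in the resulting `GraphMol`. This only holds as long as the parser keeps one node per atom token in string order, so it is worth checking against a few molecules with bracket atoms or explicit hydrogens before relying on it.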