Open Yoshanuikabundi opened 10 months ago
If we choose to do anything on our end, I'm kinda in favor of 3. But since this has taken 5 years to emerge, I think the best course would be to raise this on the RDKit issue tracker and see if @greglandrum invites us to submit a PR to fix it upstream.
I'd be happy to see a PR or bug report with examples to raise an error on stuff like this on the RDKit side.
Given that the parsing code works purely with 8-bit chars and ASCII, there's no other sensible alternative.
Is your feature request related to a problem? Please describe.
A zero-width space (`\u200b`) in a SMILES string causes RDKit to truncate the molecule without raising an error. I ran into this very confusing behavior while copying SMILES strings from specs.net.
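Since the offending character is invisible, it can help to list the non-ASCII characters in a copied string before handing it to a parser. This is only a rough diagnostic sketch: the helper name `describe_non_ascii` and the example string are made up for illustration and are not part of RDKit or the toolkit.

```python
import unicodedata


def describe_non_ascii(text):
    """Return (index, code point, Unicode name) for each non-ASCII character."""
    return [
        (index, f"U+{ord(char):04X}", unicodedata.name(char, "UNKNOWN"))
        for index, char in enumerate(text)
        if ord(char) > 127
    ]


# Hypothetical example: ethanol-like SMILES with a zero-width space pasted in.
smiles = "CCO\u200bCC"
print(describe_non_ascii(smiles))
# → [(3, 'U+200B', 'ZERO WIDTH SPACE')]
```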
Describe the solution you'd like
Ideally, RDKit would raise a parse error rather than just truncate, but we could solve this issue on our end instead.
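If we did handle this on our end, one possible shape is a small guard that runs before the string reaches the parser (for example, before calling `Chem.MolFromSmiles`). The sketch below is just an assumption about how that could look: `check_smiles_is_ascii` is a hypothetical name, and rejecting everything outside printable ASCII is one policy among several (silently stripping the character would be another).

```python
def check_smiles_is_ascii(smiles: str) -> str:
    """Return the SMILES unchanged, or raise if it contains a character
    that is not printable ASCII (hypothetical helper, not an existing API)."""
    for index, char in enumerate(smiles):
        if not (char.isascii() and char.isprintable()):
            raise ValueError(
                f"Unexpected character U+{ord(char):04X} at position {index} "
                f"in SMILES {smiles!r}"
            )
    return smiles


# A clean string passes through untouched.
check_smiles_is_ascii("CCO")

# A string with a hidden zero-width space is rejected with a clear message.
try:
    check_smiles_is_ascii("CC\u200bO")
except ValueError as err:
    print(err)
```

Raising here turns the silent truncation into an explicit error that points at the exact position of the stray character, which is the behavior requested of RDKit above.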