Jha-Prajjwal closed this issue 2 years ago.
Likely not. The input structures are re-canonicalized to SMILES before use, and the canonical fragmentation and re-assembly code depends very heavily on the SMILES syntax.
Thus, essentially none of the extended CXSMILES properties can be preserved.
Why specifically do you want CXSMILES as input?
@adalke -- I think the request is simply to make sure the CSV parsing doesn't choke on CXSMILES, rather than to do anything interesting with the extensions. In general, we're hoping to make sure that enhanced stereo features are preserved, but that feels like a separate request from the CSV parsing failures.
What about using tab as the delimiter?
@adalke yup, the tab delimiter would work for our use case. Thanks!
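A minimal sketch of the tab-delimited workaround, using only Python's standard library. It assumes a hypothetical two-column file (structure, ID) and that the CXSMILES strings themselves contain no tab characters. It does not use any mmpdb option; it only shows that splitting on tab keeps both the comma inside the `|...|` extension and the space before it intact, whereas a plain whitespace split would still cut the record at that space.

```python
import csv
import io

# Hypothetical example record: the extension "|&1:1,3|" puts atoms 1 and 3
# in an enhanced-stereo AND group, so it contains both a space and a comma.
CXSMILES = "C[C@H](F)[C@H](C)Cl |&1:1,3|"
RECORD_ID = "mol1"


def write_tsv(rows):
    """Write tab-separated rows; commas and spaces need no quoting here."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


def read_tsv(text):
    """Split each line on tab only, so the space before '|...|' survives."""
    return [line.split("\t") for line in text.splitlines()]


text = write_tsv([(CXSMILES, RECORD_ID)])
print(read_tsv(text))  # [['C[C@H](F)[C@H](C)Cl |&1:1,3|', 'mol1']]
```

Because `csv.writer` only quotes fields that contain the delimiter, the quote character, or a line terminator, nothing in this record gets quoted, so a reader that does a simple per-line `split("\t")` sees exactly two fields.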
Currently mmpdb seems to support only SMILES, but RDKit supports CXSMILES natively. Is it possible to extend mmpdb to support CXSMILES?
For our work, we were initially using CXSMILES with the comma delimiter. Using a CSV writer, we enclosed the CXSMILES in double quotes so that the CSV reader would know not to split on the commas inside the quotes. But as it turns out, mmpdb just uses Python's `split()` method, which does not ignore commas inside quotes. So this doesn't work for CXSMILES.
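The failure can be reproduced with the standard library alone. This is a minimal sketch, assuming a hypothetical two-column file (structure, ID); it does not call mmpdb, it only mimics a per-line `split(",")` and compares it with `csv.reader`, which honours the double quotes that `csv.writer` adds.

```python
import csv
import io

# Hypothetical example record: the extension "|&1:1,3|" puts atoms 1 and 3
# in an enhanced-stereo AND group and contains a comma.
CXSMILES = "C[C@H](F)[C@H](C)Cl |&1:1,3|"
RECORD_ID = "mol1"


def write_quoted_csv(rows):
    """Write rows with csv.writer; fields containing commas get double-quoted."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


def naive_split(text):
    """Split each line on commas with str.split(), ignoring any quoting."""
    return [line.split(",") for line in text.splitlines()]


def csv_split(text):
    """Parse with csv.reader, which treats quoted commas as data."""
    return list(csv.reader(io.StringIO(text)))


text = write_quoted_csv([(CXSMILES, RECORD_ID)])
print(text, end="")          # "C[C@H](F)[C@H](C)Cl |&1:1,3|",mol1
print(naive_split(text)[0])  # 3 fields: quotes kept, CXSMILES cut in two
print(csv_split(text)[0])    # 2 fields: quotes stripped, CXSMILES intact
```

The naive split yields `['"C[C@H](F)[C@H](C)Cl |&1:1', '3|"', 'mol1']`, so the structure field is both truncated and still wrapped in a stray quote, which is why quoting alone does not help when the reader is a plain `split()`.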