rdkit / mmpdb

A package to identify matched molecular pairs and use them to predict property changes.

Support for CXSMILES #49

Closed Jha-Prajjwal closed 1 year ago

Jha-Prajjwal commented 1 year ago

Currently mmpdb seems to support only SMILES, but RDKit natively supports CXSMILES. Is it possible to extend mmpdb to support CXSMILES?

For our work, we were initially using CXSMILES with the comma delimiter. Using a csvwriter, we enclosed the CXSMILES in double quotes so that a csvreader would know not to split on the commas inside the double quotes. But as it turns out, mmpdb just uses Python's split() method, which does not ignore commas inside quotes. So this doesn't work for CXSMILES.
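The failure described above can be reproduced with the standard library alone. The following is a minimal sketch, not mmpdb's actual parsing code: the CXSMILES string and the `mol1` identifier are made-up illustration values, and the plain `split(",")` call only stands in for the kind of splitting described in this comment.

```python
import csv
import io

# Hypothetical example record: a CXSMILES whose extension block contains a comma.
cxsmiles = "C[C@H](F)[C@H](C)O |a:1,o1:3|"
record_id = "mol1"

# Write one CSV row; csv.writer quotes the first field because it contains a comma.
buffer = io.StringIO()
csv.writer(buffer).writerow([cxsmiles, record_id])
line = buffer.getvalue().rstrip("\r\n")

# Plain string splitting ignores the quotes and cuts the CXSMILES in two.
naive_fields = line.split(",")

# csv.reader honors the quotes and recovers the two original fields.
parsed_fields = next(csv.reader([line]))

print(naive_fields)
print(parsed_fields)
```

Here the plain split yields three fields instead of two, with stray quote characters left on the pieces, while `csv.reader` returns the CXSMILES and the identifier intact.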

adalke commented 1 year ago

Likely not. The input structures are re-canonicalized to SMILES before use, and the canonical fragmentation and re-assembly code depends very heavily on the SMILES syntax.

Thus, basically all of the extended properties from CXSMILES cannot be preserved.

Why specifically do you want CXSMILES as input?

cdvonbargen commented 1 year ago

@adalke -- I think the request is simply to make sure the csv parsing doesn't choke on CXSMILES rather than do anything interesting with the extension. In general, we're hoping to make sure that enhanced stereo features are preserved, but that feels like a separate request from csv failures.

adalke commented 1 year ago

What about using tab as the delimiter?

Jha-Prajjwal commented 1 year ago

@adalke yup, the tab delimiter would work for our use case. Thanks!
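As a rough illustration of why the tab delimiter sidesteps the problem, here is a minimal sketch using the same kind of record. The CXSMILES string and the `mol1` identifier are hypothetical, and the `split("\t")` call is only a stand-in for tab-delimited parsing, not mmpdb's own code.

```python
# Hypothetical example record: the CXSMILES contains both a space and a comma.
cxsmiles = "C[C@H](F)[C@H](C)O |a:1,o1:3|"
record_id = "mol1"

# Join the fields with a tab; no quoting is needed because neither field contains one.
line = "\t".join([cxsmiles, record_id])

# Splitting on tab recovers both fields unchanged.
tab_fields = line.split("\t")

# Splitting on arbitrary whitespace would still break the CXSMILES at its space.
whitespace_fields = line.split()

print(tab_fields)
print(whitespace_fields)
```

Because a CXSMILES separates the SMILES from its extension block with a space, a whitespace delimiter would fail in the same way as the comma; tab is the safer choice for this kind of input.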