pckroon / pysmiles

A lightweight python-only library for reading and writing SMILES strings
Apache License 2.0
147 stars 21 forks source link

Writing smiles silently breaks on graphs with multiple fragments #26

Closed TimoSommer closed 1 year ago

TimoSommer commented 1 year ago

When writing smiles from a self-defined graph, it can happen that this graph has multiple unconnected fragments. I would have expected this to be recognized as zero-order bonds automatically. However, if encountering such a graph with multiple fragments, write_smiles() will only return the smiles string for one of the fragments and not even throw an error or warning. It assumes that zero-order bonds are explicitly mentioned as bonds with order zero in the graph, but this is an assumption that will probably fail in most use cases (like mine).

I propose to fix this issue by introducing zero-order bonds between unconnected fragments in the graph:

fragments_connectors = [list(frag)[0] for frag in nx.connected_components(graph)]
central_node = fragments_connectors.pop(0)
for idx in fragments_connectors:
    graph.add_edge(central_node, idx, order=0)

I actually don't know how to contribute to this in the best way by editing the code, but I'd be willing to give it a try if you could tell me how.

pckroon commented 1 year ago

Thanks for bringing this to my attention. This is a use-case I never considered :) A somewhat elegant solution might be to take write_smiles (https://github.com/pckroon/pysmiles/blob/master/pysmiles/write_smiles.py#L77), and rename that to write_smiles_molecule (or something). Then make a new write_smiles along the following lines:

def write_smiles(molecule, default_element='*', start=None):
    smiles = []
    for nodes in nx.connected_components(molecule):
        if start and start in nodes:
            start_ = start
        else:
            start_ = None
        smiles.append(write_smiles_molecule(molecule.subgraph(nodes, default_element=default_element, start=start_))
    return '.'.join(smiles)

And then we'd also need a test for this of course. Maybe we should also log a message (info? warning?) if there is more than 1 connected component. I'd be extremely happy if you could open a pull request along these lines --- I really don't have the time :(

TimoSommer commented 1 year ago

I'm quite busy at the moment as well, but I'm gonna keep it in mind for when I might have more time!