mojaie / MolecularGraph.jl

Graph-based molecule modeling toolkit for cheminformatics
MIT License

Support for parsing smiles with per-atom labels e.g. atom mapping #87

Open doncamilom opened 1 year ago

doncamilom commented 1 year ago

Hi,

Thank you so much for the library! I'm trying to find a way of parsing SMILES with atom mapping (of the format [CH3:1][CH:2]=[O:3]), but I don't see that implemented here yet. I could work on this, but would also appreciate some input/ideas.

Boxylmer commented 1 year ago

Is this kind of mapping typical for reactions? I don't know if MolecularGraph currently handles that. The atom indices in this library currently appear to be closely tied to how neighbors and edges link to one another. A quick way around this would be to define a separate mapping from MolecularGraph index -> user specified index, that could otherwise be nothing if the user doesn't specify anything. What use would this feature be if not for reactions?
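The separate index mapping suggested here could look roughly like the sketch below. It is plain Julia and assumes nothing about MolecularGraph.jl internals; the function name `label_to_index` and the idea of storing labels in a vector aligned with atom indices are illustrative assumptions, not part of the library.

```julia
# Hypothetical side table kept next to a molecule: position i holds the
# user-supplied label of the atom at graph index i, or `nothing` when the
# user did not specify one. No MolecularGraph.jl type is involved.
labels = Union{Int,Nothing}[3, nothing, 1]

# Reverse lookup (user label -> graph index), skipping unlabeled atoms.
function label_to_index(labels::Vector{Union{Int,Nothing}})
    lookup = Dict{Int,Int}()
    for (i, label) in enumerate(labels)
        if label !== nothing
            lookup[label] = i
        end
    end
    return lookup
end

lookup = label_to_index(labels)
```

Because the table lives outside the graph, the atom indices the library uses for neighbors and edges stay untouched.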

doncamilom commented 1 year ago

Hi, thanks for your response! Yeah, reactions are exactly the application I have in mind, so what I would like is to be able to read the atom tags directly as the SMILES is parsed. The current parser knows how to handle things like [CH3] but is completely lost once it sees a colon, as in [CH3:1], so I would just have to add that extra thing, and then also extend the SmilesAtom struct to contain an extra property for this number. Wdyt?
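As a rough illustration of what reading the `:n` suffix involves, here is a minimal sketch that works on the string before any parser sees it. It is not the library's parser: `strip_atom_maps` is a hypothetical helper, it only looks at bracket atoms, and it returns one entry per bracket atom (not per atom in the molecule), so aligning the numbers with graph indices would still need the real parser.

```julia
# Hypothetical pre-processing helper, independent of MolecularGraph.jl.
# Returns the SMILES with atom map numbers removed, plus the map number of
# each bracket atom in order of appearance (`nothing` if it has none).
function strip_atom_maps(smiles::AbstractString)
    # Capture 1: bracket atom body. Capture 2: optional digits after ':'.
    pattern = r"\[([^\[\]:]+)(?::(\d+))?\]"
    maps = Union{Int,Nothing}[]
    for m in eachmatch(pattern, smiles)
        num = m.captures[2]
        push!(maps, num === nothing ? nothing : parse(Int, num))
    end
    # Rebuild every bracket atom from its body only, dropping ':n'.
    plain = replace(smiles, pattern => s"[\1]")
    return plain, maps
end

plain, maps = strip_atom_maps("[CH3:1][CH:2]=[O:3]")
# plain == "[CH3][CH]=[O]", maps == [1, 2, 3]
```

The stripped string is ordinary SMILES, so it can be handed to the existing parser unchanged.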

Boxylmer commented 1 year ago

In our case, we just needed a tiny bit of extra functionality that was not part of the Daylight standards (which this library closely follows), so we just built a parser on top of MolecularGraph.jl that spits out a working SMILES. In your case, SMIRKS is very much standardized by Daylight. While I'd love to see functionality extended to MolecularGraph, just keep in mind that if you're doing something relatively quick, it may be in your best interest just to build "on top" of the library in your own project instead.

So looking at the SmilesAtom struct

https://github.com/mojaie/MolecularGraph.jl/blob/023d5922908d875ad019ac5cc420a42128e372b0/src/model/atom.jl#L107-L118

You'd want to add an extra label like identifier::Union{Int, Nothing}, and then reference it between two GraphMol objects to see where atoms went, as per SMIRKS standards?
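To make the "see where atoms went" idea concrete, here is a small sketch. `LabeledAtom` is a hypothetical stand-in, not the library's SmilesAtom, and `atom_correspondence` is an illustrative name; the molecules are represented as bare atom vectors rather than GraphMol objects.

```julia
# Hypothetical stand-in for an atom record carrying an optional map number.
struct LabeledAtom
    symbol::Symbol
    identifier::Union{Int,Nothing}  # atom map number, or nothing if unmapped
end

# Pair atoms that share an identifier across two molecules.
# Returns a Dict of reactant atom index => product atom index.
function atom_correspondence(reactant::Vector{LabeledAtom},
                             product::Vector{LabeledAtom})
    product_index = Dict{Int,Int}()
    for (j, atom) in enumerate(product)
        id = atom.identifier
        id === nothing || (product_index[id] = j)
    end
    pairs = Dict{Int,Int}()
    for (i, atom) in enumerate(reactant)
        id = atom.identifier
        if id !== nothing && haskey(product_index, id)
            pairs[i] = product_index[id]
        end
    end
    return pairs
end

# [CH3:1][CH:2]=[O:3] on one side, the same atoms listed in reverse on the other.
reactant = [LabeledAtom(:C, 1), LabeledAtom(:C, 2), LabeledAtom(:O, 3)]
product  = [LabeledAtom(:O, 3), LabeledAtom(:C, 2), LabeledAtom(:C, 1)]
pairs = atom_correspondence(reactant, product)
```

Atoms without an identifier are simply left out of the result, which matches the "could otherwise be nothing" behavior discussed above.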

doncamilom commented 1 year ago

Yes that's the plan, just adding an extra identifier label to the SmilesAtom struct. However for that I would also need to modify the SMILES parser in this library to directly read the atom map numbers, which I've found trickier to do without modifying the source code.

Also, building on top of SmilesAtom is quite problematic as:

  1. SmilesAtom is a concrete type so I can't create subtypes from it.
  2. I also can't directly modify the struct.
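One common Julia workaround for both points is composition: wrap the concrete type instead of subtyping it. The sketch below uses a hypothetical `PlainAtom` as a stand-in for a struct owned by another package, and `MappedAtom` and `mapnum` are made-up names. Whether the library's graph types would accept such a wrapper as an atom type is not something this thread confirms.

```julia
# Stand-in for a concrete atom type defined in another package.
struct PlainAtom
    symbol::Symbol
    charge::Int
end

# Wrapper that adds a map number by composition rather than inheritance.
struct MappedAtom
    atom::PlainAtom
    mapnum::Union{Int,Nothing}
end

# Forward property reads the wrapper does not own to the wrapped atom,
# so `m.symbol` and `m.charge` keep working on a MappedAtom.
function Base.getproperty(m::MappedAtom, name::Symbol)
    if name in fieldnames(MappedAtom)
        return getfield(m, name)
    end
    return getproperty(getfield(m, :atom), name)
end

m = MappedAtom(PlainAtom(:C, 0), 1)
```

Functions that dispatch on the original concrete type would still need explicit forwarding methods, which is where this approach gets tedious.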

mojaie commented 1 year ago

I'm very sorry for the late response. I'm still not familiar with reactions and the SMIRKS specification, but it may not be so difficult to implement. I'm now working on the new_graphs branch, which has many structural changes, including generalization of the SMILES/SMARTS implementation and parameterized atom/bond property types. Please wait a while until the next version. (e.g. the new smilestomol(::Type{T}, smiles::AbstractString) may be able to deal with any atom/bond parameter types, like smilestomol(MolGraph{Int,Dict{Symbol,Any},Dict{Symbol,Any}}, "CCO"))

FYI: https://github.com/mojaie/MolecularGraph.jl/issues/75