mojaie / MolecularGraph.jl

Graph-based molecule modeling toolkit for cheminformatics
MIT License
189 stars 27 forks source link

add support for V3000 format and rxn files #62

Closed hhaensel closed 2 years ago

hhaensel commented 2 years ago

I've added preliminary support for extended mol-file format (V3000) and reading of reaction files into a tuple of GraphMol Vectors (reactants and products) (see my issues #60, #61)

One could think of adding a reaction type later, but for my use case the current changes are already very helpful. I've also not yet added radical information.

One technical remark. I've modified sdftomol() to automatically detect the extended format and to read reaction files. As a consequence parse(SDFile, ...) results in either a GraphMol object or in a tuple of vectors of Graphmol. From a type stabilty view point this is not so nice. One could also think about splitting the two parsers. But I thought, it's nice to have just the one interface.

Happy to hear what you think.

hhaensel commented 2 years ago

didn't like my approach of having Base.parse(SDFile, ...) return a Union type so I got around that by introducing a reaction type and splitting the parsing routines. I also corrected minor errors that I introduced from mal-formed V3000 example files.

The only disadvantage is that now the nohaltsupplier only serves SDFiles ... One might think about renaming SDFileReader to a general FileReader and used it for both file types.

hhaensel commented 2 years ago

I think he PR is in a good state now. I organised RDFileReader to have an IO field rather than a Base.EachLine, because RD files have no true delimiter. Instead each new $RXN indicates the start of a new reaction. But as he version information of the reaction is stored in that line, reading the full line would skip this information. Furthermore, I think the readuntil() is even more efficient than recombining the block with push!(). But that's probably neglibible over all other tasks that are performed during a graph rendering ...

I hope you find this PR helpful.

codecov-commenter commented 2 years ago

Codecov Report

Merging #62 (01362cd) into master (e5196c1) will decrease coverage by 1.28%. The diff coverage is 2.40%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #62      +/-   ##
==========================================
- Coverage   77.48%   76.20%   -1.29%     
==========================================
  Files          56       56              
  Lines        4860     4942      +82     
==========================================
  Hits         3766     3766              
- Misses       1094     1176      +82     
Impacted Files Coverage Δ
src/model/molgraph.jl 78.12% <0.00%> (-1.67%) :arrow_down:
src/stereo.jl 63.00% <0.00%> (-1.91%) :arrow_down:
src/sdfilereader.jl 50.96% <2.73%> (-26.25%) :arrow_down:
src/draw/draw3d.jl 95.65% <0.00%> (-2.18%) :arrow_down:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update e5196c1...01362cd. Read the comment docs.

mojaie commented 2 years ago

Thank you very much! That looks great. I actually didn't know about readuntil, but it looks like a better solution. Anyway, I'll merge it soon.