mojaie / MolecularGraph.jl

Graph-based molecule modeling toolkit for cheminformatics
MIT License

add format converter #96

Open hhaensel opened 1 year ago

hhaensel commented 1 year ago

Currently, only the SMILES and SDF formats are supported for molecule generation. For export, only SDF v2 is in place.

I submitted PR #95 to generate SDF files and molecules from InChI strings.

Furthermore, it would be fantastic to include OpenBabel, perhaps via a new package OpenBabel.jl and package extensions. @mojaie, tell me what you think.

Boxylmer commented 1 year ago

This would be an incredible addition to the library. Consider this a +1 for a separate package plus compatibility with it in MolecularGraph. Since OpenBabel is written in C++ and (I think) has an associated DLL/SO, could this be hosted as an artifact for the potential OpenBabel.jl? Or do you intend to make installing the program a prerequisite?

timholy commented 1 year ago

It's already available with https://github.com/JuliaBinaryWrappers/openbabel_jll.jl. That's a low-level wrapper, someone might want to create OpenBabel.jl to give it a more "Julian" interface. But having the JLL already built is a huge head start.

hhaensel commented 1 year ago

Thanks for the hint. At first glance, I can only use the executables that come with openbabel_jll. With openbabel_jll installed I can do

using openbabel_jll

ENV["BABEL_LIBDIR"] = readdir(joinpath(dirname(dirname(openbabel_jll.obabel_path)), "lib", "openbabel"), join = true)[1]
babel_cmd = openbabel_jll.obabel();

rstrip(read(`$babel_cmd "-:CCO" -ismi -omol --gen2D`, String))

However, the conversion command takes twice as long as what I get by calling a binary that I simply extracted from the official OpenBabel distribution:

babel_cmd = raw"C:\Temp\Openbabel\obabel"
rstrip(read(`$babel_cmd "-:CCO" -ismi -omol --gen2D`, String))

That's a bit strange.

Unfortunately, I have no idea how to access libopenbabel directly. Would I need to use Cxx.jl or CxxWrap.jl, since OpenBabel is written in C++? So far I have only made ccalls into well-documented C libraries. Any hints are welcome.

timholy commented 1 year ago

I've not written a high-level wrapper for a JLL before, so you should read docs or ask others. But my understanding is that you won't need to use backticks anywhere, that everything will be a direct call. This highlighted line suggests some of the things you should be able to do: https://github.com/JuliaBinaryWrappers/openbabel_jll.jl/blob/ad1a8e44175f44bd378afdb43f1896ff05161bda/src/wrappers/x86_64-linux-gnu-cxx11.jl#L2

hhaensel commented 1 year ago

Unfortunately, all of these functions can indeed be called from Julia, but they do nothing more than return a setenv command, i.e. the binary executable together with an environment. I guess one has to use a higher-level library written with the help of SWIG, or rewrite small pieces of code with Cxx.jl.
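
To illustrate the point above: what a wrapper function hands back is an ordinary Cmd with an environment attached, so it still has to be interpolated into backticks to run. The following is a minimal sketch of that mechanism using only Base. The Julia executable stands in for openbabel_jll.obabel(), and fake_wrapper, DEMO_LIBDIR and its value are made up for the illustration.

```julia
# Stand-in for a JLL wrapper function: returns a Cmd that carries extra
# environment variables. `DEMO_LIBDIR` and its value are hypothetical.
fake_wrapper() = addenv(Base.julia_cmd(), "DEMO_LIBDIR" => "/opt/demo/lib")

exe = fake_wrapper()

# Interpolating the Cmd at the head of a new command keeps its environment,
# so the child process can read DEMO_LIBDIR.
out = read(`$exe --startup-file=no -e 'print(ENV["DEMO_LIBDIR"])'`, String)
```

Each such call spawns a new process, which is a different thing from calling into libopenbabel directly.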

timholy commented 1 year ago

I'd strongly recommend checking in with the #binarybuilder channel on Slack before going to any effort.

hhaensel commented 1 year ago

In the meantime, I think the most effective solution is to use a Python wrapper, e.g.

using PyCall

ob = pyimport("openbabel.openbabel")

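# Convert a molecule string between formats with OpenBabel (via PyCall).
function babel(input, informat = "smi", outformat = "mol")
    conv = ob.OBConversion()
    conv.SetInAndOutFormats(informat, outformat)  # use the arguments, not hardcoded formats

    mol = ob.OBMol()
    conv.ReadString(mol, input)

    # Generate 2D coordinates if the input carries none
    if !mol.Has2D() && !mol.Has3D()
        pgen = ob.OBOp.FindType("gen2D")
        pgen !== nothing && pgen.Do(mol)
    end

    conv.WriteString(mol)
end

julia> @time babel("CCO") |> println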

 OpenBabel06162301242D

  3  2  0  0  0  0  0  0  0  0999 V2000
    1.7321   -0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.8660    0.5000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.0000   -0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  2  3  1  0  0  0  0
M  END

  0.005649 seconds (62 allocations: 2.906 KiB)

mojaie commented 1 year ago

Thank you for the great suggestions. It would be nice to create a new library, OpenBabel.jl, with a Julian interface. It would be easy to keep it compatible with MolecularGraph.jl. FYI, coordgenlibs_jll, which is used in MolecularGraph.jl, is a fork of coordgenlibs (written in C++) that adds C interfaces, which enables its use via Julia's ccall.

hhaensel commented 1 year ago

I'll have a look; that sounds interesting. Did you have a look at PR #95? That should work right away, although full functionality would only come with an upgrade of libinchi. What do you think about the

mojaie commented 1 year ago

Yes, it looks great. It will be merged soon. Also, thank you for fixing the Windows issue. I was wondering why CI had been failing only on Windows!

timoleistner commented 7 months ago

There is also the RDKitMinimalLib wrapper, which of course has a different mol object. A function that translates these mol objects into MolecularGraph mol objects would solve https://github.com/mojaie/MolecularGraph.jl/issues/72 and https://github.com/mojaie/MolecularGraph.jl/issues/67 and would be an awesome addition to this package.

Edit: The simplest solution would be to use something like SMILES as a common language. E.g. use the get_mol() and get_smiles() functions from RDKitMinimalLib to obtain a SMILES string from, say, a molblock, and then use smilestomol() from MolecularGraph to obtain the mol object.

smilestomol(get_smiles(get_mol(molblock)))

But this would lose spatial information from the molblock...

Edit2: Instead of SMILES one should at least use InChI as SMILES is not an open standard and different SMILES specifications exist.

timoleistner commented 4 months ago

As mentioned here https://github.com/mojaie/MolecularGraph.jl/issues/23#issuecomment-1926960506, openbabel_jll provides an ExecutableProduct. It also provides two LibraryProducts, libinchi and libopenbabel. One can see all available products using ? openbabel_jll, which returns the following:

  Products
  ========

  The code bindings within this package are autogenerated from the following Products:

    •  LibraryProduct: libinchi

    •  LibraryProduct: libopenbabel

    •  ExecutableProduct: obabel

    •  ExecutableProduct: obconformer

    •  ExecutableProduct: obdistgen

    •  ExecutableProduct: obenergy

    •  ExecutableProduct: obfit

    •  ExecutableProduct: obfitall

    •  ExecutableProduct: obgen

    •  ExecutableProduct: obgrep

    •  ExecutableProduct: obminimize

    •  ExecutableProduct: obmm

    •  ExecutableProduct: obprobe

    •  ExecutableProduct: obprop

    •  ExecutableProduct: obrms

    •  ExecutableProduct: obrotamer

    •  ExecutableProduct: obrotate

    •  ExecutableProduct: obspectrophore

    •  ExecutableProduct: obsym

    •  ExecutableProduct: obtautomer

    •  ExecutableProduct: obthermo

    •  ExecutableProduct: roundtrip

Therefore, we would be able to call functions from libopenbabel using ccall(), if we knew the function names... Using readelf, one can inspect shared library files and look for functions:

readelf -sW ~/.julia/artifacts/f1eb34813a945111198356e33b5f2034cc7990ab/lib/libopenbabel.so

The output is still pretty messy, and I didn't find proper documentation for this library.

Edit: OpenBabel API Docs
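
Part of why the readelf output looks messy is that a C++ library exports mangled symbol names, and ccall needs exactly those names. The following is a rough sketch of the simplest case of the Itanium C++ ABI scheme (used by GCC and Clang): a function nested in namespaces or classes that takes no arguments. The helper names mangle_noarg and namespace_prefix and the Outer::Inner::f example are hypothetical; constructors, arguments, templates and substitutions are not covered.

```julia
# Build the Itanium C++ ABI symbol for a no-argument function nested in
# namespaces/classes, e.g. ["Outer", "Inner", "f"] for Outer::Inner::f().
# `isconst = true` marks a const member function.
function mangle_noarg(parts::Vector{String}; isconst::Bool = false)
    length(parts) >= 2 || throw(ArgumentError("need at least a scope and a function name"))
    # Each component is written as <length><name>
    nested = join(string(ncodeunits(p), p) for p in parts)
    # _ZN ... E wraps a nested name; the trailing v means "no parameters"
    return string("_ZN", isconst ? "K" : "", nested, "Ev")
end

# Prefix shared by non-const symbols defined inside a namespace.
namespace_prefix(ns::String) = string("_ZN", ncodeunits(ns), ns)

mangle_noarg(["Outer", "Inner", "f"])   # "_ZN5Outer5Inner1fEv"
namespace_prefix("OpenBabel")           # "_ZN9OpenBabel"
```

Filtering the symbol table for that prefix narrows the listing to the library's own namespace, and nm -D --demangle prints readable names instead of mangled ones.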

hhaensel commented 3 weeks ago

In multithreaded apps, the above solution crashed. I then came up with:

using PythonCall
using ThreadPools

macro pythread(expr)
    quote
        fetch(@tspawnat 1 begin
            $(esc(expr))
        end)
    end
end

function babel_convert(input, informat = "cdxml", outformat = "mol")
    # Run all Python calls on thread 1
    @pythread begin
        ob = pyimport("openbabel.openbabel")
        conv = ob.OBConversion()
        conv.SetInAndOutFormats(informat, outformat)

        mol = ob.OBMol()
        conv.ReadString(mol, input)

        # Generate 2D coordinates if the input carries none
        if !Bool(mol.Has2D()) && !Bool(mol.Has3D())
            pgen = ob.OBOp.FindType("gen2D")
            # With PythonCall, a Python None is a Py object, not Julia's nothing
            pyis(pgen, pybuiltins.None) || pgen.Do(mol)
        end

        pyconvert(String, conv.WriteString(mol))
    end
end

I also placed this on discourse and on https://github.com/JuliaPy/PythonCall.jl/issues/201
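
The macro above funnels every Python call through thread 1. The same idea can be sketched with Base only, as a single worker task fed through a Channel. This is not the approach used above, just an alternative under the assumption that start_worker is called once from the main thread; start_worker and on_worker are hypothetical names.

```julia
# Start one consumer task and return its job queue. Jobs are
# (function, reply channel) pairs. `@async` tasks are sticky: they keep
# running on the thread that created them.
function start_worker()
    jobs = Channel{Tuple{Function,Channel{Any}}}(Inf)
    @async for (f, reply) in jobs
        outcome = try
            (true, f())
        catch err
            (false, err)
        end
        put!(reply, outcome)
    end
    return jobs
end

# Run `f` on the worker task and wait for its result (or rethrow its error).
function on_worker(f::Function, jobs::Channel)
    reply = Channel{Any}(1)
    put!(jobs, (f, reply))
    ok, value = take!(reply)
    ok || throw(value)
    return value
end

worker = start_worker()

# Tasks spawned on any thread all get their work executed by the one worker.
ids = fetch.([Threads.@spawn on_worker(() -> Threads.threadid(), worker) for _ in 1:8])
```

Because jobs are processed one at a time, the calls are serialized as well as pinned, which costs parallelism but keeps single-threaded foreign code away from concurrent callers.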