hhaensel opened 1 year ago
This would be an incredible addition to the library. Consider this a +1 for a separate package plus compatibility with it in MolecularGraph. Since OpenBabel is written in C++ and (I think) ships a DLL/SO, could this be hosted as an artifact for the potential OpenBabel.jl? Or do you intend to make installing the program a prerequisite?
It's already available with https://github.com/JuliaBinaryWrappers/openbabel_jll.jl. That's a low-level wrapper; someone might want to create OpenBabel.jl to give it a more "Julian" interface. But having the JLL already built is a huge head start.
Thanks for the hint. At first glance, I can only use the executables that come with openbabel_jll. With openbabel_jll installed I can do:
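using openbabel_jll
# Open Babel looks for its format plugins in BABEL_LIBDIR; use the first
# entry found under <prefix>/lib/openbabel (a version-named directory)
ENV["BABEL_LIBDIR"] = readdir(joinpath(dirname(dirname(openbabel_jll.obabel_path)), "lib", "openbabel"), join = true)[1]
babel_cmd = openbabel_jll.obabel();   # Cmd carrying the JLL's environment
rstrip(read(`$babel_cmd "-:CCO" -ismi -omol --gen2D`, String))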
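The [1] index above takes whatever readdir happens to list first under lib/openbabel. A slightly more defensive sketch, assuming (as the snippet implies) that the plugins live in a subdirectory named after the Open Babel version; the helper name babel_libdir is hypothetical and not part of openbabel_jll:

```julia
# Hypothetical helper (not part of openbabel_jll): return the plugin directory
# <prefix>/lib/openbabel/<version> with the highest version number, skipping
# entries whose name is not a version number or that are not directories.
function babel_libdir(prefix::AbstractString)
    plugroot = joinpath(prefix, "lib", "openbabel")
    best, bestver = nothing, nothing
    for path in readdir(plugroot; join = true)
        ver = tryparse(VersionNumber, basename(path))
        if ver !== nothing && isdir(path) && (bestver === nothing || ver > bestver)
            best, bestver = path, ver
        end
    end
    best === nothing && error("no version-named plugin directory found in $plugroot")
    return best
end

# Possible use with the JLL (prefix derived exactly as in the snippet above):
# using openbabel_jll
# ENV["BABEL_LIBDIR"] = babel_libdir(dirname(dirname(openbabel_jll.obabel_path)))
```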
However, the conversion takes about twice as long as with the files I simply extracted from the binary OpenBabel distribution:
babel_cmd = raw"C:\Temp\Openbabel\obabel"
rstrip(read(`$babel_cmd "-:CCO" -ismi -omol --gen2D`, String))
That's a bit strange.
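Single timings of short-lived processes are noisy, so a factor of two is easier to trust when each command is run several times. A minimal sketch (the function name min_runtime is hypothetical) that times the whole external process with @elapsed and keeps the fastest run:

```julia
# Time an external command: one untimed warm-up run, then `runs` timed runs.
# The minimum is reported because it is least affected by background noise.
function min_runtime(cmd::Cmd; runs::Integer = 5)
    read(cmd)                                   # warm-up, e.g. file-system cache
    return minimum([@elapsed(read(cmd)) for _ in 1:runs])
end

# Possible comparison of the two commands above (requires both installations):
# jll  = `$(openbabel_jll.obabel()) "-:CCO" -ismi -omol --gen2D`
# dist = `$(raw"C:\Temp\Openbabel\obabel") "-:CCO" -ismi -omol --gen2D`
# min_runtime(jll), min_runtime(dist)
```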
Unfortunately, I have no idea how to access libopenbabel directly. Would I need Cxx.jl or CxxWrap.jl, since OpenBabel is written in C++? So far I have only made ccalls into well-documented C libraries. Any hint is welcome.
I haven't written a high-level wrapper for a JLL before, so you may want to read the docs or ask others. But my understanding is that you won't need backticks anywhere; everything will be a direct call. This highlighted line suggests some of the things you should be able to do: https://github.com/JuliaBinaryWrappers/openbabel_jll.jl/blob/ad1a8e44175f44bd378afdb43f1896ff05161bda/src/wrappers/x86_64-linux-gnu-cxx11.jl#L2
Unfortunately, while all of these functions can indeed be called from Julia, they do nothing more than return a setenv command, i.e. the binary executable together with an environment.
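What such a function returns is an ordinary Base.Cmd: an argument vector plus an environment (and working directory). The sketch below builds one by hand with setenv to show the parts; the program name and the BABEL_LIBDIR value are made-up placeholders and nothing is executed. With openbabel_jll loaded, openbabel_jll.obabel().exec and .env could be inspected the same way.

```julia
# A Cmd bundles the program + arguments (`exec`) with an environment (`env`).
# Note: setenv replaces the whole environment; the values here are placeholders.
cmd = setenv(`obabel -V`, "BABEL_LIBDIR" => "/tmp/openbabel/plugins")
println(cmd.exec)   # ["obabel", "-V"]
println(cmd.env)    # ["BABEL_LIBDIR=/tmp/openbabel/plugins"]

# Interpolating such a Cmd as the first element of a backtick literal keeps its
# environment, which is why `$babel_cmd "-:CCO" ...` works in the JLL snippet above.
cmd2 = `$cmd -H`
println(cmd2.exec)  # ["obabel", "-V", "-H"]
```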
I guess one has to use a higher-level library written with the help of SWIG, or rewrite small pieces of code with Cxx.jl.
I'd strongly recommend checking in with the #binarybuilder channel on Slack before going to any effort.
Meanwhile, I think the most effective solution is to use a Python wrapper, e.g. via PyCall:
using PyCall
ob = pyimport("openbabel.openbabel")   # Open Babel's Python bindings

function babel(input, informat = "smi", outformat = "mol")
    conv = ob.OBConversion()
    conv.SetInAndOutFormats(informat, outformat)   # formats taken from the arguments
    mol = ob.OBMol()
    conv.ReadString(mol, input)
    # Generate 2D coordinates if the input has none (e.g. SMILES)
    if !mol.Has2D() && !mol.Has3D()
        pgen = ob.OBOp.FindType("gen2D")
        pgen !== nothing && pgen.Do(mol)   # PyCall maps Python None to `nothing`
    end
    conv.WriteString(mol)
end
julia> @time babel("CCO") |> println
 OpenBabel06162301242D
  3  2  0  0  0  0  0  0  0  0999 V2000
    1.7321   -0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.8660    0.5000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.0000   -0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  2  3  1  0  0  0  0
M  END
0.005649 seconds (62 allocations: 2.906 KiB)
Thank you for the great suggestions. It would be nice to create a new library, OpenBabel.jl, with a Julian interface, and keeping it compatible with MolecularGraph.jl would be easy. FYI, the coordgenlibs_jll used in MolecularGraph.jl is built from a fork of coordgenlibs (written in C++) that adds C interfaces, which enables its use with Julia's ccall.
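To illustrate only the Julia side of that approach: assume a hypothetical C shim over libopenbabel that returns a malloc'd, NUL-terminated string (say, a converted molblock). The caller copies it into a Julia String and frees the C buffer. libc's strdup stands in for the hypothetical shim function so the sketch runs on Linux and macOS; a real shim would ideally export its own free function so allocation and deallocation use the same allocator.

```julia
# Copy a NUL-terminated string returned by C into a Julia String, then free it.
# `strdup` (libc) is only a stand-in for a hypothetical shim function such as
# `char *ob_convert(const char *input, const char *informat, const char *outformat)`.
function c_roundtrip(s::AbstractString)
    p = ccall(:strdup, Ptr{Cchar}, (Cstring,), s)
    p == C_NULL && throw(OutOfMemoryError())
    try
        return unsafe_string(p)   # copies the bytes into Julia-managed memory
    finally
        Libc.free(p)             # the C side allocated the buffer, so free it here
    end
end
```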
I'll have a look; that sounds interesting. Did you have a look at PR #95? That should work right away, although full functionality would only come with an upgrade of libinchi. What do you think about the
Yes, it looks great. It will be merged soon. Also, thank you for fixing the Windows issue. I was wondering why CI had been failing only on Windows!
There is also the RDKitMinimalLib wrapper, which of course has a different mol object. A function that translates these mol objects into MolecularGraph mol objects would solve https://github.com/mojaie/MolecularGraph.jl/issues/72 and https://github.com/mojaie/MolecularGraph.jl/issues/67 and would be an awesome addition to this package.
Edit: The simplest solution would be to use something like SMILES as a common language, e.g. use the get_mol() and get_smiles() functions from RDKitMinimalLib to obtain a SMILES string from a molblock, and then use smilestomol() from MolecularGraph to obtain the mol object:
smilestomol(get_smiles(get_mol(molblock)))
But this would lose the spatial information (atom coordinates) in the molblock...
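To make concrete what is lost: in a V2000 molblock, such as the ethanol example printed earlier in this thread, the coordinates are the first three fixed-width columns of the atom block. A minimal parser sketch (the name molblock_coords is hypothetical; it reads only the header, counts line and atom block of V2000 files, not V3000):

```julia
# Return (element, x, y, z) for each atom of a V2000 molblock.
# Layout assumed: 3 header lines, the counts line, then one line per atom with
# x, y, z in columns 1-10, 11-20, 21-30 and the element symbol in columns 32-34.
function molblock_coords(molblock::AbstractString)
    lines = split(replace(molblock, "\r\n" => "\n"), '\n')
    natoms = parse(Int, strip(lines[4][1:3]))   # counts line, columns 1-3
    atoms = Tuple{String,Float64,Float64,Float64}[]
    for line in lines[5:4+natoms]
        x = parse(Float64, strip(line[1:10]))
        y = parse(Float64, strip(line[11:20]))
        z = parse(Float64, strip(line[21:30]))
        push!(atoms, (String(strip(line[32:34])), x, y, z))
    end
    return atoms
end
```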
Edit 2: Instead of SMILES, one should at least use InChI, as SMILES is not an open standard and different SMILES specifications exist.
As mentioned in https://github.com/mojaie/MolecularGraph.jl/issues/23#issuecomment-1926960506, openbabel_jll provides ExecutableProducts. It also provides two LibraryProducts, libinchi and libopenbabel.
All available products can be listed with ?openbabel_jll in the Julia REPL, which shows:
Products
========
The code bindings within this package are autogenerated from the following Products:
• LibraryProduct: libinchi
• LibraryProduct: libopenbabel
• ExecutableProduct: obabel
• ExecutableProduct: obconformer
• ExecutableProduct: obdistgen
• ExecutableProduct: obenergy
• ExecutableProduct: obfit
• ExecutableProduct: obfitall
• ExecutableProduct: obgen
• ExecutableProduct: obgrep
• ExecutableProduct: obminimize
• ExecutableProduct: obmm
• ExecutableProduct: obprobe
• ExecutableProduct: obprop
• ExecutableProduct: obrms
• ExecutableProduct: obrotamer
• ExecutableProduct: obrotate
• ExecutableProduct: obspectrophore
• ExecutableProduct: obsym
• ExecutableProduct: obtautomer
• ExecutableProduct: obthermo
• ExecutableProduct: roundtrip
Therefore we would be able to call functions from libopenbabel using ccall(), if we knew the function names...
Using readelf, one can inspect shared library files and look for functions:
readelf -sW ~/.julia/artifacts/f1eb34813a945111198356e33b5f2034cc7990ab/lib/libopenbabel.so
This is still pretty messy, and I didn't find proper documentation for this library.
Edit: OpenBabel API Docs
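Much of the mess is C++ name mangling: OpenBabel's functions are exported under mangled names (e.g. _ZN9OpenBabel5OBMolC1Ev, which demangles to OpenBabel::OBMol::OBMol()), and most are member functions that expect C++ objects, so they are generally not practical ccall targets. A small sketch (hypothetical name defined_functions) that at least filters readelf -sW output down to the functions defined in the library; the libopenbabel_path variable in the comment assumes the usual JLL naming convention.

```julia
# Parse `readelf -sW` output (columns: Num: Value Size Type Bind Vis Ndx Name)
# and return the names of FUNC symbols defined in the library itself,
# i.e. whose section index (Ndx) is not UND (imported from elsewhere).
function defined_functions(readelf_output::AbstractString)
    names = String[]
    for line in split(readelf_output, '\n')
        f = split(line)
        if length(f) >= 8 && endswith(f[1], ":") && f[4] == "FUNC" && f[7] != "UND"
            push!(names, String(f[8]))
        end
    end
    return names
end

# Possible use (Linux, readelf installed):
# using openbabel_jll
# text = read(`readelf -sW $(openbabel_jll.libopenbabel_path)`, String)
# ob_funcs = filter(startswith("_ZN9OpenBabel"), defined_functions(text))
# Piping the names through `c++filt` demangles them for reading.
```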
In multithreaded apps, the PyCall solution above crashed. I then came up with the following, which uses PythonCall and runs every Python call as a task on thread 1:
using PythonCall
using ThreadPools

# Run `expr` as a task on thread 1 and wait for its result
macro pythread(expr)
    quote
        fetch(@tspawnat 1 begin
            $(esc(expr))
        end)
    end
end

function babel_convert(input, informat = "cdxml", outformat = "mol")
    @pythread begin
        ob = pyimport("openbabel.openbabel")
        conv = ob.OBConversion()
        conv.SetInAndOutFormats(informat, outformat)
        mol = ob.OBMol()
        conv.ReadString(mol, input)
        # PythonCall returns Py objects, so convert the Python bools explicitly
        if !pyconvert(Bool, mol.Has2D()) && !pyconvert(Bool, mol.Has3D())
            pgen = ob.OBOp.FindType("gen2D")
            # FindType yields Python None (never Julia `nothing`) if the op is missing
            pyis(pgen, pybuiltins.None) || pgen.Do(mol)
        end
        pyconvert(String, conv.WriteString(mol))
    end
end
I also posted this on Discourse and at https://github.com/JuliaPy/PythonCall.jl/issues/201.
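For comparison, the same idea of funnelling every Python call through thread 1 can be sketched with Base alone: a single long-lived task created with @async on the main thread serves a queue of jobs, so the calls also run one after another. This is not what ThreadPools does internally; the names JOBS, start_worker! and on_main_thread are hypothetical, and the sketch assumes start_worker! is called once from the main task.

```julia
# Each job is a zero-argument function plus a channel for its reply.
const JOBS = Channel{Tuple{Function,Channel{Any}}}(Inf)

# Call once from the main task. Tasks created with @async are "sticky":
# they always run on the thread that created them, here thread 1.
function start_worker!()
    Threads.threadid() == 1 || error("start_worker! must be called on thread 1")
    return @async for (f, reply) in JOBS
        outcome = try
            (true, f())
        catch err
            (false, err)
        end
        put!(reply, outcome)
    end
end

# Callable from any thread: queue `f`, block until the worker has run it,
# then return its value or rethrow its error.
function on_main_thread(f::Function)
    reply = Channel{Any}(1)
    put!(JOBS, (f, reply))
    ok, value = take!(reply)
    ok || throw(value)
    return value
end

# Possible use (hypothetical Python-calling function):
# start_worker!()                                    # once, at startup
# result = on_main_thread(() -> some_python_call(input))
```

One caveat applies to both approaches: the worker only runs when thread 1 yields (for example while the main task waits in fetch), so long computations on the main thread delay every queued call.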
Currently, only the SMILES and SDF formats are supported for molecule generation, and only V2000 SDF is supported for export.
I submitted #95 to generate SDF files and molecules from InChI strings.
Furthermore, it would be fantastic to include OpenBabel, perhaps via a new package OpenBabel.jl and package extensions. @mojaie, tell me what you think.