rdkit-rs / rdkit

RDKit Made Idiomatic for Rust

Proposal: Simplify Rust Bindings #39

Status: Open. mvisani opened this issue 2 weeks ago.

mvisani commented 2 weeks ago

Hi @xrl,

I’ve realized that we don’t need to hand-write a wrapper for every function of every class, as we’re currently doing. If build.rs adds the RDKit header directory to the include path, the C++ code that CXX generates can see RDKit’s own declarations, not just the functions in our wrappers. Any RDKit function or method whose signature CXX supports can then be declared directly in the cxx::bridge, which should significantly speed up development.
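A minimal sketch of the build.rs side, assuming RDKit is installed under a prefix given by a hypothetical RDKIT_DIR environment variable (falling back to /usr/local) and that its headers live in <prefix>/include/rdkit, as in a standard CMake or conda install. Only the path logic runs as written; the commented cxx_build chain shows where the path would go, with src/lib.rs and src/wrapper.cc as assumed file names.

```rust
use std::env;
use std::path::{Path, PathBuf};

/// Pick the RDKit install prefix. RDKIT_DIR is an assumed variable name
/// for this sketch, not an RDKit convention.
fn rdkit_prefix(env_value: Option<String>) -> PathBuf {
    env_value
        .map(PathBuf::from)
        .unwrap_or_else(|| PathBuf::from("/usr/local"))
}

/// RDKit installs headers under <prefix>/include/rdkit, so adding this
/// directory lets `#include <GraphMol/Atom.h>` resolve both in our wrapper
/// and in the C++ code CXX generates for the bridge.
fn rdkit_include_dir(prefix: &Path) -> PathBuf {
    prefix.join("include").join("rdkit")
}

fn main() {
    let prefix = rdkit_prefix(env::var("RDKIT_DIR").ok());
    let include = rdkit_include_dir(&prefix);
    println!("cargo:rerun-if-env-changed=RDKIT_DIR");
    println!("cargo:warning=RDKit headers: {}", include.display());

    // In a real build.rs the path would be handed to cxx_build, e.g.:
    // cxx_build::bridge("src/lib.rs")
    //     .file("src/wrapper.cc")
    //     .include(&include)
    //     .std("c++17")
    //     .compile("rdkit-rust-ffi");
}
```

Keeping the path logic in small functions makes it easy to override the prefix per machine, which matters once Windows builds enter the picture.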

I’ve created a small example repository to demonstrate how straightforward this is.

Highlights of the Current Wrapper

wrapper.h:

#pragma once
#include "rust/cxx.h"
#include <GraphMol/Atom.h>
#include <memory>
#include <string>

namespace RDKit {
std::shared_ptr<Atom> make_shared(std::unique_ptr<Atom> atom);
std::unique_ptr<Atom> newAtom();
std::unique_ptr<Atom> newAtomFromAtomicNum(int atomicNum);
std::unique_ptr<Atom> newAtomFromSymbol(const std::string &symbol);
std::unique_ptr<Atom> newAtomFromOther(const Atom &other);
rust::String getSymbolAsString(const Atom &atom);
bool MatchRust(const Atom &atom, std::unique_ptr<Atom> other);
int calcExplicitValence(Atom &atom, bool strict = true);
int calcImplicitValence(Atom &atom, bool strict = true);
} // namespace RDKit

wrapper.cc:

#include "rdkit-rust-ffi/include/wrapper.h"
#include <utility>

namespace RDKit {
// Hand ownership from the unique_ptr to a new shared_ptr by moving it.
std::shared_ptr<Atom> make_shared(std::unique_ptr<Atom> atom) { return std::shared_ptr<Atom>(std::move(atom)); }
std::unique_ptr<Atom> newAtom() { return std::make_unique<Atom>(); }
std::unique_ptr<Atom> newAtomFromAtomicNum(int atomicNum) { return std::make_unique<Atom>(atomicNum); }
// Additional wrapper functions...
} // namespace RDKit

Using RDKit's Built-in Functions

As seen in the repository, lib.rs declares more functions than the wrapper defines, yet all of them are callable. For example, declaring a method in the bridge with a self: &Atom receiver lets us call RDKit’s member function directly, as in atom.getTotalValence(). This removes the need for boilerplate such as:

pub fn get_is_aromatic(&self) -> bool {
    ro_mol_ffi::get_is_aromatic(self.ptr.as_ref())
}

pub fn get_atomic_num(&self) -> i32 {
    ro_mol_ffi::get_atomic_num(self.ptr.as_ref())
}

This simplifies the rdkit crate and makes the code more maintainable, since each method no longer needs both a C++ wrapper and a Rust forwarding function.
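In the bridge itself, a direct declaration would look like fn getIsAromatic(self: &Atom) -> bool; inside an unsafe extern "C++" block carrying #[namespace = "RDKit"], include!("GraphMol/Atom.h") and type Atom;. On the Rust side, cxx::UniquePtr and cxx::SharedPtr implement Deref, so such methods are reachable through the pointer. One possible way (an assumption, not what the repo does today) for the safe rdkit type to drop its forwarding functions entirely is to implement Deref itself. The runnable sketch below swaps the cxx-generated type for a plain Rust stand-in module named ffi so it compiles without RDKit.

```rust
use std::ops::Deref;

// Stand-in for the cxx-generated module (hypothetical). In the real crate
// ffi::Atom would be an opaque C++ type declared in #[cxx::bridge], with
// methods such as `fn getIsAromatic(self: &Atom) -> bool;` bound to RDKit.
#[allow(non_snake_case)]
mod ffi {
    pub struct Atom {
        pub atomic_num: i32,
        pub aromatic: bool,
    }

    impl Atom {
        pub fn getAtomicNum(&self) -> i32 {
            self.atomic_num
        }
        pub fn getIsAromatic(&self) -> bool {
            self.aromatic
        }
    }
}

/// Safe wrapper owned by the rdkit crate (hypothetical layout).
/// Box stands in for the cxx::UniquePtr the real bindings would hold.
pub struct Atom {
    ptr: Box<ffi::Atom>,
}

impl Deref for Atom {
    type Target = ffi::Atom;
    fn deref(&self) -> &ffi::Atom {
        &self.ptr
    }
}

fn main() {
    let atom = Atom { ptr: Box::new(ffi::Atom { atomic_num: 6, aromatic: true }) };
    // Auto-deref resolves these calls to ffi::Atom's methods, so no
    // get_atomic_num()-style forwarding function is written by hand.
    println!("{} {}", atom.getAtomicNum(), atom.getIsAromatic()); // prints "6 true"
}
```

The trade-off is that Deref exposes the whole bridged surface, camelCase names included. Mutating methods declared with self: Pin<&mut Atom> would still need UniquePtr::pin_mut() rather than plain Deref.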

Next Steps

We have two options:

  1. Refactor the existing repo to follow this approach. While it may take some time initially, it will speed up development and reduce potential errors.
  2. Continue building on the example repo I’ve created and start a new crate.

What are your thoughts?

Best regards,
Marco

P.S. I’ve added an explanation of how to download and compile RDKit and link the C++ library to our project. This should also enable Windows support (although I haven’t tested it yet).
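The linking half of build.rs can be sketched as below, assuming the libraries sit in <prefix>/lib under their usual RDKit names. The library list is an assumption: Atom lives in GraphMol, which depends on RDGeneral, and other RDKit modules would need adding as the bridge grows. The install layout on Windows may differ, so these paths are untested there.

```rust
use std::path::Path;

/// Build the cargo directives that link RDKit shared libraries.
/// `libs` uses RDKit's library names without the platform prefix/suffix.
fn rdkit_link_directives(prefix: &Path, libs: &[&str]) -> Vec<String> {
    let lib_dir = prefix.join("lib");
    let mut out = vec![format!("cargo:rustc-link-search=native={}", lib_dir.display())];
    for lib in libs {
        out.push(format!("cargo:rustc-link-lib=dylib={lib}"));
    }
    out
}

fn main() {
    // Assumed install prefix for this sketch.
    let prefix = Path::new("/usr/local");
    for line in rdkit_link_directives(prefix, &["RDKitGraphMol", "RDKitRDGeneral"]) {
        println!("{line}");
    }
}
```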

xrl commented 5 days ago

This is very promising if it works the way I understand it. I am going to pull down your test repo (very nice idea!) and play around with it.

Are you interested in converting some of the existing bindings to this simplified flavor?

mvisani commented 4 days ago

@xrl I've continued working a bit in this small repo, and it really saves a lot of time! I'll open a pull request with a couple of modifications to the rdkit-sys crate and wait for your feedback.