rdkit-rs / rdkit

RDKit Made Idiomatic for Rust

Using ROMol object #23

Open bertiewooster opened 4 months ago

bertiewooster commented 4 months ago

This is more of a usage question. I'm trying to use the ROMol object in a Polars plug-in to canonicalize a SMILES string:

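use std::fmt::Write;

use polars::prelude::*;
use pyo3_polars::derive::polars_expr;
use rdkit_rs::ROMol;

#[polars_expr(output_type=String)]
fn canonicalize(inputs: &[Series]) -> PolarsResult<Series> {
    let ca: &StringChunked = inputs[0].str()?;
    // Parse and re-emit each SMILES value row by row inside the closure,
    // rather than passing the whole column to ROMol::from_smiles.
    let out: StringChunked = ca.apply_to_buffer(|value: &str, output: &mut String| {
        // unwrap() panics on unparsable SMILES; real code should handle the error.
        let romol = ROMol::from_smiles(value).unwrap();
        write!(output, "{}", romol.as_smiles()).unwrap()
    });
    Ok(out.into_series())
}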

but the `use rdkit_rs::ROMol;` line gives this error:

unresolved import `rdkit_rs::ROMol`
no `ROMol` in the root (rustc)

My Cargo.toml file includes

[dependencies]
rdkit-rs = "0.1.0"

aleebberg commented 4 months ago

Hi, I think you added the wrong crate to your Cargo.toml. Try `cargo add rdkit`, or add `rdkit = "0.4.6"` to your Cargo.toml file, and adapt your use statement accordingly. That worked for me.
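A std-only sketch of the per-row canonicalization step, written without Polars or RDKit so it runs anywhere. `map_rows` and `fake_canonicalize` are hypothetical names, and the stand-in canonicalizer only trims whitespace. In the actual plugin the transform would presumably wrap the rdkit crate's `ROMol::from_smiles` / `as_smiles` (assumed here to return a `Result` and a `String`), so that nulls and unparsable SMILES become nulls instead of panicking on `unwrap()`.

```rust
/// Hypothetical helper: apply a fallible per-row transform to a column of
/// optional strings. Null inputs and values the transform rejects both
/// become `None` instead of causing a panic.
fn map_rows<F>(values: &[Option<&str>], transform: F) -> Vec<Option<String>>
where
    F: Fn(&str) -> Option<String>,
{
    values
        .iter()
        // Each row is handled independently; `None` stays `None`.
        .map(|&value| value.and_then(|s| transform(s)))
        .collect()
}

/// Stand-in canonicalizer so the sketch runs without RDKit: it only trims
/// whitespace and rejects empty input. A real version might (assumed API)
/// be `ROMol::from_smiles(s).ok().map(|mol| mol.as_smiles())`.
fn fake_canonicalize(smiles: &str) -> Option<String> {
    let trimmed = smiles.trim();
    if trimmed.is_empty() {
        None
    } else {
        Some(trimmed.to_string())
    }
}
```

Passing `fake_canonicalize` as the transform turns `[Some("CCO"), None, Some("  ")]` into `[Some("CCO"), None, None]`.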

bertiewooster commented 4 months ago

Thank you so much @aleebberg! I substituted in the rdkit crate by running cargo add rdkit, which adds rdkit = "0.4.6" to my Cargo.toml file.

Unfortunately, when I then run maturin develop I get

error: failed to run custom build command for `rdkit-sys v0.4.6`

due to

  cargo:warning=/Users/user/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rdkit-sys-0.4.6/wrapper/include/scaffold_network.h:4:10: fatal error: 'GraphMol/GraphMol.h' file not found
  cargo:warning=#include <GraphMol/GraphMol.h>
  cargo:warning=         ^~~~~~~~~~~~~~~~~~~~~

which comes from the file scaffold_network.h

I cannot find a file named GraphMol.h in the rdkit-sys repo; is it possible that file is missing from the repo? Or am I doing something wrong?

By the way, if I use the template provided in this Polars plugin tutorial without the rdkit crate, the compilation works fine and I can run the Python code which calls the Polars plugin fn pig_latinnify from the tutorial.

I've put my code in a public repo polars_rdkit_canonicalizer in case you would like to access it. I am using an M2 Apple Silicon chip in case that's relevant.

Here is the full output of maturin develop:

(venv-polars-rdkit) user@computer polars_rdkit_canonicalizer % maturin develop
🍹 Building a mixed python/rust project
🔗 Found pyo3 bindings with abi3 support for Python ≥ 3.8
🐍 Not using a specific python interpreter
    Blocking waiting for file lock on build directory
   Compiling polars-io v0.37.0
   Compiling polars-ops v0.37.0
   Compiling rdkit-sys v0.4.6
   Compiling polars-ffi v0.37.0
   Compiling serde-pickle v1.1.1
   Compiling polars v0.37.0
   Compiling polars-plan v0.37.0
The following warnings were emitted during compilation:

warning: rdkit-sys@0.4.6: xcrun: error: unable to lookup item 'PlatformVersion' from command line tools installation
warning: rdkit-sys@0.4.6: xcrun: error: unable to lookup item 'PlatformVersion' in SDK '/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk'
warning: rdkit-sys@0.4.6: macOS deployment target (10.7) too low, it will be increased
warning: rdkit-sys@0.4.6: In file included from /Users/user/Projects/polars_rdkit_canonicalizer/target/x86_64-apple-darwin/debug/build/rdkit-sys-c62f2da140c9fd58/out/cxxbridge/sources/rdkit-sys/src/bridge/scaffold_network.rs.cc:1:
warning: rdkit-sys@0.4.6: /Users/user/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rdkit-sys-0.4.6/wrapper/include/scaffold_network.h:4:10: fatal error: 'GraphMol/GraphMol.h' file not found
warning: rdkit-sys@0.4.6: #include <GraphMol/GraphMol.h>
warning: rdkit-sys@0.4.6:          ^~~~~~~~~~~~~~~~~~~~~
warning: rdkit-sys@0.4.6: 1 error generated.

error: failed to run custom build command for `rdkit-sys v0.4.6`

Caused by:
  process didn't exit successfully: `/Users/user/Projects/polars_rdkit_canonicalizer/target/debug/build/rdkit-sys-49e73038daf200be/build-script-build` (exit status: 1)
  --- stdout
  cargo:rerun-if-changed=wrapper/src/scaffold_network.cc
  cargo:rerun-if-changed=wrapper/include/scaffold_network.h
  cargo:rerun-if-changed=wrapper/src/fingerprint.cc
  cargo:rerun-if-changed=wrapper/include/fingerprint.h
  cargo:rerun-if-changed=wrapper/src/periodic_table.cc
  cargo:rerun-if-changed=wrapper/include/periodic_table.h
  cargo:rerun-if-changed=wrapper/src/ro_mol.cc
  cargo:rerun-if-changed=wrapper/include/ro_mol.h
  cargo:rerun-if-changed=wrapper/src/rw_mol.cc
  cargo:rerun-if-changed=wrapper/include/rw_mol.h
  cargo:rerun-if-changed=wrapper/src/mol_standardize.cc
  cargo:rerun-if-changed=wrapper/include/mol_standardize.h
  cargo:rerun-if-changed=wrapper/src/substruct_match.cc
  cargo:rerun-if-changed=wrapper/include/substruct_match.h
  cargo:rerun-if-changed=wrapper/src/descriptors.cc
  cargo:rerun-if-changed=wrapper/include/descriptors.h
  cargo:rerun-if-changed=wrapper/src/mol_ops.cc
  cargo:rerun-if-changed=wrapper/include/mol_ops.h
  cargo:CXXBRIDGE_PREFIX=rdkit-sys
  cargo:CXXBRIDGE_DIR0=/Users/user/Projects/polars_rdkit_canonicalizer/target/x86_64-apple-darwin/debug/build/rdkit-sys-c62f2da140c9fd58/out/cxxbridge/include
  cargo:CXXBRIDGE_DIR1=/Users/user/Projects/polars_rdkit_canonicalizer/target/x86_64-apple-darwin/debug/build/rdkit-sys-c62f2da140c9fd58/out/cxxbridge/crate
  TARGET = Some("x86_64-apple-darwin")
  OPT_LEVEL = Some("0")
  HOST = Some("aarch64-apple-darwin")
  cargo:rerun-if-env-changed=CXX_x86_64-apple-darwin
  CXX_x86_64-apple-darwin = None
  cargo:rerun-if-env-changed=CXX_x86_64_apple_darwin
  CXX_x86_64_apple_darwin = None
  cargo:rerun-if-env-changed=TARGET_CXX
  TARGET_CXX = None
  cargo:rerun-if-env-changed=CXX
  CXX = None
  RUSTC_LINKER = None
  cargo:rerun-if-env-changed=CROSS_COMPILE
  CROSS_COMPILE = None
  cargo:rerun-if-env-changed=CC_ENABLE_DEBUG_OUTPUT
  cargo:rerun-if-env-changed=CRATE_CC_NO_DEFAULTS
  CRATE_CC_NO_DEFAULTS = None
  DEBUG = Some("true")
  cargo:warning=xcrun: error: unable to lookup item 'PlatformVersion' from command line tools installation
  cargo:warning=xcrun: error: unable to lookup item 'PlatformVersion' in SDK '/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk'
  cargo:warning=macOS deployment target (10.7) too low, it will be increased
  cargo:rerun-if-env-changed=CXXFLAGS_x86_64-apple-darwin
  CXXFLAGS_x86_64-apple-darwin = None
  cargo:rerun-if-env-changed=CXXFLAGS_x86_64_apple_darwin
  CXXFLAGS_x86_64_apple_darwin = None
  cargo:rerun-if-env-changed=TARGET_CXXFLAGS
  TARGET_CXXFLAGS = None
  cargo:rerun-if-env-changed=CXXFLAGS
  CXXFLAGS = None
  cargo:warning=In file included from /Users/user/Projects/polars_rdkit_canonicalizer/target/x86_64-apple-darwin/debug/build/rdkit-sys-c62f2da140c9fd58/out/cxxbridge/sources/rdkit-sys/src/bridge/scaffold_network.rs.cc:1:
  cargo:warning=/Users/user/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rdkit-sys-0.4.6/wrapper/include/scaffold_network.h:4:10: fatal error: 'GraphMol/GraphMol.h' file not found
  cargo:warning=#include <GraphMol/GraphMol.h>
  cargo:warning=         ^~~~~~~~~~~~~~~~~~~~~
  cargo:warning=1 error generated.

  --- stderr

  CXX include path:
    /Users/user/Projects/polars_rdkit_canonicalizer/target/x86_64-apple-darwin/debug/build/rdkit-sys-c62f2da140c9fd58/out/cxxbridge/include
    /Users/user/Projects/polars_rdkit_canonicalizer/target/x86_64-apple-darwin/debug/build/rdkit-sys-c62f2da140c9fd58/out/cxxbridge/crate

  error occurred: Command env -u IPHONEOS_DEPLOYMENT_TARGET "c++" "-O0" "-ffunction-sections" "-fdata-sections" "-fPIC" "-gdwarf-2" "-fno-omit-frame-pointer" "-m64" "--target=x86_64-apple-darwin" "-mmacosx-version-min=10.7" "-I" "/Users/user/Projects/polars_rdkit_canonicalizer/target/x86_64-apple-darwin/debug/build/rdkit-sys-c62f2da140c9fd58/out/cxxbridge/include" "-I" "/Users/user/Projects/polars_rdkit_canonicalizer/target/x86_64-apple-darwin/debug/build/rdkit-sys-c62f2da140c9fd58/out/cxxbridge/crate" "-I" "/opt/homebrew/include" "-I" "/opt/homebrew/include/rdkit" "-I" "/Users/user/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rdkit-sys-0.4.6" "-std=c++17" "-o" "/Users/user/Projects/polars_rdkit_canonicalizer/target/x86_64-apple-darwin/debug/build/rdkit-sys-c62f2da140c9fd58/out/aa23d4cfa6943233-scaffold_network.rs.o" "-c" "/Users/user/Projects/polars_rdkit_canonicalizer/target/x86_64-apple-darwin/debug/build/rdkit-sys-c62f2da140c9fd58/out/cxxbridge/sources/rdkit-sys/src/bridge/scaffold_network.rs.cc" with args "c++" did not execute successfully (status code exit status: 1).

warning: build failed, waiting for other jobs to finish...
💥 maturin failed
  Caused by: Failed to build a native library through cargo
  Caused by: Cargo build finished with "exit status: 101": `env -u CARGO PYO3_ENVIRONMENT_SIGNATURE="cpython-3.12-64bit" PYO3_PYTHON="/Users/user/Projects/venv-polars-rdkit/bin/python" PYTHON_SYS_EXECUTABLE="/Users/user/Projects/venv-polars-rdkit/bin/python" "cargo" "rustc" "--target" "x86_64-apple-darwin" "--message-format" "json-render-diagnostics" "--manifest-path" "/Users/user/Projects/polars_rdkit_canonicalizer/Cargo.toml" "--lib" "--" "-C" "link-arg=-undefined" "-C" "link-arg=dynamic_lookup" "-C" "link-args=-Wl,-install_name,@rpath/rdkit_canonicalizer.abi3.so"`

bertiewooster commented 4 months ago

P.S. It occurred to me to try removing that line

#include <GraphMol/GraphMol.h>

in case it wasn't required. When I did, I got a similar error on the next line:

  cargo:warning=/Users/user/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rdkit-sys-0.4.6/wrapper/include/scaffold_network.h:4:10: fatal error: 'GraphMol/ScaffoldNetwork/ScaffoldNetwork.h' file not found
  cargo:warning=#include <GraphMol/ScaffoldNetwork/ScaffoldNetwork.h>
  cargo:warning=         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When I removed that line as well, I got a series of errors like

  cargo:warning=/Users/user/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rdkit-sys-0.4.6/wrapper/include/scaffold_network.h:6:33: error: use of undeclared identifier 'ScaffoldNetwork'
  cargo:warning=  using ScaffoldNetworkParams = ScaffoldNetwork::ScaffoldNetworkParams;
  cargo:warning=                                ^

so it seems like those include statements, and the files they point to, are needed.

bertiewooster commented 3 months ago

I now understand that GraphMol/GraphMol.h etc. are RDKit's own C++ header files. Following the rdkit-rs/rdkit Prerequisites, I successfully ran brew install rdkit, which ended with

You may need to add RDBASE to your environment variables.
For Bash, put something like this in your $HOME/.bashrc:
  export RDBASE=/usr/local/opt/rdkit/share/RDKit

I tried putting that in both my .bashrc and .zshrc but still got that same error. Is there some other way I should be pointing rdkit-rs/rdkit to my RDKit installation? Or do I not have the files I need in /usr/local/opt/rdkit/share/RDKit? The contents are below. By the way, I searched for GraphMol.h on my computer and found it in conda environments such as /Users/user/opt/anaconda3/envs/py311_rdkit_beta/include/rdkit/GraphMol/GraphMol.h, but not in /usr/local/opt/rdkit/share/RDKit.

I'm using a venv to run maturin develop; I tried using a conda environment instead and got the same result.

[Screenshot: contents of /usr/local/opt/rdkit/share/RDKit]
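A std-only Rust sketch for checking which include directory, if any, actually contains the header the build reports as missing. The first two candidates are the `-I` paths visible in the failing `c++` command above (`/opt/homebrew/include` and `/opt/homebrew/include/rdkit`); `/usr/local/include/rdkit` is an assumed location for the /usr/local Homebrew prefix mentioned in the brew output, and the conda candidate follows the `include/rdkit` layout of the conda copy of GraphMol.h found above. The function names are hypothetical; this is a diagnostic sketch, not part of rdkit-sys.

```rust
use std::path::{Path, PathBuf};

/// Return the first directory in `candidates` that contains
/// `GraphMol/GraphMol.h`, i.e. one that would satisfy
/// `#include <GraphMol/GraphMol.h>` when passed as an `-I` flag.
fn find_rdkit_include_dir(candidates: &[PathBuf]) -> Option<PathBuf> {
    candidates
        .iter()
        .find(|dir| dir.join("GraphMol").join("GraphMol.h").is_file())
        .cloned()
}

/// Candidate include directories: the two Homebrew paths from the failing
/// c++ command, an assumed Intel-Homebrew path, and optionally a conda env.
fn default_candidates(conda_env: Option<&Path>) -> Vec<PathBuf> {
    let mut dirs = vec![
        PathBuf::from("/opt/homebrew/include"),
        PathBuf::from("/opt/homebrew/include/rdkit"),
        PathBuf::from("/usr/local/include/rdkit"), // assumption, not from the log
    ];
    if let Some(env) = conda_env {
        // Mirrors .../envs/py311_rdkit_beta/include/rdkit/GraphMol/GraphMol.h
        dirs.push(env.join("include").join("rdkit"));
    }
    dirs
}
```

For example, passing `Some(Path::new("/Users/user/opt/anaconda3/envs/py311_rdkit_beta"))` to `default_candidates` should make `find_rdkit_include_dir` return that environment's `include/rdkit` directory, since that is where GraphMol.h was found; getting no Homebrew match would be consistent with the "file not found" error.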