ntBre / rdkit-rs

RDKit interface in safe Rust
BSD 3-Clause "New" or "Revised" License

Cannot get rdkit-rs to compile with RDKit on ARM64 Mac #4

Open bertiewooster opened 5 months ago

bertiewooster commented 5 months ago

I successfully built RDKit using the instructions in this repo's README:

git clone --depth 1 https://github.com/rdkit/rdkit
cd rdkit
mkdir build
cd build
cmake .. -DRDK_BUILD_INCHI_SUPPORT=ON
make

but I get an error when I try cargo test or cargo build in rdkit-rs. The problem seems to be that RDKit was built for x86_64, while the Rust binary (and libshim) were built for arm64.

Here is file information for the rdkit files:

my_username@computer lib % file libRDKitRDGeneral.1.dylib
libRDKitRDGeneral.1.dylib: Mach-O 64-bit dynamically linked shared library x86_64
my_username@computer lib % file libRDKitRDGeneral.2024.03.1pre.dylib
libRDKitRDGeneral.2024.03.1pre.dylib: Mach-O 64-bit dynamically linked shared library x86_64
my_username@computer lib % file libRDKitRDGeneral.dylib
libRDKitRDGeneral.dylib: Mach-O 64-bit dynamically linked shared library x86_64

And for the Rust outputs:

my_username@computer deps % file rdkit_rs-322809f945b58704
rdkit_rs-322809f945b58704: Mach-O 64-bit executable arm64
my_username@computer deps % file rdkit_rs-322809f945b58704.1ippcslhdowi4g30.rcgu.o
rdkit_rs-322809f945b58704.1ippcslhdowi4g30.rcgu.o: Mach-O 64-bit object arm64

I tried to force Rust to target x86_64 by putting this in my Cargo.toml:

[package]
target = "x86_64-apple-darwin"

but I still get the same error, ld: symbol(s) not found for architecture x86_64:

my_username@computer rdkit-rs % DYLD_LIBRARY_PATH=/Users/my_username/Projects/rdkit-build4/build/lib LDFLAGS='-undefined dynamic_lookup' CXX='clang++ -v' RDROOT=/Users/my_username/Projects/rdkit-build4 cargo test --locked --all-features --target x86_64-apple-darwin 

   Compiling rdkit-rs v0.1.0 (/Users/my_username/Projects/rdkit-rs)

error: linking with `cc` failed: exit status: 1

  = note: ld: warning: ignoring file '/Users/my_username/Projects/rdkit-rs/target/x86_64-apple-darwin/debug/build/rdkit-sys-7ee35bed4ca4fed8/out/libshim.so': found architecture 'arm64', required architecture 'x86_64'
          Undefined symbols for architecture x86_64:
            "_RDKit_MolToInchiKey", referenced from:
                rdkit_rs::ROMol::to_inchi_key::h5172cd82370ef807 in rdkit_rs-e2a4e46d956ccc68.28o1si5g1c9cz2gk.rcgu.o
            "_RDKit_MolToSmiles", referenced from:
                rdkit_rs::ROMol::to_smiles::h7713d3c3f54d59e4 in rdkit_rs-e2a4e46d956ccc68.28o1si5g1c9cz2gk.rcgu.o
            "_RDKit_ROMol_copy", referenced from:
                _$LT$rdkit_rs..ROMol$u20$as$u20$core..clone..Clone$GT$::clone::h28647307f8ff299d in rdkit_rs-e2a4e46d956ccc68.28o1si5g1c9cz2gk.rcgu.o
            "_RDKit_ROMol_delete", referenced from:
                _$LT$rdkit_rs..ROMol$u20$as$u20$core..ops..drop..Drop$GT$::drop::hc264690e13ba42c0 in rdkit_rs-e2a4e46d956ccc68.3k3fy8uhmxfe4loy.rcgu.o
            "_RDKit_ROMol_getNumAtoms", referenced from:
                rdkit_rs::ROMol::num_atoms::h304c369f182c51ae in rdkit_rs-e2a4e46d956ccc68.28o1si5g1c9cz2gk.rcgu.o
            "_RDKit_RunReactants", referenced from:
                rdkit_rs::fragment::ChemicalReaction::run_reactants::h9847ee103a6b48e0 in rdkit_rs-e2a4e46d956ccc68.18ph80dy2z5e5khk.rcgu.o
            "_RDKit_RxnSmartsToChemicalReaction", referenced from:
                rdkit_rs::fragment::ChemicalReaction::from_smarts::ha643d2ce11b283d7 in rdkit_rs-e2a4e46d956ccc68.18ph80dy2z5e5khk.rcgu.o
            "_RDKit_SanitizeMol", referenced from:
                rdkit_rs::ROMol::sanitize::h20a2c2852434aae5 in rdkit_rs-e2a4e46d956ccc68.28o1si5g1c9cz2gk.rcgu.o
            "_RDKit_SmilesToMol", referenced from:
                rdkit_rs::ROMol::from_smiles_full::h70e14b47ed6df605 in rdkit_rs-e2a4e46d956ccc68.28o1si5g1c9cz2gk.rcgu.o
          ld: symbol(s) not found for architecture x86_64
          clang: error: linker command failed with exit code 1 (use -v to see invocation)
error: could not compile `rdkit-rs` (lib test) due to 1 previous error

ntBre commented 5 months ago

Ah, this is actually a helpful error message! Those symbols look like they're from my C wrapper, not RDKit itself. Can you try running find . -name libshim.so -exec file {} \; in your rdkit-rs directory? I'm guessing that my make recipe/clang is compiling the wrapper shared library from rdkit-sys for the wrong architecture. Hopefully this is a good sign!

bertiewooster commented 5 months ago

my_username@computer rdkit-rs % find . -name libshim.so -exec file {} \;
./target/x86_64-apple-darwin/debug/build/rdkit-sys-7ee35bed4ca4fed8/out/libshim.so: Mach-O 64-bit dynamically linked shared library arm64
./target/debug/build/rdkit-sys-d14b2b3ea9597d35/out/libshim.so: Mach-O 64-bit dynamically linked shared library arm64

So those are compiled for the arm64 architecture.

ntBre commented 5 months ago

There are three separate files/sets of files involved here:

  1. RDKit shared libraries
  2. libshim.so
  3. Final Rust binary

From these steps, it looks like your RDKit is compiled for x86_64, but the other two are compiled for ARM? This may not be the only problem, but it makes sense to me that they would fail to link if they are compiled for different architectures.
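One way to confirm the mismatch across all three pieces is to compare the last field of the `file` output for each. This is a minimal sketch, assuming the single-line `file` output format shown earlier in this thread (a universal binary prints several lines and would need different handling). `arch_of` and `same_arch` are hypothetical helper names.

```shell
# Print the architecture named at the end of one line of `file` output,
# e.g. "libshim.so: Mach-O 64-bit ... arm64" -> "arm64".
arch_of() {
  printf '%s\n' "$1" | awk '{ print $NF }'
}

# Compare several `file` output lines and report the first one whose
# architecture differs from that of the first line.
same_arch() {
  first=$(arch_of "$1")
  for line in "$@"; do
    if [ "$(arch_of "$line")" != "$first" ]; then
      echo "mismatch: $line"
      return 1
    fi
  done
  echo "all $first"
}
```

It could be called as same_arch "$(file libRDKitRDGeneral.dylib)" "$(file libshim.so)", with the paths adjusted to wherever those files live on your machine.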

Since the other two are ARM, it seems potentially easiest to see if RDKit can be compiled for ARM. On the other hand, I assume that some additional CXX flag should tell clang to compile for a specific architecture, which should "fix" libshim. I don't know the specific flags to make either Rust or clang compile for a particular architecture, but I think this is a good approach to pursue at least.
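For reference, each of the three tools has its own way to select an architecture on macOS: CMake reads CMAKE_OSX_ARCHITECTURES, Apple clang accepts -arch, and Cargo takes --target with a target triple (installed beforehand with rustup target add). The sketch below only prints candidate commands rather than running them. `print_build_cmds` is a hypothetical helper, and whether the rdkit-sys make recipe picks up -arch through CXX is an assumption based on the CXX='clang++ -v' invocation earlier in this thread, not something confirmed here.

```shell
# Print (not run) commands that would request one architecture from each tool.
# Usage: print_build_cmds arm64   or   print_build_cmds x86_64
print_build_cmds() {
  case "$1" in
    arm64)  triple=aarch64-apple-darwin ;;
    x86_64) triple=x86_64-apple-darwin ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
  # RDKit: CMAKE_OSX_ARCHITECTURES is a standard CMake variable on macOS.
  echo "cmake .. -DRDK_BUILD_INCHI_SUPPORT=ON -DCMAKE_OSX_ARCHITECTURES=$1"
  # Rust: the target triple has to be installed before it can be used.
  echo "rustup target add $triple"
  # libshim + Rust: ASSUMES the shim build honors CXX (unverified).
  echo "CXX='clang++ -arch $1' cargo build --target $triple"
}
```

The cmake line is meant for a fresh build directory, and any libraries RDKit links against (Boost, for example) would need to match the chosen architecture as well.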

bertiewooster commented 5 months ago

> it seems potentially easiest to see if RDKit can be compiled for ARM

Pursuing that option, I tried make CC=clang CXX=clang++ to target ARM64, following this ChatGPT suggestion:

> For compilation, you'll need to ensure that you have a compiler that targets ARM64 architecture. On macOS ARM64, this typically means using the native compiler provided by Apple, such as clang.

It gave errors; here's a partial log:

(base) my_username@computer Projects % git clone --depth 1 https://github.com/rdkit/rdkit rdkit-build7
Cloning into 'rdkit-build7'...
remote: Enumerating objects: 5576, done.
remote: Counting objects: 100% (5576/5576), done.
remote: Compressing objects: 100% (3995/3995), done.
remote: Total 5576 (delta 1638), reused 3435 (delta 1406), pack-reused 0
Receiving objects: 100% (5576/5576), 76.17 MiB | 8.49 MiB/s, done.
Resolving deltas: 100% (1638/1638), done.
(base) my_username@computer Projects % cd rdkit-build7
(base) my_username@computer rdkit-build7 % mkdir build
(base) my_username@computer rdkit-build7 % cd build
(base) my_username@computer build % cmake .. -DRDK_BUILD_INCHI_SUPPORT=ON
-- The C compiler identification is AppleClang 15.0.0.15000309
-- The CXX compiler identification is AppleClang 15.0.0.15000309
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /Library/Developer/CommandLineTools/usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /Library/Developer/CommandLineTools/usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
RDK_OPTIMIZE_POPCNT is not available on aarch64 or arm64
Disabling boost::stacktrace on non-linux platform
CMake Warning at CMakeLists.txt:152 (find_package):
  By not providing "FindCatch2.cmake" in CMAKE_MODULE_PATH this project has
  asked CMake to find a package configuration file provided by "Catch2", but
  CMake did not find one.

  Could not find a package configuration file provided by "Catch2" (requested
  version 3) with any of the following names:

    Catch2Config.cmake
    catch2-config.cmake

  Add the installation prefix of "Catch2" to CMAKE_PREFIX_PATH or set
  "Catch2_DIR" to a directory containing one of the above files.  If "Catch2"
  provides a separate development package or SDK, be sure it has been
  installed.

-- Performing Test HAVE_FLAG__ffile_prefix_map__Users_my_username_Projects_rdkit_build7_build__deps_catch2_src__
-- Performing Test HAVE_FLAG__ffile_prefix_map__Users_my_username_Projects_rdkit_build7_build__deps_catch2_src__ - Success
-- Could NOT find InChI in system locations (missing: INCHI_LIBRARY INCHI_INCLUDE_DIR) 
Downloading https://rdkit.org/downloads/INCHI-1-SRC.zip...
-- Found PythonInterp: /opt/miniconda3/bin/python (found version "3.12.1") 
-- Found PythonLibs: /opt/miniconda3/lib/libpython3.12.dylib (found version "3.12.1") 
PYTHON Py_ENABLE_SHARED: 0
PYTHON USING LINK LINE: -bundle -undefined dynamic_lookup -Wl,-rpath,/opt/miniconda3/lib -L/opt/miniconda3/lib -Wl,-rpath,/opt/miniconda3/lib -L/opt/miniconda3/lib
nbval not found, disabling the jupyter tests
-- Found Eigen3: /usr/local/include/eigen3 (Required is at least version "2.91.0") 
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE  
-- Found Boost: /opt/miniconda3/lib/cmake/Boost-1.82.0/BoostConfig.cmake (found suitable version "1.82.0", minimum required is "1.58.0") found components: system serialization iostreams 
-- Found Boost: /opt/miniconda3/lib/cmake/Boost-1.82.0/BoostConfig.cmake (found suitable version "1.82.0", minimum required is "1.58.0") found components: system iostreams 
== Using strict rotor definition
-- maeparser include dir set as 'maeparser_INCLUDE_DIRS-NOTFOUND'
-- maeparser libraries set as 'maeparser_LIBRARIES-NOTFOUND'
-- Could NOT find maeparser (missing: maeparser_INCLUDE_DIRS maeparser_LIBRARIES) 
Downloading https://github.com/schrodinger/maeparser/archive/v1.3.1.tar.gz...
-- coordgen include dir set as coordgen_INCLUDE_DIRS-NOTFOUND
-- coordgen libraries set as 'coordgen_LIBRARIES-NOTFOUND'
-- Could NOT find coordgen (missing: coordgen_INCLUDE_DIRS coordgen_LIBRARIES) 
Downloading https://github.com/schrodinger/coordgenlibs/archive/v3.0.2.tar.gz...
Downloading https://github.com/rareylab/RingDecomposerLib/archive/v1.1.3_rdkit.tar.gz...
== Updating Filters.cpp from pains file
== Done updating pains files
Downloading https://github.com/google/fonts/raw/main/ofl/comicneue/ComicNeue-Regular.ttf...
Downloading https://github.com/google/fonts/raw/main/ofl/comicneue/OFL.txt...
-- Found Freetype: /usr/local/lib/libfreetype.dylib (found version "2.13.2") 
-- Found Boost: /opt/miniconda3/lib/cmake/Boost-1.82.0/BoostConfig.cmake (found suitable version "1.82.0", minimum required is "1.58.0") found components: program_options 
Downloading https://github.com/Tencent/rapidjson/archive/v1.1.0.tar.gz...
-- Configuring done (12.8s)
-- Generating done (1.0s)
-- Build files have been written to: /Users/my_username/Projects/rdkit-build7/build
(base) my_username@computer build % make CC=clang CXX=clang++
...
/Users/my_username/Projects/rdkit-build7/Code/RDGeneral/hash/hash.hpp:344:43: note: expanded from macro 'BOOST_HASH_SPECIALIZE'
      : public boost::functional::detail::unary_function<type,                 \
               ~~~~~~~~~~~~~~~~~~~~~~~~~~~^
/opt/miniconda3/include/boost/functional.hpp:45:24: note: '__unary_function' declared here
            using std::unary_function;
                       ^
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated.
make[2]: *** [Code/GraphMol/CMakeFiles/GraphMol.dir/Canon.cpp.o] Error 1
make[1]: *** [Code/GraphMol/CMakeFiles/GraphMol.dir/all] Error 2
make: *** [all] Error 2

ntBre commented 5 months ago

I think you cut out the actual error part of the log. From what you pasted here, it looks like the cmake .. step completed successfully, but building with make failed. From the very end of the output, it looks like there could be an issue with your boost library or the standard library (!), but the first error is usually the most informative. If there is an issue with one of these libraries, it's probably due to them being installed for a different architecture from what you're compiling for. This might be a question for the RDKit maintainers. I found these two threads:

https://github.com/conda-forge/rdkit-feedstock/issues/63
https://github.com/conda-forge/rdkit-feedstock/pull/54

But I don't see anything in the main installation docs about targeting ARM.
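Since the first error is usually the most informative one, it can help to capture the whole make output in a file and pull out the first line containing "error:". A minimal sketch, assuming the log was saved with something like make 2>&1 | tee build.log (build.log and `first_error` are hypothetical names):

```shell
# Print the first line of a saved build log that contains "error:",
# prefixed with its line number in the log.
first_error() {
  grep -n -m 1 'error:' "$1"
}
```

Running first_error build.log then gives a line number to jump to, and the lines just after it usually hold the compiler's notes for that error.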

Maybe the other option (building the Rust code and libshim for x86) is the better approach then. I'm just throwing out ideas because there's no real way for me to debug this locally.
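If the x86_64 route is pursued, note that Cargo does not read a target key under [package] in Cargo.toml. A default target is set under [build] in .cargo/config.toml, or per invocation with --target as in the command earlier in this thread. The sketch below writes such a config file without overwriting an existing one; `write_cargo_target` is a hypothetical helper. It only covers the Rust side, so libshim would still need to be compiled for x86_64 separately.

```shell
# Write <project>/.cargo/config.toml with a default build target.
# Refuses to overwrite an existing config file.
write_cargo_target() {
  project=$1
  triple=$2
  config="$project/.cargo/config.toml"
  if [ -e "$config" ]; then
    echo "refusing to overwrite $config" >&2
    return 1
  fi
  mkdir -p "$project/.cargo"
  printf '[build]\ntarget = "%s"\n' "$triple" > "$config"
}
```

For example, write_cargo_target . x86_64-apple-darwin run from the rdkit-rs directory would make plain cargo build and cargo test default to that target.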