molecularinformatics / roshambo

GNU General Public License v3.0

Installation in a dockerfile #5

Open finlayiainmaclean opened 2 weeks ago

finlayiainmaclean commented 2 weeks ago

Thanks for your work on roshambo! Has anyone managed to get roshambo working in a docker image? Here is my attempt:


FROM mambaorg/micromamba:1.5.9 as micromamba

FROM nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04

USER root
ARG MAMBA_USER=mambauser
ENV MAMBA_EXE="/bin/micromamba"

COPY --from=micromamba "$MAMBA_EXE" "$MAMBA_EXE"
COPY --from=micromamba /usr/local/bin/ /usr/local/bin/
COPY --from=micromamba /usr/local/bin/ /usr/local/bin/
COPY --from=micromamba /usr/local/bin/ /usr/local/bin/
COPY --from=micromamba /usr/local/bin/ /usr/local/bin/
COPY --from=micromamba /usr/local/bin/ /usr/local/bin/

RUN /usr/local/bin/ && \

RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y \
    wget \
    git \
    bzip2 \
    unzip \
    curl \
    sudo \
    ca-certificates \
    libglib2.0-0 \
    libxext6 \
    libsm6 \
    libxrender1 \
    build-essential \
    cmake \
    libboost-all-dev \
    libeigen3-dev \
    libtbb2 \
    libtbb-dev \
    libsqlite3-dev \
    libpng-dev \
    libfreetype6-dev \
    libhdf5-serial-dev \
    libjsoncpp-dev \
    libxml2-dev \
    libbz2-dev \
    libz-dev \
    libcairo2-dev && \
    apt-get autoremove -y && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

RUN git clone
WORKDIR /app/roshambo 
RUN micromamba create -n roshambo python=3.9 -y -c conda-forge
RUN micromamba install -n roshambo -c conda-forge boost-cpp boost cairo pandas pillow freetype cmake numpy eigen matplotlib -y

WORKDIR /app/roshambo
RUN git clone
WORKDIR /app/roshambo/rdkit
# RUN git checkout Release_2023_03_1
# RUN sed -i 's/23ed3f833c1ae0adb141a26b4a30d73e/850b0df852f1cda4970887b540f8f333/g' Code/GraphMol/MolDraw2D/CMakeLists.txt

RUN mkdir /app/roshambo/rdkit/build
WORKDIR /app/roshambo/rdkit/build
RUN micromamba run -n roshambo cmake \
    -DPy_ENABLE_SHARED=1   \
    -DBoost_NO_SYSTEM_PATHS=ON   \
    -DPYTHON_NUMPY_INCLUDE_PATH=/opt/conda/envs/roshambo/lib/python3.9/site-packages/numpy/_core/include   \
    -DINCHI_URL=   ..
RUN micromamba run -n roshambo make -j4 install

WORKDIR /app/roshambo/rdkit
ENV RDBASE=/app/roshambo/rdkit

WORKDIR /app/roshambo/
ENV CUDA_HOME=/usr/local/cuda

RUN micromamba run -n roshambo pip install -e .
RUN micromamba run -n roshambo pip install IPython pandas jupyter

SHELL ["/usr/local/bin/"]

Which yields:

    g++ -pthread -B /opt/conda/envs/roshambo/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O3 -Wall -fPIC -O3 -isystem /opt/conda/envs/roshambo/include -fPIC -O3 -isystem /opt/conda/envs/roshambo/include -fPIC -I/usr/local/cuda/include -Ipaper -I/app/roshambo/rdkit/Code -I/opt/conda/envs/roshambo/include/python3.9 -c paper/inputPreprocessor.cpp -o build/temp.linux-aarch64-cpython-39/paper/inputPreprocessor.o -O2 -I/app/roshambo/rdkit/Code -DORIG_GLOBAL -DFAST_OVERLAP -DNO_DIV_ADDRESS -DGPP -std=c++17
    /usr/local/cuda/bin/nvcc -I/usr/local/cuda/include -Ipaper -I/app/roshambo/rdkit/Code -I/opt/conda/envs/roshambo/include/python3.9 -c paper/ -o build/temp.linux-aarch64-cpython-39/paper/transformTools.o -O2 -I/app/roshambo/rdkit/Code -DORIG_GLOBAL -DFAST_OVERLAP -DNO_DIV_ADDRESS -std=c++17 -Xcompiler -O2 -arch sm_50 -Xptxas -v --ptxas-options=-v -c --compiler-options -fPIC
    ptxas info    : 0 bytes gmem
    g++ -pthread -B /opt/conda/envs/roshambo/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O3 -Wall -fPIC -O3 -isystem /opt/conda/envs/roshambo/include -fPIC -O3 -isystem /opt/conda/envs/roshambo/include -fPIC -I/usr/local/cuda/include -Ipaper -I/app/roshambo/rdkit/Code -I/opt/conda/envs/roshambo/include/python3.9 -c roshambo/cpaper.cpp -o build/temp.linux-aarch64-cpython-39/roshambo/cpaper.o -I/app/roshambo/rdkit/Code -x cu -std=c++17 -arch=sm_70 --ptxas-options=-v -c --compiler-options -fPIC
    g++: warning: ‘-x cu’ after last input file has no effect
    g++: error: unrecognized command-line option ‘-arch=sm_70’
    g++: error: unrecognized command-line option ‘--ptxas-options=-v’
    g++: error: unrecognized command-line option ‘--compiler-options’; did you mean ‘--completion=’?
    error: command '/usr/bin/g++' failed with exit code 1
    [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.

It seems the build compiles the roshambo/cpaper.cpp source file with g++, but passes it flags that only nvcc understands (-x cu, -arch=sm_70, --ptxas-options, --compiler-options).
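
One quick way to confirm this diagnosis from a build log is to scan the failing g++ command for the options g++ rejected. The sketch below is a hypothetical helper (nvcc_only_flags is not part of roshambo or setuptools), and its flag list is taken only from the errors and the warning in the log above, so treat it as an assumption rather than a complete list of nvcc-specific options.

```python
import shlex
from typing import List

# Options g++ complained about in the log above (an assumption drawn from
# that single log, not an exhaustive list of nvcc-only options).
REJECTED_PREFIXES = ("-arch", "--ptxas-options", "--compiler-options")


def nvcc_only_flags(command: str) -> List[str]:
    """Return the options in a compiler command that g++ rejected in the log."""
    tokens = shlex.split(command)
    found = []
    for i, tok in enumerate(tokens):
        if tok.startswith(REJECTED_PREFIXES):
            found.append(tok)
        elif tok == "-x" and i + 1 < len(tokens) and tokens[i + 1] == "cu":
            # "-x cu" requests CUDA-language compilation, which g++ does not support
            found.append("-x cu")
    return found


# Tail of the failing g++ command from the log (shortened).
failing = ("g++ -c roshambo/cpaper.cpp -o build/cpaper.o -x cu -std=c++17 "
           "-arch=sm_70 --ptxas-options=-v -c --compiler-options -fPIC")
print(nvcc_only_flags(failing))
```

If any of these show up on a g++ line, the per-compiler flag lists are being mixed up somewhere in the setup script.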

Any help is appreciated!

MKCarter commented 4 days ago

Hey @finlayiainmaclean,

This is an interesting error, which I just ran into myself when building this on an RTX 4090 GPU. It does require a bit of tweaking of two files.

First, the script: if you replace your script with the following, the g++ errors should go away.

import os
import setuptools
from setuptools import setup, Extension
from setuptools.command.build_ext import build_ext

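# Check for necessary environment variables
env_vars = ["RDBASE", "RDKIT_LIB_DIR", "RDKIT_INCLUDE_DIR", "CUDA_HOME"]
for var in env_vars:
    if var not in os.environ:
        raise Exception(f"{var} environment variable not set.")

RDBASE = os.environ.get("RDBASE")
RDKIT_LIB_DIR = os.environ.get("RDKIT_LIB_DIR")
RDKIT_INCLUDE_DIR = os.environ.get("RDKIT_INCLUDE_DIR")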

# Find necessary RDKit components
MyRDKit_FIND_COMPONENTS = ["GraphMol", "SmilesParse", "FileParsers", "Depictor"]
for component in MyRDKit_FIND_COMPONENTS:
    library_path = os.path.join(RDKIT_LIB_DIR, f"libRDKit{component}.so")
    if not os.path.isfile(library_path):
        raise Exception(f"Didn't find RDKit {component} library.")

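PAPER_DIR = "paper"
# Shared compiler flags, matching the ones visible in the build log above
CCFLAGS = ["-O2", f"-I{RDKIT_INCLUDE_DIR}", "-DORIG_GLOBAL", "-DFAST_OVERLAP", "-DNO_DIV_ADDRESS"]
PTXFLAGS = ["-Xcompiler", "-O2", "-arch", "sm_50", "-Xptxas", "-v"]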

# Locate CUDA configuration
def locate_cuda():
    home = os.environ.get("CUDA_HOME")
    if not home:
        raise Exception("CUDA_HOME environment variable not set.")
    nvcc = os.path.join(home, "bin", "nvcc")
    cudaconfig = {
        "home": home,
        "nvcc": nvcc,
        "include": os.path.join(home, "include"),
        "lib64": os.path.join(home, "lib64"),
    }
    for k, v in cudaconfig.items():
        if not os.path.exists(v):
            raise EnvironmentError(f"The CUDA {k} path could not be located in {v}")
    return cudaconfig

# Customize the compiler for nvcc
def customize_compiler_for_nvcc(self):
    # Let distutils accept .cu sources; otherwise it raises "unknown file type"
    self.src_extensions.append(".cu")
    default_compiler_so = self.compiler_so
    super_compile = self._compile

    def _compile(obj, src, ext, cc_args, extra_postargs, pp_opts):
        if os.path.splitext(src)[1] == ".cu":
            # CUDA sources: switch to nvcc and use the nvcc-only flags
            self.set_executable("compiler_so", CUDA["nvcc"])
            postargs = extra_postargs["nvcc"]
        else:
            # Everything else (.cpp, Cython output): keep g++ and the gcc flags
            self.set_executable("compiler_so", default_compiler_so)
            postargs = extra_postargs["gcc"]

        super_compile(obj, src, ext, cc_args, postargs, pp_opts)
        self.compiler_so = default_compiler_so  # Reset after each compile

    self._compile = _compile

# Custom build extension to handle CUDA compilation
class CustomBuildExt(build_ext):
    def build_extensions(self):
        customize_compiler_for_nvcc(self.compiler)
        build_ext.build_extensions(self)

CUDA = locate_cuda()

# Define the extension
ext = Extension(
    "roshambo.cpaper",
    sources=[
        os.path.join("roshambo", "cpaper.pyx"),
        os.path.join(PAPER_DIR, ""),
        os.path.join(PAPER_DIR, ""),
        os.path.join(PAPER_DIR, ""),
        os.path.join(PAPER_DIR, ""),
        os.path.join(PAPER_DIR, "inputFileReader.cpp"),
        os.path.join(PAPER_DIR, "inputPreprocessor.cpp"),
        os.path.join(PAPER_DIR, ""),
    ],
    include_dirs=[CUDA["include"], PAPER_DIR, RDKIT_INCLUDE_DIR],
    library_dirs=[CUDA["lib64"], PAPER_DIR, RDKIT_LIB_DIR],
    runtime_library_dirs=[CUDA["lib64"], PAPER_DIR],
    # A dict instead of a list, so _compile can pick the flags per compiler
    extra_compile_args={
        "gcc": CCFLAGS + ["-std=c++17"],  # C++ flags for g++
        "nvcc": CCFLAGS + ["-std=c++17"] + PTXFLAGS + ["--ptxas-options=-v", "-c", "--compiler-options", "-fPIC"],  # CUDA flags for nvcc
    },
)

# Load requirements.txt
with open("requirements.txt") as f:
    requirements = f.read().splitlines()

# Setup configuration
setup(
    name="roshambo",
    version="0.0.1",
    author="Rasha Atwi",
    description="roshambo is a python package for robust Gaussian molecular shape comparison",
    install_requires=requirements,
    ext_modules=[ext],
    entry_points={
        "console_scripts": [
            "roshambo = roshambo.cli:main",
        ],
    },
    classifiers=[
        "Programming Language :: Python :: 3",
        "Development Status :: 3 - Alpha",
        "Intended Audience :: Science/Research",
        "Intended Audience :: Information Technology",
        "Operating System :: OS Independent",
        "Topic :: Scientific/Engineering :: Chemistry",
        "Topic :: Software Development :: Libraries :: Python Modules",
    ],
    package_data={"roshambo": ["roshambo/*.cpython*.so"]},
    cmdclass={"build_ext": CustomBuildExt},
)
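
The key change is the dispatch inside _compile: extra_compile_args becomes a dict, and each source file picks its compiler and flag list by its extension. The sketch below reproduces only that selection as a plain function, so it can be checked without CUDA or distutils. pick_compiler and the example path paper/example.cu are hypothetical, and the flag lists are shortened stand-ins for the "gcc" and "nvcc" entries in the script.

```python
import os
from typing import Dict, List, Tuple


def pick_compiler(src: str, default_cmd: List[str], nvcc_cmd: List[str],
                  postargs: Dict[str, List[str]]) -> Tuple[List[str], List[str]]:
    """Mimic _compile: .cu files go to nvcc, everything else to the default compiler."""
    if os.path.splitext(src)[1] == ".cu":
        return nvcc_cmd, postargs["nvcc"]
    return default_cmd, postargs["gcc"]


# Shortened stand-ins for the flag lists in the setup script.
flags = {
    "gcc": ["-O2", "-std=c++17"],
    "nvcc": ["-O2", "-std=c++17", "-arch", "sm_50", "--compiler-options", "-fPIC"],
}
gxx = ["g++", "-fPIC"]
nvcc = ["/usr/local/cuda/bin/nvcc"]

for src in ["roshambo/cpaper.cpp", "paper/inputPreprocessor.cpp", "paper/example.cu"]:
    cmd, args = pick_compiler(src, gxx, nvcc, flags)
    print(src, "->", cmd[0], " ".join(args))
```

With this dispatch, the Cython-generated roshambo/cpaper.cpp is handled by g++ with the "gcc" list, so nvcc-only options such as -arch and --ptxas-options never reach g++, which is consistent with the errors in the original log going away.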

Annoyingly, you will still run into other issues, but these can be fixed by adding one line, #include <cuda_runtime.h>, to the cudaVolumeTypes.h header in the paper folder. The header I used is shown below:

/*
 * cudaVolumeTypes.h
 * Data structure definitions for PAPER
 * Author: Imran Haque, 2009
 * Copyright 2009, Stanford University
 * This file is licensed under the terms of the GPL. Please see
 * the COPYING file in the accompanying source distribution for
 * full license terms.
 */

typedef unsigned int uint;

#include <stdbool.h>
#include <cuda_runtime.h>

// Use this section if we're compiling under g++ instead of nvcc
#ifdef GPP
#include <unistd.h>
typedef struct _float4 {
    float x,y,z,w;
} float4;
typedef struct _float3 {
    float x,y,z;
} float3;
#endif

typedef struct _hCUDAmol {
    float4* atoms;
    uint natoms;
} CUDAmol;
typedef struct _dCUDAmol {
    float* x;
    float* y;
    float* z;
    float* a;
    uint natoms;
} dCUDAmol;
typedef struct _dCUDAMultimol {
    float* mols;
    uint* atomcounts;
    uint* molids;
    uint maxatoms;
    size_t pitch;
    uint nmols;
    float* transforms;
    size_t transform_pitch;
    bool isDeviceMM;
} dCUDAMultimol;
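
In a Dockerfile it may be easier to apply this one-line change automatically after cloning than to keep a patched copy of the header around. A minimal sketch, assuming the header sits at paper/cudaVolumeTypes.h relative to the repository root and still contains the #include <stdbool.h> line shown above; add_cuda_runtime_include and patch_header are hypothetical names. It inserts the include right after that line and leaves the file unchanged if the include is already present.

```python
from pathlib import Path

INCLUDE_LINE = "#include <cuda_runtime.h>"
ANCHOR_LINE = "#include <stdbool.h>"  # assumed to exist in the header


def add_cuda_runtime_include(text: str) -> str:
    """Insert INCLUDE_LINE after ANCHOR_LINE; return text unchanged if already there."""
    lines = text.splitlines(keepends=True)
    if any(line.strip() == INCLUDE_LINE for line in lines):
        return text  # already patched, so running twice is harmless
    for i, line in enumerate(lines):
        if line.strip() == ANCHOR_LINE:
            if not line.endswith("\n"):
                lines[i] = line + "\n"  # anchor was the last line, without a newline
            lines.insert(i + 1, INCLUDE_LINE + "\n")
            return "".join(lines)
    raise ValueError(f"{ANCHOR_LINE!r} not found; patch the header by hand")


def patch_header(path: str = "paper/cudaVolumeTypes.h") -> None:
    """Rewrite the header in place with the extra include (path is an assumption)."""
    header = Path(path)
    header.write_text(add_cuda_runtime_include(header.read_text()))
```

Saved as a small script and called from the repository root (for example in a RUN step before pip install -e .), patch_header() applies the same change by hand-editing would.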

After updating both files, you should be able to compile :)

pip install -e .
Obtaining file:///home/michael/DD_tools/roshambo
  Preparing metadata ( ... done
Requirement already satisfied: numpy==1.21.6 in /home/michael/miniconda3/envs/roshambo/lib/python3.9/site-packages (from roshambo==0.0.1) (1.21.6)
Requirement already satisfied: pandas==1.3.5 in /home/michael/miniconda3/envs/roshambo/lib/python3.9/site-packages (from roshambo==0.0.1) (1.3.5)
Requirement already satisfied: cython==0.29.33 in /home/michael/miniconda3/envs/roshambo/lib/python3.9/site-packages (from roshambo==0.0.1) (0.29.33)
Requirement already satisfied: matplotlib==3.5.3 in /home/michael/miniconda3/envs/roshambo/lib/python3.9/site-packages (from roshambo==0.0.1) (3.5.3)
Requirement already satisfied: scikit-learn==1.0.2 in /home/michael/miniconda3/envs/roshambo/lib/python3.9/site-packages (from roshambo==0.0.1) (1.0.2)
Requirement already satisfied: scipy==1.7.3 in /home/michael/miniconda3/envs/roshambo/lib/python3.9/site-packages (from roshambo==0.0.1) (1.7.3)
Requirement already satisfied: Pillow==9.4.0 in /home/michael/miniconda3/envs/roshambo/lib/python3.9/site-packages (from roshambo==0.0.1) (9.4.0)
Requirement already satisfied: cairosvg==2.7.0 in /home/michael/miniconda3/envs/roshambo/lib/python3.9/site-packages (from roshambo==0.0.1) (2.7.0)
Requirement already satisfied: cairocffi in /home/michael/miniconda3/envs/roshambo/lib/python3.9/site-packages (from cairosvg==2.7.0->roshambo==0.0.1) (1.7.1)
Requirement already satisfied: cssselect2 in /home/michael/miniconda3/envs/roshambo/lib/python3.9/site-packages (from cairosvg==2.7.0->roshambo==0.0.1) (0.7.0)
Requirement already satisfied: defusedxml in /home/michael/miniconda3/envs/roshambo/lib/python3.9/site-packages (from cairosvg==2.7.0->roshambo==0.0.1) (0.7.1)
Requirement already satisfied: tinycss2 in /home/michael/miniconda3/envs/roshambo/lib/python3.9/site-packages (from cairosvg==2.7.0->roshambo==0.0.1) (1.3.0)
Requirement already satisfied: cycler>=0.10 in /home/michael/miniconda3/envs/roshambo/lib/python3.9/site-packages (from matplotlib==3.5.3->roshambo==0.0.1) (0.12.1)
Requirement already satisfied: fonttools>=4.22.0 in /home/michael/miniconda3/envs/roshambo/lib/python3.9/site-packages (from matplotlib==3.5.3->roshambo==0.0.1) (4.54.1)
Requirement already satisfied: kiwisolver>=1.0.1 in /home/michael/miniconda3/envs/roshambo/lib/python3.9/site-packages (from matplotlib==3.5.3->roshambo==0.0.1) (1.4.7)
Requirement already satisfied: packaging>=20.0 in /home/michael/miniconda3/envs/roshambo/lib/python3.9/site-packages (from matplotlib==3.5.3->roshambo==0.0.1) (24.1)
Requirement already satisfied: pyparsing>=2.2.1 in /home/michael/miniconda3/envs/roshambo/lib/python3.9/site-packages (from matplotlib==3.5.3->roshambo==0.0.1) (3.1.4)
Requirement already satisfied: python-dateutil>=2.7 in /home/michael/miniconda3/envs/roshambo/lib/python3.9/site-packages (from matplotlib==3.5.3->roshambo==0.0.1) (2.9.0)
Requirement already satisfied: pytz>=2017.3 in /home/michael/miniconda3/envs/roshambo/lib/python3.9/site-packages (from pandas==1.3.5->roshambo==0.0.1) (2024.1)
Requirement already satisfied: joblib>=0.11 in /home/michael/miniconda3/envs/roshambo/lib/python3.9/site-packages (from scikit-learn==1.0.2->roshambo==0.0.1) (1.4.2)
Requirement already satisfied: threadpoolctl>=2.0.0 in /home/michael/miniconda3/envs/roshambo/lib/python3.9/site-packages (from scikit-learn==1.0.2->roshambo==0.0.1) (3.5.0)
Requirement already satisfied: six>=1.5 in /home/michael/miniconda3/envs/roshambo/lib/python3.9/site-packages (from python-dateutil>=2.7->matplotlib==3.5.3->roshambo==0.0.1) (1.16.0)
Requirement already satisfied: cffi>=1.1.0 in /home/michael/miniconda3/envs/roshambo/lib/python3.9/site-packages (from cairocffi->cairosvg==2.7.0->roshambo==0.0.1) (1.17.1)
Requirement already satisfied: webencodings in /home/michael/miniconda3/envs/roshambo/lib/python3.9/site-packages (from cssselect2->cairosvg==2.7.0->roshambo==0.0.1) (0.5.1)
Requirement already satisfied: pycparser in /home/michael/miniconda3/envs/roshambo/lib/python3.9/site-packages (from cffi>=1.1.0->cairocffi->cairosvg==2.7.0->roshambo==0.0.1) (2.22)
Installing collected packages: roshambo
  DEPRECATION: Legacy editable install of roshambo==0.0.1 from file:///home/michael/DD_tools/roshambo ( develop) is deprecated. pip 25.0 will enforce this behaviour change. A possible replacement is to add a pyproject.toml or enable --use-pep517, and use setuptools >= 64. If the resulting installation is not behaving as expected, try using --config-settings editable_mode=compat. Please consult the setuptools documentation for more information. Discussion can be found at
  Running develop for roshambo
Successfully installed roshambo

And just for sanity checking:

roshambo -h
usage: roshambo [-h] [--ignore_hs] [--n_confs N_CONFS] [--keep_mol] [--working_dir WORKING_DIR] [--name_prefix NAME_PREFIX] [--smiles_delimiter SMILES_DELIMITER]
                [--gpu_id GPU_ID] [--volume_type {analytic,gaussian}] [--n N] [--proxy_cutoff PROXY_CUTOFF] [--epsilon EPSILON] [--res RES] [--margin MARGIN]
                [--no_carbon_radii] [--color] [--fdef_path FDEF_PATH] [--sort_by SORT_BY] [--write_to_file] [--max_conformers MAX_CONFORMERS] [--filename FILENAME]
                [--random_seed RANDOM_SEED] [--method {ETDG,ETKDG,ETKDGv2,ETKDGv3}] [--ff {UFF,MMFF94s,MMFF94s_noEstat}] [--opt_confs] [--calc_energy]
                [--energy_iters ENERGY_ITERS] [--energy_cutoff ENERGY_CUTOFF] [--align_confs] [--rms_cutoff RMS_CUTOFF] [--num_threads NUM_THREADS]
                ref_file dataset_files_pattern

Get similarity scores between a reference molecule and a dataset of molecules.

positional arguments:
  ref_file              Name of the reference molecule file.
  dataset_files_pattern
                        File pattern to match the dataset molecule files.

optional arguments:
  -h, --help            show this help message and exit
  --ignore_hs           Ignore hydrogens.
  --n_confs N_CONFS     Number of conformers to generate.
  --keep_mol            Keep the original molecule in addition to the conformers.
  --working_dir WORKING_DIR
                        Working directory.
  --name_prefix NAME_PREFIX
                        Prefix to use for the molecule names if not found in the input files.
  --smiles_delimiter SMILES_DELIMITER
                        Specify the delimiter for parsing SMILES. Use 'SPACE' for space, 'TAB' for tab, etc.
  --gpu_id GPU_ID       ID of the GPU to use for running PAPER.
  --volume_type {analytic,gaussian}
                        The type of overlap volume calculation to use.
  --n N                 The order of the analytic overlap volume calculation.
  --proxy_cutoff PROXY_CUTOFF
                        The distance cutoff to use for the atoms to be considered neighbors.
  --epsilon EPSILON     The Gaussian cutoff to use in the analytic volume calculation.
  --res RES             The grid resolution to use for the Gaussian volume calculation.
  --margin MARGIN       The margin to add to the grid box size for the Gaussian volume calculation.
  --no_carbon_radii     Disable the use of carbon radii for the overlap calculations.
  --color               Calculate color scores in addition to shape scores.
  --fdef_path FDEF_PATH
                        The file path to the feature definition file to use for the pharmacophore calculation.
  --sort_by SORT_BY     The score to sort the final results by.
  --write_to_file       Write the transformed molecules to a sdf file.
  --max_conformers MAX_CONFORMERS
                        The maximum number of conformers to write for each molecule.
  --filename FILENAME   The name of the output file to write.
  --random_seed RANDOM_SEED
                        Random seed for conformer generation.
  --method {ETDG,ETKDG,ETKDGv2,ETKDGv3}
                        The method to use for conformer generation (ETDG, ETKDG, or ETKDGv2)
  --ff {UFF,MMFF94s,MMFF94s_noEstat}
                        The force field to use for conformer generation (UFF, MMFF94s, or MMFF94s_noEstat)
  --opt_confs           Optimize the conformers.
  --calc_energy         Calculate the energy of the conformers.
  --energy_iters ENERGY_ITERS
                        Number of iterations for energy calculation.
  --energy_cutoff ENERGY_CUTOFF
                        Maximum energy difference (in kcal/mol) to keep a conformer after energy minimization.
  --align_confs         Align the conformers.
  --rms_cutoff RMS_CUTOFF
                        RMSD cutoff for conformer clustering.
  --num_threads NUM_THREADS
                        Number of threads to use for conformer generation.
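
For a Docker build it can help to turn this manual sanity check into a step that fails loudly. A minimal sketch, where cli_help_ok and main are hypothetical helpers (not part of roshambo): it runs a command such as roshambo -h and reports whether it exited with status 0 and printed an argparse-style usage line.

```python
import subprocess
import sys
from typing import Sequence


def cli_help_ok(cmd: Sequence[str], timeout: float = 60.0) -> bool:
    """Return True if cmd runs, exits with status 0, and prints a usage line."""
    try:
        proc = subprocess.run(list(cmd), capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        return False  # command missing or not executable, or it hung
    output = proc.stdout + proc.stderr
    return proc.returncode == 0 and "usage:" in output.lower()


def main() -> None:
    """Exit 0 if roshambo -h works, 1 otherwise (e.g. as a Docker RUN step)."""
    sys.exit(0 if cli_help_ok(["roshambo", "-h"]) else 1)
```

Calling main() inside the roshambo environment makes the image build stop at this step if the console script is missing or crashes on startup.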

I am using Ubuntu 22.04. Let me know if this solves your Dockerfile install issue.

All the best, Mike