rapidsai / cudf

cuDF - GPU DataFrame Library
https://docs.rapids.ai/api/cudf/stable/
Apache License 2.0

ModuleNotFoundError: No module named 'cudf' #9743

Closed anamariaUIC closed 2 years ago

anamariaUIC commented 2 years ago

What is your question? Hello,

I installed this library via:

anamaria@login-2[SABER]: ~ $ module load Python/3.7.4-GCCcore-8.3.0
anamaria@login-2[SABER]: ~ $ module load CUDA/11.1.1-GCC-10.2.0

anamaria@login-2[SABER]: ~ $ conda install -c rapidsai -c nvidia -c numba -c conda-forge cudf=21.08 python=3.7 cudatoolkit=11.1

Installation finished and I got the message: Your installed version is: 2.17

I started the same Python version (above) I used to install this and I got:

anamaria@login-2[SABER]: ~ $ python
Python 3.7.4 (default, Aug 10 2021, 17:30:40)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> import cudf
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'cudf'
>>> import os
>>> import numpy as np
>>> import math
>>> import pandas as pd; print('Pandas Version:', pd.__version__)
Pandas Version: 1.1.3
>>> import cudf; print('cuDF Version:', cudf.__version__)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'cudf'

anamaria@login-2[SABER]: ~ $ module list

Currently Loaded Modules:
  1) bzip2/1.0.8-GCCcore-8.3.0
  2) ncurses/6.1-GCCcore-8.3.0
  3) libreadline/8.0-GCCcore-8.3.0
  4) Tcl/8.6.9-GCCcore-8.3.0
  5) SQLite/3.29.0-GCCcore-8.3.0
  6) XZ/5.2.4-GCCcore-8.3.0
  7) GMP/6.1.2-GCCcore-8.3.0
  8) libffi/3.2.1-GCCcore-8.3.0
  9) Python/3.7.4-GCCcore-8.3.0
 10) GCCcore/10.2.0
 11) zlib/1.2.11-GCCcore-10.2.0
 12) binutils/2.35-GCCcore-10.2.0
 13) GCC/10.2.0
 14) CUDAcore/11.1.1
 15) CUDA/11.1.1-GCC-10.2.0

anamaria@login-2[SABER]: ~ $ cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

Please advise

beckernick commented 2 years ago

@anamariaUIC is your computing system and environment set up to use conda install without creating an environment?

anamariaUIC commented 2 years ago

@beckernick I am not sure (it's a remote machine), but I will try to install it in an environment. In the meantime, can you please tell me whether all of these program versions are compatible with each other?

module load Python/3.7.4-GCCcore-8.3.0
module load CUDA/11.1.1-GCC-10.2.0

conda install -c rapidsai -c nvidia -c numba -c conda-forge cudf=21.08 python=3.7 cudatoolkit=11.1

quasiben commented 2 years ago

21.08 doesn't have cudatoolkit 11.1 builds; we only have CUDA 11.0 and 11.2 builds for 21.08. In the soon-to-be-released 21.12 we will have CUDA Enhanced Compatibility with 11.1 support. You can test with the rapidsai-nightly channel:

conda create -n rapids-21.12-11.1 -c rapidsai-nightly -c nvidia -c conda-forge cudf=21.12 cudatoolkit=11.1

anamariaUIC commented 2 years ago

@quasiben sounds good I will test it with: conda create -n rapids-21.12-11.1 -c rapidsai-nightly -c nvidia -c conda-forge cudf=21.12 cudatoolkit=11.1

Can I keep these two libraries for that installation?

module load Python/3.7.4-GCCcore-8.3.0
module load CUDA/11.1.1-GCC-10.2.0

On this system I only have these two CUDA versions installed:

CUDA/9.2.88-GCC-7.3.0-2.30
CUDA/11.1.1-GCC-10.2.0

anamariaUIC commented 2 years ago

@quasiben This is what I tried:

module load Python/3.7.4-GCCcore-8.3.0
module load CUDA/11.1.1-GCC-10.2.0

conda create --name rapids

conda activate rapids

conda create -n rapids-21.12-11.1 -c rapidsai-nightly -c nvidia -c conda-forge cudf=21.12 cudatoolkit=11.1

installation finished, no errors reported

(rapids) anamaria@login-2[SABER]: ~ $ conda activate rapids-21.12-11.1
(rapids-21.12-11.1) anamaria@login-2[SABER]: ~ $ python
Python 3.8.12 | packaged by conda-forge | (default, Oct 12 2021, 21:59:51)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> import cudf
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/anamaria/.conda/envs/rapids-21.12-11.1/lib/python3.8/site-packages/cudf/__init__.py", line 4, in <module>
    validate_setup()
  File "/home/anamaria/.conda/envs/rapids-21.12-11.1/lib/python3.8/site-packages/cudf/utils/gpu_utils.py", line 18, in validate_setup
    from rmm._cuda.gpu import (
  File "/home/anamaria/.conda/envs/rapids-21.12-11.1/lib/python3.8/site-packages/rmm/__init__.py", line 16, in <module>
    from rmm import mr
  File "/home/anamaria/.conda/envs/rapids-21.12-11.1/lib/python3.8/site-packages/rmm/mr.py", line 14, in <module>
    from rmm._lib.memory_resource import (
  File "/home/anamaria/.conda/envs/rapids-21.12-11.1/lib/python3.8/site-packages/rmm/_lib/__init__.py", line 15, in <module>
    from .device_buffer import DeviceBuffer
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory
>>> quit()
(rapids-21.12-11.1) anamaria@login-2[SABER]: ~ $ module list

Currently Loaded Modules:
  1) bzip2/1.0.8-GCCcore-8.3.0
  2) ncurses/6.1-GCCcore-8.3.0
  3) libreadline/8.0-GCCcore-8.3.0
  4) Tcl/8.6.9-GCCcore-8.3.0
  5) SQLite/3.29.0-GCCcore-8.3.0
  6) XZ/5.2.4-GCCcore-8.3.0
  7) GMP/6.1.2-GCCcore-8.3.0
  8) libffi/3.2.1-GCCcore-8.3.0
  9) Python/3.7.4-GCCcore-8.3.0
 10) GCCcore/10.2.0
 11) zlib/1.2.11-GCCcore-10.2.0
 12) binutils/2.35-GCCcore-10.2.0
 13) GCC/10.2.0
 14) CUDAcore/11.1.1
 15) CUDA/11.1.1-GCC-10.2.0

libcuda.so.1 is on my path:

(rapids-21.12-11.1) anamaria@login-2[SABER]: ~ $ echo $LIBRARY_PATH
/software/linux-el7-x86_64/tools/EasyBuild-4.1.0/software/CUDAcore/11.1.1/stubs/lib64
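A note on the `LIBRARY_PATH` check above. `LIBRARY_PATH` is read by the compiler and linker at build time. The dynamic loader that has to find `libcuda.so.1` when `rmm` is imported searches `LD_LIBRARY_PATH`, the ld.so cache and the default system directories instead. The directory shown is also a `stubs` directory, and stub libraries exist for linking on machines without a driver, not for running. `libcuda.so.1` itself is normally installed by the NVIDIA driver.

The sketch below asks the loader directly. It uses only the Python standard library (`ctypes`) and assumes the default soname `libcuda.so.1`; the helper name `probe_cuda_driver` is hypothetical. If the library loads, it also calls the driver API function `cuDriverGetVersion`, which reports the driver's CUDA version as `1000 * major + 10 * minor`:

```python
import ctypes
from typing import Optional, Tuple


def probe_cuda_driver(libname: str = "libcuda.so.1") -> Tuple[bool, Optional[int]]:
    """Try to load the CUDA driver library the way the dynamic loader would.

    Returns (loaded, raw_version). raw_version is the value reported by
    cuDriverGetVersion (1000 * major + 10 * minor), or None if unavailable.
    """
    try:
        lib = ctypes.CDLL(libname)
    except OSError as exc:
        # Same failure as "cannot open shared object file" in the traceback.
        print(f"could not load {libname}: {exc}")
        return False, None

    version = ctypes.c_int(0)
    # CUresult cuDriverGetVersion(int *driverVersion); 0 is CUDA_SUCCESS.
    status = lib.cuDriverGetVersion(ctypes.byref(version))
    if status != 0:
        print(f"cuDriverGetVersion failed with error code {status}")
        return True, None
    return True, version.value


if __name__ == "__main__":
    loaded, raw = probe_cuda_driver()
    if raw is not None:
        print(f"driver supports CUDA {raw // 1000}.{(raw % 1000) // 10}")
```

Run it on the node where cuDF will actually be imported. On a node without the NVIDIA driver, the first branch is the expected result.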

Please advise

anamariaUIC commented 2 years ago

@quasiben @beckernick

Can you please tell me what is going on here? I installed it with the recommended CUDA 11.0 version, but cudf can't be found; please find all the steps below. The installation was performed on a GPU node.

anamaria@gpu-2-0.saber:~ $ conda activate rapids
(rapids) anamaria@gpu-2-0.saber:~ $ module load CUDA/11.0.2-GCC-9.3.0
(rapids) anamaria@gpu-2-0.saber:~ $ module load Python/3.8.2-GCCcore-9.3.0
(rapids) anamaria@gpu-2-0.saber:~ $ conda install -c rapidsai -c nvidia -c numba -c conda-forge cudf=21.08 python=3.7 cudatoolkit=11.0
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: -
Warning: 2 possible package resolutions (only showing differing packages):

==> WARNING: A newer version of conda exists. <==
  current version: 4.10.1
  latest version: 4.11.0

Please update conda by running

$ conda update -n base conda

Package Plan

environment location: /home/anamaria/.conda/envs/rapids

added / updated specs:

The following packages will be downloaded:

package                    |            build
---------------------------|-----------------
numba-0.54.0rc1            |np1.11py3.7h04863e7_g9bed2ebb2_0         3.6 MB  numba
numpy-1.21.4               |   py37h31617e3_0         6.1 MB  conda-forge
------------------------------------------------------------
                                       Total:         9.7 MB

The following NEW packages will be INSTALLED:

  _libgcc_mutex       conda-forge/linux-64::_libgcc_mutex-0.1-conda_forge
  _openmp_mutex       conda-forge/linux-64::_openmp_mutex-4.5-1_gnu
  abseil-cpp          conda-forge/linux-64::abseil-cpp-20210324.2-h9c3ff4c_0
  arrow-cpp           conda-forge/linux-64::arrow-cpp-4.0.1-py37h4f4072c_22_cuda
  arrow-cpp-proc      conda-forge/linux-64::arrow-cpp-proc-3.0.0-cuda
  aws-c-auth          conda-forge/linux-64::aws-c-auth-0.6.7-hfef2836_0
  aws-c-cal           conda-forge/linux-64::aws-c-cal-0.5.12-h70efedd_7
  aws-c-common        conda-forge/linux-64::aws-c-common-0.6.17-h7f98852_0
  aws-c-compression   conda-forge/linux-64::aws-c-compression-0.2.14-h7c7754b_7
  aws-c-event-stream  conda-forge/linux-64::aws-c-event-stream-0.2.7-hb80ed28_31
  aws-c-http          conda-forge/linux-64::aws-c-http-0.6.10-h58a30cf_2
  aws-c-io            conda-forge/linux-64::aws-c-io-0.10.13-he836878_5
  aws-c-mqtt          conda-forge/linux-64::aws-c-mqtt-0.7.9-h042a236_0
  aws-c-s3            conda-forge/linux-64::aws-c-s3-0.1.27-hae5f17b_11
  aws-c-sdkutils      conda-forge/linux-64::aws-c-sdkutils-0.1.1-h7c7754b_4
  aws-checksums       conda-forge/linux-64::aws-checksums-0.1.12-h7c7754b_6
  aws-crt-cpp         conda-forge/linux-64::aws-crt-cpp-0.17.8-h82bac0c_1
  aws-sdk-cpp         conda-forge/linux-64::aws-sdk-cpp-1.9.148-hfe59705_0
  bzip2               conda-forge/linux-64::bzip2-1.0.8-h7f98852_4
  c-ares              conda-forge/linux-64::c-ares-1.18.1-h7f98852_0
  ca-certificates     conda-forge/linux-64::ca-certificates-2021.10.8-ha878542_0
  cachetools          conda-forge/noarch::cachetools-4.2.4-pyhd8ed1ab_0
  cudatoolkit         nvidia/linux-64::cudatoolkit-11.0.221-h6bb024c_0
  cudf                rapidsai/linux-64::cudf-21.08.03-cuda_11.0_py37_ge4313b6a1e_0
  cudnn               nvidia/linux-64::cudnn-8.0.0-cuda11.0_0
  cupy                rapidsai/linux-64::cupy-8.0.0-py37h0ce7dbb_0
  dlpack              conda-forge/linux-64::dlpack-0.5-h9c3ff4c_0
  fastavro            conda-forge/linux-64::fastavro-1.4.7-py37h5e8e339_1
  fastrlock           conda-forge/linux-64::fastrlock-0.8-py37hcd2ae1e_1
  fsspec              conda-forge/noarch::fsspec-2021.11.0-pyhd8ed1ab_0
  gflags              conda-forge/linux-64::gflags-2.2.2-he1b5a44_1004
  grpc-cpp            conda-forge/linux-64::grpc-cpp-1.42.0-h7e358d9_0
  krb5                conda-forge/linux-64::krb5-1.19.2-hcc1bbae_3
  ld_impl_linux-64    conda-forge/linux-64::ld_impl_linux-64-2.36.1-hea4e1c9_2
  libblas             conda-forge/linux-64::libblas-3.9.0-12_linux64_openblas
  libbrotlicommon     conda-forge/linux-64::libbrotlicommon-1.0.9-h7f98852_6
  libbrotlidec        conda-forge/linux-64::libbrotlidec-1.0.9-h7f98852_6
  libbrotlienc        conda-forge/linux-64::libbrotlienc-1.0.9-h7f98852_6
  libcblas            conda-forge/linux-64::libcblas-3.9.0-12_linux64_openblas
  libcudf             rapidsai/linux-64::libcudf-21.08.03-cuda11.0_ge4313b6a1e_0
  libcurl             conda-forge/linux-64::libcurl-7.80.0-h2574ce0_0
  libedit             conda-forge/linux-64::libedit-3.1.20191231-he28a2e2_2
  libev               conda-forge/linux-64::libev-4.33-h516909a_1
  libevent            conda-forge/linux-64::libevent-2.1.10-h9b69904_4
  libffi              conda-forge/linux-64::libffi-3.4.2-h7f98852_5
  libgcc-ng           conda-forge/linux-64::libgcc-ng-11.2.0-h1d223b6_11
  libgfortran-ng      conda-forge/linux-64::libgfortran-ng-11.2.0-h69a702a_11
  libgfortran5        conda-forge/linux-64::libgfortran5-11.2.0-h5c6108e_11
  libgomp             conda-forge/linux-64::libgomp-11.2.0-h1d223b6_11
  liblapack           conda-forge/linux-64::liblapack-3.9.0-12_linux64_openblas
  libnghttp2          conda-forge/linux-64::libnghttp2-1.43.0-h812cca2_1
  libnsl              conda-forge/linux-64::libnsl-2.0.0-h7f98852_0
  libopenblas         conda-forge/linux-64::libopenblas-0.3.18-pthreads_h8fe5266_0
  libprotobuf         conda-forge/linux-64::libprotobuf-3.18.1-h780b84a_0
  librmm              rapidsai/linux-64::librmm-21.08.02-cuda11.0_g115bad2_0
  libssh2             conda-forge/linux-64::libssh2-1.10.0-ha56f1ee_2
  libstdcxx-ng        conda-forge/linux-64::libstdcxx-ng-11.2.0-he4da1e4_11
  libthrift           conda-forge/linux-64::libthrift-0.15.0-he6d91bd_1
  libutf8proc         conda-forge/linux-64::libutf8proc-2.6.1-h7f98852_0
  libzlib             conda-forge/linux-64::libzlib-1.2.11-h36c2ea0_1013
  llvmlite            numba/linux-64::llvmlite-0.37.0-py37he1b5a44_0
  lz4-c               conda-forge/linux-64::lz4-c-1.9.3-h9c3ff4c_1
  nccl                nvidia/linux-64::nccl-2.7.8.1-h4962215_100
  ncurses             conda-forge/linux-64::ncurses-6.2-h58526e2_4
  numba               numba/linux-64::numba-0.54.0rc1-np1.11py3.7h04863e7_g9bed2ebb2_0
  numpy               conda-forge/linux-64::numpy-1.21.4-py37h31617e3_0
  nvtx                conda-forge/linux-64::nvtx-0.2.3-py37h5e8e339_1
  openssl             conda-forge/linux-64::openssl-1.1.1l-h7f98852_0
  orc                 conda-forge/linux-64::orc-1.7.1-h68e2c4e_0
  packaging           conda-forge/noarch::packaging-21.3-pyhd8ed1ab_0
  pandas              conda-forge/linux-64::pandas-1.2.5-py37h219a48f_0
  parquet-cpp         conda-forge/noarch::parquet-cpp-1.5.1-2
  pip                 conda-forge/noarch::pip-21.3.1-pyhd8ed1ab_0
  protobuf            conda-forge/linux-64::protobuf-3.18.1-py37hcd2ae1e_0
  pyarrow             conda-forge/linux-64::pyarrow-4.0.1-py37h63cede7_22_cuda
  pyparsing           conda-forge/noarch::pyparsing-3.0.6-pyhd8ed1ab_0
  python              conda-forge/linux-64::python-3.7.12-hb7a2778_100_cpython
  python-dateutil     conda-forge/noarch::python-dateutil-2.8.2-pyhd8ed1ab_0
  python_abi          conda-forge/linux-64::python_abi-3.7-2_cp37m
  pytz                conda-forge/noarch::pytz-2021.3-pyhd8ed1ab_0
  re2                 conda-forge/linux-64::re2-2021.11.01-h9c3ff4c_0
  readline            conda-forge/linux-64::readline-8.1-h46c0cb4_0
  rmm                 rapidsai/linux-64::rmm-21.08.02-cuda_11.0_py37_g115bad2_0
  s2n                 conda-forge/linux-64::s2n-1.3.0-h9b69904_0
  setuptools          conda-forge/linux-64::setuptools-59.2.0-py37h89c1867_0
  six                 conda-forge/noarch::six-1.16.0-pyh6c4a22f_0
  snappy              conda-forge/linux-64::snappy-1.1.8-he1b5a44_3
  spdlog              conda-forge/linux-64::spdlog-1.8.5-h4bd325d_0
  sqlite              conda-forge/linux-64::sqlite-3.36.0-h9cd32fc_2
  tk                  conda-forge/linux-64::tk-8.6.11-h27826a3_1
  typing_extensions   conda-forge/noarch::typing_extensions-4.0.0-pyha770c72_0
  wheel               conda-forge/noarch::wheel-0.37.0-pyhd8ed1ab_1
  xz                  conda-forge/linux-64::xz-5.2.5-h516909a_1
  zlib                conda-forge/linux-64::zlib-1.2.11-h36c2ea0_1013
  zstd                conda-forge/linux-64::zstd-1.5.0-ha95c52a_0

Proceed ([y]/n)? y

Downloading and Extracting Packages
numba-0.54.0rc1      | 3.6 MB    | ########################################################## | 100%
numpy-1.21.4         | 6.1 MB    | ########################################################## | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: | By downloading and using the CUDA Toolkit conda packages, you accept the terms and conditions of the CUDA End User License Agreement (EULA): https://docs.nvidia.com/cuda/eula/index.html

done
(rapids) anamaria@gpu-2-0.saber:~ $ python
Python 3.8.2 (default, Nov 22 2021, 18:38:33)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> import cudf
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'cudf'

beckernick commented 2 years ago

Are you able to use other GPU libraries in this conda environment, such as CuPy (or PyTorch if you install that)?
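One way to answer that for CuPy is a small smoke test. It imports CuPy, asks the CUDA runtime for the device count and versions, and runs one reduction on the GPU. The sketch assumes a CuPy build that matches the installed CUDA. `cupy.cuda.runtime.getDeviceCount`, `driverGetVersion` and `runtimeGetVersion` are CuPy's wrappers around the CUDA runtime calls of the same names. The helper name `cupy_smoke_test` is only for illustration:

```python
def cupy_smoke_test() -> bool:
    """Return True if CuPy imports, sees a GPU and can run a kernel."""
    try:
        import cupy as cp
    except ImportError as exc:
        # A wheel that doesn't match the CUDA/driver on the host often fails here.
        print(f"CuPy import failed: {exc}")
        return False

    try:
        n = cp.cuda.runtime.getDeviceCount()
        drv = cp.cuda.runtime.driverGetVersion()
        rt = cp.cuda.runtime.runtimeGetVersion()
        print(f"{n} device(s); driver CUDA version {drv}, runtime version {rt}")
        total = int(cp.arange(10).sum())  # reduction kernel executed on the GPU
        print(f"sum(0..9) on GPU = {total}")
        return total == 45
    except Exception as exc:  # CUDA runtime errors surface as CuPy exceptions
        print(f"CuPy GPU check failed: {exc}")
        return False


if __name__ == "__main__":
    print("OK" if cupy_smoke_test() else "FAILED")
```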

anamariaUIC commented 2 years ago

@beckernick @quasiben

In the same env I tried this. Please let me know what I am doing wrong. For CuPy I followed these instructions for my CUDA 11.0: https://docs.cupy.dev/en/stable/install.html

(rapids) anamaria@gpu-2-0.saber:~ $ pip install cupy-cuda110
Defaulting to user installation because normal site-packages is not writeable
Collecting cupy-cuda110
  Downloading cupy_cuda110-9.6.0-cp38-cp38-manylinux1_x86_64.whl (78.5 MB)
     |████████████████████████████████| 78.5 MB 17.5 MB/s
Collecting fastrlock>=0.5
  Downloading fastrlock-0.8-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (49 kB)
     |████████████████████████████████| 49 kB 119 kB/s
Requirement already satisfied: numpy<1.24,>=1.17 in ./.local/lib/python3.8/site-packages (from cupy-cuda110) (1.18.5)
Installing collected packages: fastrlock, cupy-cuda110
Successfully installed cupy-cuda110-9.6.0 fastrlock-0.8
WARNING: You are using pip version 20.0.2; however, version 21.3.1 is available.
You should consider upgrading via the '/software/linux-el7-x86_64/tools/EasyBuild-4.1.0/software/Python/3.8.2-GCCcore-9.3.0/bin/python3.8 -m pip install --upgrade pip' command.
(rapids) anamaria@gpu-2-0.saber:~ $ python
Python 3.8.2 (default, Nov 22 2021, 18:38:33)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> import cupy as cp
Traceback (most recent call last):
  File "/home/anamaria/.local/lib/python3.8/site-packages/cupy/__init__.py", line 16, in <module>
    from cupy import _core  # NOQA
  File "/home/anamaria/.local/lib/python3.8/site-packages/cupy/_core/__init__.py", line 1, in <module>
    from cupy._core import core  # NOQA
  File "cupy/_core/core.pyx", line 1, in init cupy._core.core
  File "/home/anamaria/.local/lib/python3.8/site-packages/cupy/cuda/__init__.py", line 8, in <module>
    from cupy.cuda import compiler  # NOQA
  File "/home/anamaria/.local/lib/python3.8/site-packages/cupy/cuda/compiler.py", line 12, in <module>
    from cupy.cuda import function
  File "cupy/cuda/function.pyx", line 1, in init cupy.cuda.function
  File "cupy/cuda/texture.pyx", line 1, in init cupy.cuda.texture
ImportError: /home/anamaria/.local/lib/python3.8/site-packages/cupy_backends/cuda/api/driver.cpython-38-x86_64-linux-gnu.so: undefined symbol: cuDevicePrimaryCtxRelease_v2

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/anamaria/.local/lib/python3.8/site-packages/cupy/__init__.py", line 37, in <module>
    raise ImportError(_msg) from e
ImportError: CuPy is not correctly installed.

If you are using wheel distribution (cupy-cudaXX), make sure that the version of CuPy you installed matches with the version of CUDA on your host.
Also, confirm that only one CuPy package is installed:
  $ pip freeze

If you are building CuPy from source, please check your environment, uninstall CuPy and reinstall it with:
  $ pip install cupy --no-cache-dir -vvvv

Check the Installation Guide for details: https://docs.cupy.dev/en/latest/install.html

original error: /home/anamaria/.local/lib/python3.8/site-packages/cupy_backends/cuda/api/driver.cpython-38-x86_64-linux-gnu.so: undefined symbol: cuDevicePrimaryCtxRelease_v2

>>> quit()
(rapids) anamaria@gpu-2-0.saber:~ $ module list

Currently Loaded Modules:
  1) GCCcore/9.3.0
  2) zlib/1.2.11-GCCcore-9.3.0
  3) binutils/2.34-GCCcore-9.3.0
  4) GCC/9.3.0
  5) CUDAcore/11.0.2
  6) bzip2/1.0.8-GCCcore-9.3.0
  7) ncurses/6.2-GCCcore-9.3.0
  8) libreadline/8.0-GCCcore-9.3.0
  9) Tcl/8.6.10-GCCcore-9.3.0
 10) SQLite/3.31.1-GCCcore-9.3.0
 11) XZ/5.2.5-GCCcore-9.3.0
 12) GMP/6.2.0-GCCcore-9.3.0
 13) libffi/3.3-GCCcore-9.3.0
 14) CUDA/11.0.2-GCC-9.3.0
 15) Python/3.8.2-GCCcore-9.3.0

(rapids) anamaria@gpu-2-0.saber:~ $
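The pip log above suggests that the `pip` which ran belongs to the module-provided Python 3.8.2: its upgrade hint points to `/software/linux-el7-x86_64/tools/EasyBuild-4.1.0/software/Python/3.8.2-GCCcore-9.3.0/bin/python3.8`. As a result, CuPy landed in the per-user site-packages under `/home/anamaria/.local` rather than in the `rapids` conda environment.

The sketch below checks which interpreter is running and where it would import a given package from. It uses only the standard library `importlib` and `site` modules; the helper name `where_from` is hypothetical:

```python
import importlib.util
import site
import sys
from typing import Optional


def where_from(module: str) -> Optional[str]:
    """Print the running interpreter and the file `module` would be imported from."""
    print(f"interpreter : {sys.executable}")
    print(f"sys.prefix  : {sys.prefix}")
    spec = importlib.util.find_spec(module)  # locates the module without importing it
    if spec is None or spec.origin is None:
        print(f"{module!r} is not importable from this interpreter")
        return None
    origin = spec.origin
    if origin.startswith(site.getusersitepackages()):
        label = "per-user site-packages (~/.local)"
    elif origin.startswith(sys.prefix):
        label = "this interpreter's own prefix"
    else:
        label = "elsewhere on sys.path (e.g. via PYTHONPATH)"
    print(f"{module!r} -> {origin} [{label}]")
    return origin


if __name__ == "__main__":
    for name in ("cupy", "cudf"):
        where_from(name)
```

If `sys.executable` does not live under the conda environment's directory, packages installed with `conda install` into that environment will not be visible to this interpreter.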

A bit more information about the machine I am trying to install this on:

(rapids) anamaria@gpu-2-0.saber:~ $ nvidia-smi
Tue Nov 23 13:37:53 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.44                 Driver Version: 396.44                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  Off  | 00000000:3B:00.0 Off |                    0 |
| N/A   34C    P0    37W / 250W |      0MiB / 16160MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|

Can you please give me a command that will install cuDF with either of the CUDA versions I have on the system:

CUDA/11.0.2-GCC-9.3.0
CUDA/11.1.1-GCC-10.2.0

and given that my NVIDIA driver is: 396.44

anamariaUIC commented 2 years ago

@quasiben @beckernick Can you please tell me if there are any updates on this issue?

beckernick commented 2 years ago

Given CuPy also throws an error, it's possible the configuration/setup you're doing is not sufficient to use GPU libraries on your system. Are there other GPU analytics libraries that you can confirm work?

anamariaUIC commented 2 years ago

@beckernick I am not sure which other libraries I can try. This is what I have on my system; can you please tell me how to install any RAPIDS version on it?

Logged in to one GPU node:

$ ssh gpu-2-0
anamaria@gpu-2-0.saber:~ $ nvidia-smi
Tue Dec 7 11:11:23 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.44                 Driver Version: 396.44                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  Off  | 00000000:3B:00.0 Off |                    0 |
| N/A   33C    P0    37W / 250W |      0MiB / 16160MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

anamaria@gpu-2-0.saber:~ $ conda activate rapids
(rapids) anamaria@gpu-2-0.saber:~ $ module list

Currently Loaded Modules:
  1) gcccuda/2018b
  2) numactl/2.0.11-GCCcore-7.3.0
  3) XZ/5.2.4-GCCcore-7.3.0
  4) libxml2/2.9.8-GCCcore-7.3.0
  5) libpciaccess/0.14-GCCcore-7.3.0
  6) hwloc/1.11.10-GCCcore-7.3.0
  7) OpenMPI/3.1.1-gcccuda-2018b
  8) OpenBLAS/0.3.1-GCC-7.3.0-2.30
  9) gompic/2018b
 10) FFTW/3.3.8-gompic-2018b
 11) ScaLAPACK/2.0.2-gompic-2018b-OpenBLAS-0.3.1
 12) fosscuda/2018b
 13) bzip2/1.0.6-GCCcore-7.3.0
 14) ncurses/6.1-GCCcore-7.3.0
 15) libreadline/7.0-GCCcore-7.3.0
 16) Tcl/8.6.8-GCCcore-7.3.0
 17) SQLite/3.24.0-GCCcore-7.3.0
 18) GMP/6.1.2-GCCcore-7.3.0
 19) libffi/3.2.1-GCCcore-7.3.0
 20) Python/3.6.6-fosscuda-2018b
 21) GCCcore/9.3.0
 22) zlib/1.2.11-GCCcore-9.3.0
 23) binutils/2.34-GCCcore-9.3.0
 24) GCC/9.3.0
 25) CUDAcore/11.0.2
 26) CUDA/11.0.2-GCC-9.3.0

(rapids) anamaria@gpu-2-0.saber:~ $ python
Python 3.6.13 | packaged by conda-forge | (default, Sep 23 2021, 07:56:31)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> import cudf
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/anamaria/.conda/envs/rapids/lib/python3.6/site-packages/cudf/__init__.py", line 4, in <module>
    from cudf import core, datasets
  File "/home/anamaria/.conda/envs/rapids/lib/python3.6/site-packages/cudf/core/__init__.py", line 2, in <module>
    from cudf.core import buffer, column
  File "/home/anamaria/.conda/envs/rapids/lib/python3.6/site-packages/cudf/core/column/__init__.py", line 1, in <module>
    from cudf.core.column.categorical import CategoricalColumn  # noqa: F401
  File "/home/anamaria/.conda/envs/rapids/lib/python3.6/site-packages/cudf/core/column/categorical.py", line 9, in <module>
    import cudf._lib as libcudf
  File "/home/anamaria/.conda/envs/rapids/lib/python3.6/site-packages/cudf/_lib/__init__.py", line 1, in <module>
    from . import (
  File "cudf/_lib/rolling.pyx", line 12, in init cudf._lib.rolling
ModuleNotFoundError: No module named 'numba.numpy_support'
>>> quit()

anamariaUIC commented 2 years ago

@beckernick To answer your previous question: I am able to install and run PyTorch as described here on the same GPU node I was trying to install RAPIDS on...

(rapids) anamaria@gpu-2-0.saber:~ $ python
Python 3.6.13 | packaged by conda-forge | (default, Sep 23 2021, 07:56:31)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> import torch
>>> x = torch.rand(5, 3)
>>> print(x)
tensor([[0.9252, 0.6244, 0.1674],
        [0.3979, 0.2963, 0.5217],
        [0.5410, 0.4638, 0.8860],
        [0.7705, 0.7522, 0.4433],
        [0.1960, 0.6363, 0.0396]])

beckernick commented 2 years ago

Are you able to manipulate GPU tensors?

anamariaUIC commented 2 years ago

@beckernick can you please share some code I should run in order to determine that?
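Here is a minimal sketch of such a check, assuming a CUDA-enabled PyTorch build. `torch.cuda.is_available()` reports whether PyTorch can use a GPU at all. Allocating a small tensor with `device="cuda"` and doing arithmetic on it then exercises the driver end to end. The helper name `torch_gpu_check` is hypothetical:

```python
def torch_gpu_check() -> bool:
    """Return True if PyTorch can allocate and compute on a CUDA device."""
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this interpreter")
        return False

    print(f"torch {torch.__version__}, built for CUDA {torch.version.cuda}")
    if not torch.cuda.is_available():
        # e.g. a CPU-only build, no visible GPU, or a driver too old for this build
        print("torch.cuda.is_available() is False")
        return False

    try:
        print(f"device 0: {torch.cuda.get_device_name(0)}")
        a = torch.full((10,), 3.0, device="cuda")  # allocated in GPU memory
        total = (a * 2).sum().item()  # computed on the GPU, returned as a Python float
    except (AssertionError, RuntimeError) as exc:
        # Some PyTorch releases raise AssertionError when the driver is too old.
        print(f"GPU computation failed: {exc}")
        return False
    print(f"sum of ten 6.0 values on GPU = {total}")
    return total == 60.0


if __name__ == "__main__":
    print("OK" if torch_gpu_check() else "FAILED")
```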

anamariaUIC commented 2 years ago

@beckernick I am not sure if using CUDA tensors is relevant for your question:

(rapids) anamaria@gpu-2-0.saber:~ $ python
Python 3.6.13 | packaged by conda-forge | (default, Sep 23 2021, 07:56:31)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> import torch
>>> a = torch.full((10,), 3, device=torch.device("cuda"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/anamaria/.conda/envs/rapids/lib/python3.6/site-packages/torch/cuda/__init__.py", line 196, in _lazy_init
    _check_driver()
  File "/home/anamaria/.conda/envs/rapids/lib/python3.6/site-packages/torch/cuda/__init__.py", line 110, in _check_driver
    of the CUDA driver.""".format(str(torch._C._cuda_getDriverVersion())))
AssertionError: The NVIDIA driver on your system is too old (found version 9020). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx
Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver.

(rapids) anamaria@gpu-2-0.saber:~ $ nvidia-smi
Tue Dec 7 15:24:11 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.44                 Driver Version: 396.44                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  Off  | 00000000:3B:00.0 Off |                    0 |
| N/A   33C    P0    37W / 250W |      0MiB / 16160MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Is my CUDA driver too old to use with any RAPIDS version?

Here are all the details about my GPU:

(rapids) anamaria@gpu-2-0.saber:~/samples/1_Utilities/deviceQuery $ ./deviceQuery
./deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "Tesla V100-PCIE-16GB"
  CUDA Driver Version / Runtime Version          9.2 / 9.2
  CUDA Capability Major/Minor version number:    7.0
  Total amount of global memory:                 16160 MBytes (16945512448 bytes)
  (80) Multiprocessors, ( 64) CUDA Cores/MP:     5120 CUDA Cores
  GPU Max Clock rate:                            1380 MHz (1.38 GHz)
  Memory Clock rate:                             877 Mhz
  Memory Bus Width:                              4096-bit
  L2 Cache Size:                                 6291456 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 7 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Enabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 59 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.2, CUDA Runtime Version = 9.2, NumDevs = 1
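The two reports agree. The `found version 9020` in the PyTorch error appears to be the raw integer form of the driver's CUDA version, which the CUDA version queries encode as `1000 * major + 10 * minor`. Decoding it gives the same 9.2 that deviceQuery prints. Here is a small sketch of that decoding; the function name is illustrative:

```python
from typing import Tuple


def decode_cuda_version(raw: int) -> Tuple[int, int]:
    """Split a CUDA version integer (1000 * major + 10 * minor) into (major, minor)."""
    major = raw // 1000
    minor = (raw % 1000) // 10
    return major, minor


if __name__ == "__main__":
    for raw in (9020, 11000, 11020):
        major, minor = decode_cuda_version(raw)
        print(f"{raw} -> CUDA {major}.{minor}")
```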

beckernick commented 2 years ago

Thanks for highlighting the driver. Yes, your driver appears to be too old to run RAPIDS libraries based on https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#cuda-major-component-versions

I believe this would be too old for PyTorch as well (but am not an expert).

anamariaUIC commented 2 years ago

@beckernick OK, so my current driver version is 396.44. In order to install and run any version of RAPIDS, I need at least driver version 450 for CUDA 11. Can you please confirm or correct this?

If I can install an older version of RAPIDS with my current driver version, please send me that command.

beckernick commented 2 years ago

You will need at least 450.80.02 for the current version of RAPIDS, which is officially supported on CUDA 11+. Earlier versions of RAPIDS officially supported CUDA 10, which, based on the link above, also requires a newer driver than your current one.
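Comparing driver versions as plain strings can mislead: lexically `"460.9"` sorts after `"460.80"` even though 9 < 80 as a component. It is safer to compare them as integer tuples. The sketch below uses the 450.80.02 minimum quoted above as its default; the function and constant names are illustrative:

```python
from typing import Tuple

MIN_CUDA11_DRIVER = "450.80.02"  # minimum quoted in the comment above


def parse_driver(version: str) -> Tuple[int, ...]:
    """Turn '450.80.02' into (450, 80, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.strip().split("."))


def driver_meets(installed: str, minimum: str = MIN_CUDA11_DRIVER) -> bool:
    """True if the installed driver version is at least `minimum`."""
    return parse_driver(installed) >= parse_driver(minimum)


if __name__ == "__main__":
    for drv in ("396.44", "450.80.02", "495"):
        print(drv, "OK" if driver_meets(drv) else "too old")
```

On a node with a working driver, the installed version string can be read with `nvidia-smi --query-gpu=driver_version --format=csv,noheader` and passed to `driver_meets`.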

If possible, the best path forward for using recent versions of cuDF and other GPU analytics libraries would be to upgrade the driver.

anamariaUIC commented 2 years ago

@beckernick Per your recommendation, I updated the driver to version 495.

Next I started the installation with:

(rapids) anamaria@gpu-2-0.saber:~ $ module list

Currently Loaded Modules:
  1) GCC/9.3.0
  2) CUDAcore/11.0.2
  3) CUDA/11.0.2-GCC-9.3.0
  4) GCCcore/8.3.0
  5) zlib/1.2.11-GCCcore-8.3.0
  6) binutils/2.32-GCCcore-8.3.0
  7) bzip2/1.0.8-GCCcore-8.3.0
  8) ncurses/6.1-GCCcore-8.3.0
  9) libreadline/8.0-GCCcore-8.3.0
 10) Tcl/8.6.9-GCCcore-8.3.0
 11) SQLite/3.29.0-GCCcore-8.3.0
 12) XZ/5.2.4-GCCcore-8.3.0
 13) GMP/6.1.2-GCCcore-8.3.0
 14) libffi/3.2.1-GCCcore-8.3.0
 15) Python/3.7.4-GCCcore-8.3.0

(rapids) anamaria@gpu-2-0.saber:~ $ conda install -c rapidsai -c nvidia -c numba -c conda-forge cudf=21.08 python=3.7 cudatoolkit=11.0
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: |
Found conflicts! Looking for incompatible packages.
This can take several minutes. Press CTRL-C to abort.
Examining conflict for ninja rmm pip libtiff cudf python_abi cython fastavro thrift-cpp pytorch numpy python six zstd libcudf pandas|

The installation has been running for more than 2.5 hours. Is that expected? Can you please check whether my Python version and my CUDA version are appropriate? Should I be running this installation some other way?

anamariaUIC commented 2 years ago

@beckernick The installation completed via:

conda activate rapids
module load Python/3.7.4-GCCcore-8.3.0

conda install -c rapidsai -c nvidia -c numba -c conda-forge cudf=21.08 python=3.7 cudatoolkit=11.0

and this is happening:

(base) anamaria@login-1[SABER]: ~ $ conda activate rapids
(rapids) anamaria@login-1[SABER]: ~ $ module load Python/3.7.4-GCCcore-8.3.0
(rapids) anamaria@login-1[SABER]: ~ $ python
Python 3.7.4 (default, Aug 10 2021, 17:30:40)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> import cudf
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'cudf'
>>> exit
Use exit() or Ctrl-D (i.e. EOF) to exit
>>> quit()
(rapids) anamaria@login-1[SABER]: ~ $ module list

Currently Loaded Modules:
  1) GCCcore/8.3.0
  2) zlib/1.2.11-GCCcore-8.3.0
  3) binutils/2.32-GCCcore-8.3.0
  4) bzip2/1.0.8-GCCcore-8.3.0
  5) ncurses/6.1-GCCcore-8.3.0
  6) libreadline/8.0-GCCcore-8.3.0
  7) Tcl/8.6.9-GCCcore-8.3.0
  8) SQLite/3.29.0-GCCcore-8.3.0
  9) XZ/5.2.4-GCCcore-8.3.0
 10) GMP/6.1.2-GCCcore-8.3.0
 11) libffi/3.2.1-GCCcore-8.3.0
 12) Python/3.7.4-GCCcore-8.3.0

(rapids) anamaria@login-1[SABER]: ~ $ module unload Python/3.7.4-GCCcore-8.3.0
(rapids) anamaria@login-1[SABER]: ~ $ python
Python 3.6.13 | packaged by conda-forge | (default, Sep 23 2021, 07:56:31)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> import cudf
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/anamaria/.conda/envs/rapids/lib/python3.6/site-packages/cudf/__init__.py", line 4, in <module>
    from cudf import core, datasets
  File "/home/anamaria/.conda/envs/rapids/lib/python3.6/site-packages/cudf/core/__init__.py", line 2, in <module>
    from cudf.core import buffer, column
  File "/home/anamaria/.conda/envs/rapids/lib/python3.6/site-packages/cudf/core/buffer.py", line 6, in <module>
    import rmm
  File "/home/anamaria/.conda/envs/rapids/lib/python3.6/site-packages/rmm/__init__.py", line 18, in <module>
    from rmm.rmm import (
  File "/home/anamaria/.conda/envs/rapids/lib/python3.6/site-packages/rmm/rmm.py", line 20, in <module>
    import rmm._lib as librmm
  File "/home/anamaria/.conda/envs/rapids/lib/python3.6/site-packages/rmm/_lib/__init__.py", line 3, in <module>
    from .lib import *
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory

Can you please advise on how to install this?

beckernick commented 2 years ago

In the first example, it looks like you may be using a Python (and perhaps site-packages) different from the one that should be installed in your conda environment.
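A minimal diagnostic for this, assuming the env was activated with `conda activate` (which sets `CONDA_PREFIX`): ask the interpreter where it lives and where it would import `cudf` from, without actually importing it. The helper `describe_interpreter` is a hypothetical name, not part of cuDF or conda.

```python
import importlib.util
import os
import sys


def describe_interpreter(module_name="cudf"):
    """Report which Python is running and where `module_name` would be imported from."""
    # find_spec locates the module without executing it, so a broken install still shows up
    spec = importlib.util.find_spec(module_name)
    conda_prefix = os.environ.get("CONDA_PREFIX")
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "conda_prefix": conda_prefix,
        # True only if the running interpreter lives inside the active conda env
        "in_active_env": conda_prefix is not None
        and os.path.realpath(sys.prefix) == os.path.realpath(conda_prefix),
        # None means the module is not importable from this interpreter
        "module_origin": spec.origin if spec else None,
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

Run it with plain `python` inside the activated env: if `in_active_env` is False, the `python` you are running is not the env's interpreter; if `module_origin` is None, cudf is not installed for that interpreter.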

In the second example, it looks like the Python you're actually using comes from conda-forge and is version 3.6, despite your conda command specifying 3.7. This suggests the environment you're actually using may not be the one you expect.
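A hedged sketch of a guard for this: compare the running interpreter's major.minor version with the one you asked conda for (`python=3.7` in the install command). `check_python_version` is a hypothetical helper.

```python
import sys


def check_python_version(expected=(3, 7), actual=None):
    """Return (ok, message) comparing the running major.minor version to `expected`."""
    # Default to the interpreter that is executing this code
    actual = tuple(actual if actual is not None else sys.version_info[:2])
    ok = actual[:2] == tuple(expected)
    message = (
        f"running Python {actual[0]}.{actual[1]} from {sys.executable}; "
        f"expected {expected[0]}.{expected[1]}"
    )
    return ok, message


if __name__ == "__main__":
    ok, message = check_python_version()
    print(("OK: " if ok else "MISMATCH: ") + message)
```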

@jakirkham may have some insight; otherwise, I would recommend looking into how your system and conda environments interact.
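One concrete way to look at that interaction is to list every `python` executable on `PATH` in the order the shell searches them: whichever comes first is what `python` runs, so a Python added by `module load` can shadow the env's own. This is a minimal sketch, assuming the module works by editing `PATH` (we have not seen the module file); `python_candidates` is a hypothetical helper and ignores shell aliases and functions.

```python
import os


def python_candidates(name="python", path=None):
    """Return every executable called `name` on PATH, in search order (first one wins)."""
    path = os.environ.get("PATH", "") if path is None else path
    found = []
    for directory in path.split(os.pathsep):
        # An empty PATH component means the current directory on POSIX
        candidate = os.path.join(directory or os.curdir, name)
        # Roughly the test the shell applies: a regular file with execute permission
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            if candidate not in found:
                found.append(candidate)
    return found


if __name__ == "__main__":
    for i, exe in enumerate(python_candidates()):
        print(("-> " if i == 0 else "   ") + exe)
```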

anamariaUIC commented 2 years ago

@beckernick @jakirkham can you please explain how to install this? From your instructions, it seems that I need Python 3.7 in my environment when installing this, so I do:

conda activate rapids
module load Python/3.7.4-GCCcore-8.3.0

conda install -c rapidsai -c nvidia -c numba -c conda-forge cudf=21.08 python=3.7 cudatoolkit=11.0

If I just activate the conda env without loading any modules, I have:

(rapids) anamaria@login-1[SABER]: ~ $ module list

Currently Loaded Modules:
  1) GCCcore/8.3.0
  2) zlib/1.2.11-GCCcore-8.3.0
  3) binutils/2.32-GCCcore-8.3.0
  4) bzip2/1.0.8-GCCcore-8.3.0
  5) ncurses/6.1-GCCcore-8.3.0
  6) libreadline/8.0-GCCcore-8.3.0
  7) Tcl/8.6.9-GCCcore-8.3.0
  8) SQLite/3.29.0-GCCcore-8.3.0
  9) XZ/5.2.4-GCCcore-8.3.0
 10) GMP/6.1.2-GCCcore-8.3.0
 11) libffi/3.2.1-GCCcore-8.3.0

If I load my Python 3.7, I have:

(rapids) anamaria@login-1[SABER]: ~ $ module load Python/3.7.4-GCCcore-8.3.0
(rapids) anamaria@login-1[SABER]: ~ $ module list

Currently Loaded Modules:
  1) GCCcore/8.3.0
  2) zlib/1.2.11-GCCcore-8.3.0
  3) binutils/2.32-GCCcore-8.3.0
  4) bzip2/1.0.8-GCCcore-8.3.0
  5) ncurses/6.1-GCCcore-8.3.0
  6) libreadline/8.0-GCCcore-8.3.0
  7) Tcl/8.6.9-GCCcore-8.3.0
  8) SQLite/3.29.0-GCCcore-8.3.0
  9) XZ/5.2.4-GCCcore-8.3.0
 10) GMP/6.1.2-GCCcore-8.3.0
 11) libffi/3.2.1-GCCcore-8.3.0
 12) Python/3.7.4-GCCcore-8.3.0

anamariaUIC commented 2 years ago

cuDF was installed by doing the following:

conda clean --all

module load Python/3.7.4-GCCcore-8.3.0
module load CUDA/11.0.2-GCC-9.3.0

conda create -n rapids-21.12 -c rapidsai -c nvidia -c conda-forge rapids=21.12 python=3.7 cudatoolkit=11.0 dask-sql

conda activate rapids-21.12

export PATH=/home/anamaria/.conda/envs/rapids-21.12/bin:$PATH
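After the `export PATH=...` line, a quick sanity check is whether `python` now resolves into the env's `bin` directory. This is a minimal sketch using `shutil.which`; `resolves_into` is a hypothetical helper, and the env path in the usage block is the one from the commands above.

```python
import os
import shutil


def resolves_into(env_prefix, name="python", path=None):
    """True if the first `name` found on PATH lives in `<env_prefix>/bin`."""
    exe = shutil.which(name, path=path)
    if exe is None:
        return False
    expected_dir = os.path.realpath(os.path.join(env_prefix, "bin"))
    # Compare directories, not files: `python` is often a symlink to `python3.x`
    return os.path.realpath(os.path.dirname(exe)) == expected_dir


if __name__ == "__main__":
    env = "/home/anamaria/.conda/envs/rapids-21.12"  # env path from this thread
    print("python resolves into env:", resolves_into(env))
```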

github-actions[bot] commented 2 years ago

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

github-actions[bot] commented 2 years ago

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

vyasr commented 2 years ago

I'm afraid the problems encountered here are a bit beyond our ability to diagnose directly. The modules you're referring to are cluster-specific and managed by your administrator, so we do not know what is actually contained in the currently loaded modules you've listed (such as GCCcore.* or CUDA.*). We could guess at some of it, but the picture is more confusing because you also appear to be using conda. Does conda come from the Python/3.7.4-GCCcore-8.3.0 module, or did you install it separately? If you installed it separately, then you should probably avoid the module's Python altogether, since loading the module after activating a conda environment could be causing further problems. If you load the Python module and use a conda that isn't related to it, then you have multiple versions of Python on your system, multiple locations where packages can be installed, and a whole host of other issues that can result.
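To make the "multiple locations where packages can be installed" point concrete, the sketch below lists `sys.path` entries that sit outside the running interpreter's own prefix, and prints `PYTHONPATH`. Whether this cluster's Python module sets `PYTHONPATH` is an assumption we cannot verify; `foreign_path_entries` is a hypothetical helper.

```python
import os
import sys


def foreign_path_entries(entries=None, prefixes=None):
    """Return sys.path entries that are not under the interpreter's own prefix."""
    entries = sys.path if entries is None else entries
    if prefixes is None:
        # base_prefix covers the stdlib when running inside a venv
        prefixes = {sys.prefix, sys.base_prefix}
    roots = [os.path.realpath(p) for p in prefixes]
    foreign = []
    for entry in entries:
        if not entry:  # "" means the current working directory
            continue
        real = os.path.realpath(entry)
        if not any(os.path.commonpath([real, root]) == root for root in roots):
            foreign.append(entry)
    return foreign


if __name__ == "__main__":
    print("PYTHONPATH =", os.environ.get("PYTHONPATH"))
    for entry in foreign_path_entries():
        print("outside this interpreter's prefix:", entry)
```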

The closest you seem to have come to success is the ImportError: libcuda.so.1: cannot open shared object file: No such file or directory that you saw when the Python module was unloaded. I would try to follow the same set of operations that got you to that point, but without loading any modules other than a CUDA module.
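For context on that error: `libcuda.so.1` is the CUDA driver library, which is installed with the NVIDIA driver rather than by the `cudatoolkit` conda package, so it is normally present only on machines with an NVIDIA GPU driver installed. Whether your login node has one is an assumption to check with your administrator. Below is a minimal sketch that tries to `dlopen` it through `ctypes`, roughly mirroring what happens when `rmm` is imported; `try_load` is a hypothetical helper.

```python
import ctypes
import os


def try_load(soname="libcuda.so.1"):
    """Try to dlopen `soname`; return (True, None) on success or (False, error text)."""
    try:
        # Uses the normal dynamic-loader search (LD_LIBRARY_PATH, ld.so cache, ...)
        ctypes.CDLL(soname)
    except OSError as exc:
        return False, str(exc)
    return True, None


if __name__ == "__main__":
    ok, err = try_load()
    print("libcuda.so.1 loadable:", ok)
    if not ok:
        print("  error:", err)
        print("  LD_LIBRARY_PATH =", os.environ.get("LD_LIBRARY_PATH"))
```

If this fails on the login node but succeeds inside a job on a GPU node, the problem is where you run the code rather than how cuDF was installed.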

vyasr commented 2 years ago

I'm going to close this for now as beyond our control to address. However, if the module-related issues can be resolved and there are still other problems, please feel free to reopen.