rapidsai / cudf

cuDF - GPU DataFrame Library
https://docs.rapids.ai/api/cudf/stable/
Apache License 2.0

Initial startup difficulties #254

Closed: mrocklin closed this 5 years ago

mrocklin commented 6 years ago

I'm doing the following steps but having difficulty running tests. I suspect that my environment is slightly misconfigured.

Install

  1. Install dependencies into a new conda environment

    conda install -n dask-gdf \
       -c numba -c conda-forge -c gpuopenanalytics/label/dev -c defaults \
       pygdf=0.1.0a3 dask distributed cudatoolkit
  2. Activate conda environment:

    source activate dask-gdf
  3. Clone dask_gdf repo:

    git clone https://github.com/gpuopenanalytics/dask_gdf
  4. Install from source:

    cd dask_gdf
    pip install .

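For reference, a quick sanity check along these lines (just a sketch, reusing the `dask-gdf` environment from the steps above) shows which cudatoolkit/numba builds conda actually resolved and whether the driver is visible to numba, before running the full test suite:

```bash
# Show the cudatoolkit and numba builds that conda resolved for this env
conda list -n dask-gdf "cudatoolkit|numba"

# Ask numba to enumerate CUDA devices; this exercises the driver, but note
# that PTX linking problems like the one below only surface when a kernel
# is actually compiled
python -c "from numba import cuda; cuda.detect()"
```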
Test output

```bash
(dask-gdf) mrocklin@demouser-DGX-Station:~/dask_gdf$ py.test dask_gdf/tests/test_accessor.py::test_datetime_accessor_initialization[data0]
================================================= test session starts =================================================
platform linux -- Python 3.6.6, pytest-3.8.0, py-1.6.0, pluggy-0.7.1
rootdir: /home/mrocklin/dask_gdf, inifile:
collected 1 item

dask_gdf/tests/test_accessor.py F    [100%]

====================================================== FAILURES =======================================================
____________________________________ test_datetime_accessor_initialization[data0] _____________________________________

self =
ptx = b'//\n// Generated by NVIDIA NVVM Compiler\n//\n// Compiler Build ID: CL-24330188\n// Cuda compilation tools, release ...o.s64 \t%rd12, %rd11, %rd6;\n\tadd.s64 \t%rd13, %rd12, %rd4;\n\tst.u64 \t[%rd13], %rd9;\n\nBB0_2:\n\tret;\n}\n\n\n\x00'
name = ''

    def add_ptx(self, ptx, name=''):
        ptxbuf = c_char_p(ptx)
        namebuf = c_char_p(name.encode('utf8'))
        self._keep_alive += [ptxbuf, namebuf]
        try:
            driver.cuLinkAddData(self.handle, enums.CU_JIT_INPUT_PTX,
>                                ptxbuf, len(ptx), namebuf, 0, None, None)

../miniconda/envs/dask-gdf/lib/python3.6/site-packages/numba/cuda/cudadrv/driver.py:1565:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

args = (c_void_p(54921760), 1, c_char_p(55887328), 3510, c_char_p(140082594053584), 0, ...), retcode = 218

    @functools.wraps(libfn)
    def safe_cuda_api_call(*args):
        _logger.debug('call driver api: %s', libfn.__name__)
        retcode = libfn(*args)
>       self._check_error(fname, retcode)

../miniconda/envs/dask-gdf/lib/python3.6/site-packages/numba/cuda/cudadrv/driver.py:290:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = , fname = 'cuLinkAddData', retcode = 218

    def _check_error(self, fname, retcode):
        if retcode != enums.CUDA_SUCCESS:
            errname = ERROR_MAP.get(retcode, "UNKNOWN_CUDA_ERROR")
            msg = "Call to %s results in %s" % (fname, errname)
            _logger.error(msg)
            if retcode == enums.CUDA_ERROR_NOT_INITIALIZED:
                # Detect forking
                if self.pid is not None and _getpid() != self.pid:
                    msg = 'pid %s forked from pid %s after CUDA driver init'
                    _logger.critical(msg, _getpid(), self.pid)
                    raise CudaDriverError("CUDA initialized before forking")
>           raise CudaAPIError(retcode, msg)
E           numba.cuda.cudadrv.driver.CudaAPIError: [218] Call to cuLinkAddData results in UNKNOWN_CUDA_ERROR

../miniconda/envs/dask-gdf/lib/python3.6/site-packages/numba/cuda/cudadrv/driver.py:325: CudaAPIError

During handling of the above exception, another exception occurred:

data = array([ 0.847579  ,  0.28504254, -3.11228798,  0.0998681 ,  0.07316805, -0.80133748,  0.47261724,  1.62136736,
       ...581587, -0.40626823, -0.65608612, -0.07062451,  0.15300313, -1.33376729, -1.13794191,  0.45410824, -0.63739099])

    @pytest.mark.parametrize('data', [data1()])
    @pytest.mark.xfail(raises=AttributeError)
    def test_datetime_accessor_initialization(data):
        pd_data = pd.Series(data.copy())
        gdf_data = Series(pd_data)
>       dask_gdf_data = dgd.from_pygdf(gdf_data, npartitions=5)

dask_gdf/tests/test_accessor.py:25:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

dask_gdf/core.py:992: in from_pygdf
    data = data.sort_index(ascending=True)
../miniconda/envs/dask-gdf/lib/python3.6/site-packages/pygdf/series.py:502: in sort_index
    inds = self.index.argsort(ascending=ascending)
../miniconda/envs/dask-gdf/lib/python3.6/site-packages/pygdf/index.py:43: in argsort
    return self.as_column().argsort(ascending=ascending)
../miniconda/envs/dask-gdf/lib/python3.6/site-packages/pygdf/index.py:170: in as_column
    vals = cudautils.arange(self._start, self._stop, dtype=self.dtype)
../miniconda/envs/dask-gdf/lib/python3.6/site-packages/pygdf/cudautils.py:40: in arange
    gpu_arange.forall(size)(start, size, step, out)
../miniconda/envs/dask-gdf/lib/python3.6/site-packages/numba/cuda/compiler.py:241: in __call__
    kernel = self.kernel.specialize(*args)
../miniconda/envs/dask-gdf/lib/python3.6/site-packages/numba/cuda/compiler.py:777: in specialize
    kernel = self.compile(argtypes)
../miniconda/envs/dask-gdf/lib/python3.6/site-packages/numba/cuda/compiler.py:795: in compile
    kernel.bind()
../miniconda/envs/dask-gdf/lib/python3.6/site-packages/numba/cuda/compiler.py:517: in bind
    self._func.get()
../miniconda/envs/dask-gdf/lib/python3.6/site-packages/numba/cuda/compiler.py:399: in get
    linker.add_ptx(ptx)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self =
ptx = b'//\n// Generated by NVIDIA NVVM Compiler\n//\n// Compiler Build ID: CL-24330188\n// Cuda compilation tools, release ...o.s64 \t%rd12, %rd11, %rd6;\n\tadd.s64 \t%rd13, %rd12, %rd4;\n\tst.u64 \t[%rd13], %rd9;\n\nBB0_2:\n\tret;\n}\n\n\n\x00'
name = ''

    def add_ptx(self, ptx, name=''):
        ptxbuf = c_char_p(ptx)
        namebuf = c_char_p(name.encode('utf8'))
        self._keep_alive += [ptxbuf, namebuf]
        try:
            driver.cuLinkAddData(self.handle, enums.CU_JIT_INPUT_PTX,
                                 ptxbuf, len(ptx), namebuf, 0, None, None)
        except CudaAPIError as e:
>           raise LinkerError("%s\n%s" % (e, self.error_log))
E           numba.cuda.cudadrv.driver.LinkerError: [218] Call to cuLinkAddData results in UNKNOWN_CUDA_ERROR
E           ptxas application ptx input, line 9; fatal   : Unsupported .version 6.2; current version is '6.0'
E           ptxas fatal   : Ptx assembly aborted due to errors

../miniconda/envs/dask-gdf/lib/python3.6/site-packages/numba/cuda/cudadrv/driver.py:1567: LinkerError
-------------------------------------------------- Captured log call --------------------------------------------------
driver.py 318 ERROR Call to cuLinkAddData results in UNKNOWN_CUDA_ERROR
============================================== 1 failed in 1.70 seconds ===============================================
```
kkraus14 commented 6 years ago

@mrocklin I assume this machine has an NVIDIA GPU in it? Could you run nvidia-smi and dump the output here?

mrocklin commented 6 years ago

```
(dask-gdf) mrocklin@demouser-DGX-Station:~/dask_gdf$ nvidia-smi
Tue Sep 25 15:10:08 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.145                Driver Version: 384.145                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-DGXS...  On   | 00000000:07:00.0  On |                    0 |
| N/A   38C    P0    37W / 300W |     29MiB / 16149MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-DGXS...  On   | 00000000:08:00.0 Off |                    0 |
| N/A   38C    P0    36W / 300W |     10MiB / 16149MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-DGXS...  On   | 00000000:0E:00.0 Off |                    0 |
| N/A   38C    P0    51W / 300W |  15322MiB / 16149MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-DGXS...  On   | 00000000:0F:00.0 Off |                    0 |
| N/A   38C    P0    36W / 300W |     10MiB / 16149MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1501      G   /usr/lib/xorg/Xorg                            18MiB |
|    2     13342      C   .../sseibert/miniconda3/envs/tf/bin/python 15312MiB |
+-----------------------------------------------------------------------------+
```
kkraus14 commented 6 years ago

@mrocklin and what version of cudatoolkit got pulled? It looks like the driver version is too old for the CUDA version that was grabbed via conda.
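Something like this (reusing the `dask-gdf` env name from your steps) should show exactly what the solver picked:

```bash
# List the cudatoolkit package installed in the environment
conda list -n dask-gdf cudatoolkit
```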

mrocklin commented 6 years ago

I've tried both cudatoolkit 9.2-0 and 9.1-h85f986d_0 (from the numba channel)

kkraus14 commented 6 years ago

I would update your NVIDIA driver to 396. CUDA 9.1 requires driver 390.12 or newer. CUDA 9.2 requires driver 396.44 or newer.
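You can confirm the currently installed driver version without touching the conda environment, e.g.:

```bash
# Per-GPU driver version as reported by nvidia-smi
nvidia-smi --query-gpu=driver_version --format=csv,noheader

# The loaded kernel module reports the same information
cat /proc/driver/nvidia/version
```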

mrocklin commented 6 years ago

Is that system-wide, or can it be handled in user space? (I apologize for not having experience here.) Is this something that can be handled by conda, or is it deeper?

kkraus14 commented 6 years ago

This is system-wide, unfortunately, as it needs to load kernel modules. Otherwise, I believe CUDA 9.0 should work with that driver, and that part is userspace.
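A sketch of that userspace downgrade, assuming the same channels as the original install command:

```bash
# Pin cudatoolkit to 9.0, which driver 384.145 supports
conda install -n dask-gdf \
    -c numba -c conda-forge -c gpuopenanalytics/label/dev -c defaults \
    cudatoolkit=9.0
```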

mrocklin commented 6 years ago

Yeah, downgrading cudatoolkit to 9.0 works for me.

sklam commented 6 years ago

We are coordinating with the conda team to implement detection of the installed CUDA driver version, so that conda knows what range of cudatoolkit versions it can install.
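Roughly, the idea is for the installer to read the maximum CUDA version the driver supports and constrain the cudatoolkit solve to match; a manual approximation of that (a sketch, not the final mechanism) looks like:

```bash
# Driver 384.145 supports CUDA 9.0 at most, so constrain the solve by hand
conda install -n dask-gdf "cudatoolkit<9.1"
```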

kkraus14 commented 5 years ago

Closing as resolved; related discussions are ongoing in other issues.