rapidsai-community / notebooks-contrib

RAPIDS Community Notebooks
Apache License 2.0

[BUG] 404 not found dsql_vs_pyspark_netflow.ipynb #375

Closed: nyck33 closed this issue 1 year ago

nyck33 commented 1 year ago

Describe the bug

--2023-05-14 15:11:00--  https://blazingsql-colab.s3.amazonaws.com/netflow_data/nf-chunk2.csv
Resolving blazingsql-colab.s3.amazonaws.com (blazingsql-colab.s3.amazonaws.com)... 3.5.28.158, 52.217.132.129, 52.216.217.105, ...
Connecting to blazingsql-colab.s3.amazonaws.com (blazingsql-colab.s3.amazonaws.com)|3.5.28.158|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2023-05-14 15:11:01 ERROR 404: Not Found.

Is this data no longer available for this notebook?
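
For reference, a quick status probe against the same URL (printing only the HTTP code for a GET) shows it's the object itself that's gone rather than a transient network problem on my end:

# print only the HTTP status code returned for the dataset URL
curl -s -o /dev/null -w "%{http_code}\n" \
    https://blazingsql-colab.s3.amazonaws.com/netflow_data/nf-chunk2.csv
# prints 404, matching the wget output above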

Steps/Code to reproduce bug

I made a conda env using the release selector, as advised. Then I connected to it from VSCode, used the conda env as the Jupyter kernel, and tried to run the cells.
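
Roughly, the kernel setup looked like this (a sketch of my steps; the env name rapids-23.04 is the one the selector command created, and ipykernel is assumed to be present since JupyterLab was part of the install):

conda activate rapids-23.04
# register the env as a named Jupyter kernel so VSCode can select it
python -m ipykernel install --user --name rapids-23.04 --display-name "rapids-23.04"
# sanity check from inside the env: should print the env's own interpreter
python -c "import sys; print(sys.executable)"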

Expected behavior

I wish I could download it so I could see the difference with PySpark. I was debating this all morning and thought, wow, what a perfect notebook, but of course there's a catch. It's pretty disappointing that, across the entire RAPIDS ecosystem, the notebooks are not being maintained as well as a company as well-resourced as NVIDIA should be maintaining them. I'm talking about the Boston House Prices dataset not being replaced, and now this. I have a job interview on Tuesday, so this is hugely disappointing. Now I'll just have to do a Dask course on its own, and maybe your other notebooks that run off the Docker container, but that container does not include this repo (I cloned it, but it's not showing up; see the mount sketch under the environment details below).

Environment details:

docker pull nvcr.io/nvidia/rapidsai/rapidsai:23.04-cuda11.8-runtime-ubuntu22.04-py3.10
docker run --gpus all --rm -it \
    --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 \
    -p 8888:8888 -p 8787:8787 -p 8786:8786 \
    nvcr.io/nvidia/rapidsai/rapidsai:23.04-cuda11.8-runtime-ubuntu22.04-py3.10
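
To make the cloned repo visible inside that container, I'd expect a bind mount to be enough; here is a sketch, where the host path is my clone location and the target path under /rapids/notebooks is a guess about where the image keeps its notebooks:

docker run --gpus all --rm -it \
    --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 \
    -p 8888:8888 -p 8787:8787 -p 8786:8786 \
    -v /mnt/d/cuda/rapids/notebooks-contrib:/rapids/notebooks/notebooks-contrib \
    nvcr.io/nvidia/rapidsai/rapidsai:23.04-cuda11.8-runtime-ubuntu22.04-py3.10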

Additional context

Please audit the notebooks across the RAPIDS ecosystem and update them so new users can actually get a feel for what it can do.

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.47                 Driver Version: 531.68       CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce GTX 1650         On | 00000000:01:00.0 Off |                  N/A |
| N/A   45C    P8                2W /  N/A|    433MiB /  4096MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A        20      G   /Xwayland                                 N/A      |
|    0   N/A  N/A        32      G   /Xwayland                                 N/A      |
|    0   N/A  N/A       485      G   /Xwayland                                 N/A      |
|    0   N/A  N/A      5215      C   /python3.10                               N/A      |
+---------------------------------------------------------------------------------------+
(rapids-23.04) nobu@LAPTOP-DNCQ5AAC:/mnt/d/cuda/rapids/notebooks-contrib$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0

That is the output from my WSL bash terminal. I have no idea why it's showing CUDA 12.1, which is the version installed on my WSL system, when it should be showing CUDA 11.x from inside the conda env, no? That's what the env has, right, since it was created from the command copied from your selector?
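
To see which CUDA toolkit the env itself ships, as opposed to the system-wide nvcc on my WSL PATH (which is what prints 12.1 above), something like this should disambiguate:

conda activate rapids-23.04
# toolkit/runtime packages pinned inside the env (should be the 11.x the selector chose)
conda list | grep -Ei 'cuda|cudf'
# which nvcc binary is actually on PATH; if it lives outside the env,
# it is the system-wide 12.1 install, not the env's toolkit
which nvcc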

beckernick commented 1 year ago

cc @taureandyernv

taureandyernv commented 1 year ago

We recently found out that the BlazingSQL digital assets have been shut down (including the website). We will need to find new data for this notebook, as per #376. Thanks for letting us know, @nyck33!

nyck33 commented 1 year ago

@taureandyernv I replaced it with this: https://github.com/zenUnicorn/ML_on_CLAMP

It was going fine until I hit an error:

>>> import cudf
Traceback (most recent call last):
  File "/home/nobu/miniconda3/envs/rapids-23.04/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 247, in ensure_initialized
    self.cuInit(0)
  File "/home/nobu/miniconda3/envs/rapids-23.04/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 320, in safe_cuda_api_call
    self._check_ctypes_error(fname, retcode)
  File "/home/nobu/miniconda3/envs/rapids-23.04/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 388, in _check_ctypes_error
    raise CudaAPIError(retcode, msg)
numba.cuda.cudadrv.driver.CudaAPIError: [100] Call to cuInit results in CUDA_ERROR_NO_DEVICE

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/nobu/miniconda3/envs/rapids-23.04/lib/python3.10/site-packages/cudf/__init__.py", line 21, in <module>
    from cudf.core.algorithms import factorize
  File "/home/nobu/miniconda3/envs/rapids-23.04/lib/python3.10/site-packages/cudf/core/algorithms.py", line 9, in <module>
    from cudf.core.indexed_frame import IndexedFrame
  File "/home/nobu/miniconda3/envs/rapids-23.04/lib/python3.10/site-packages/cudf/core/indexed_frame.py", line 57, in <module>
    from cudf.core.groupby.groupby import GroupBy
  File "/home/nobu/miniconda3/envs/rapids-23.04/lib/python3.10/site-packages/cudf/core/groupby/__init__.py", line 3, in <module>
    from cudf.core.groupby.groupby import GroupBy, Grouper
  File "/home/nobu/miniconda3/envs/rapids-23.04/lib/python3.10/site-packages/cudf/core/groupby/groupby.py", line 28, in <module>
    from cudf.core.udf.groupby_utils import jit_groupby_apply
  File "/home/nobu/miniconda3/envs/rapids-23.04/lib/python3.10/site-packages/cudf/core/udf/groupby_utils.py", line 11, in <module>
    import cudf.core.udf.utils
  File "/home/nobu/miniconda3/envs/rapids-23.04/lib/python3.10/site-packages/cudf/core/udf/utils.py", line 121, in <module>
    _PTX_FILE = _get_ptx_file(os.path.dirname(__file__), "shim_")
  File "/home/nobu/miniconda3/envs/rapids-23.04/lib/python3.10/site-packages/cudf/core/udf/utils.py", line 87, in _get_ptx_file
    dev = cuda.get_current_device()
  File "/home/nobu/miniconda3/envs/rapids-23.04/lib/python3.10/site-packages/numba/cuda/api.py", line 435, in get_current_device
    return current_context().device
  File "/home/nobu/miniconda3/envs/rapids-23.04/lib/python3.10/site-packages/numba/cuda/cudadrv/devices.py", line 220, in get_context
    return _runtime.get_or_create_context(devnum)
  File "/home/nobu/miniconda3/envs/rapids-23.04/lib/python3.10/site-packages/numba/cuda/cudadrv/devices.py", line 138, in get_or_create_context
    return self._get_or_create_context_uncached(devnum)
  File "/home/nobu/miniconda3/envs/rapids-23.04/lib/python3.10/site-packages/numba/cuda/cudadrv/devices.py", line 153, in _get_or_create_context_uncached
    with driver.get_active_context() as ac:
  File "/home/nobu/miniconda3/envs/rapids-23.04/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 488, in __enter__
    driver.cuCtxGetCurrent(byref(hctx))
  File "/home/nobu/miniconda3/envs/rapids-23.04/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 285, in __getattr__
    self.ensure_initialized()
  File "/home/nobu/miniconda3/envs/rapids-23.04/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 251, in ensure_initialized
    raise CudaSupportError(f"Error at driver init: {description}")
numba.cuda.cudadrv.error.CudaSupportError: Error at driver init: Call to cuInit results in CUDA_ERROR_NO_DEVICE (100)
>>>
KeyboardInterrupt
>>>
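
In case it helps with triage, this is roughly how I check whether the GPU is visible from WSL2 at all before blaming cudf (plain diagnostic commands, nothing specific to this repo):

# does the WSL2 driver see the card at all?
nvidia-smi
# does numba (which cudf relies on for device handling) detect a device?
python -c "from numba import cuda; cuda.detect()"
# make sure the device is not being hidden from the process
echo $CUDA_VISIBLE_DEVICES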

I'll post this in the right place. Thanks.