rapidsai / rmm

RAPIDS Memory Manager
https://docs.rapids.ai/api/rmm/stable/
Apache License 2.0

[BUG] RuntimeError: CUDAPluggableAllocator does not yet support cacheInfo #1362

Closed: wurining closed this issue 1 year ago

wurining commented 1 year ago

Describe the bug

I'm integrating RMM as a replacement for the default PyTorch allocator. Everything works fine in simpler scenarios. However, my project involves mixed-precision training, some native operators, and so on, and I encounter the following error after introducing RMM.
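For context, here is a minimal sketch of how RMM is commonly installed as the PyTorch CUDA allocator. It assumes the `rmm.allocators.torch` module and `torch.cuda.memory.change_current_allocator`; the helper name `use_rmm_for_torch` is hypothetical, and this is not necessarily the exact setup used in this project.

```python
import importlib.util


def use_rmm_for_torch() -> bool:
    """Try to route PyTorch CUDA allocations through RMM.

    Returns True if the allocator was swapped, False otherwise.
    (Hypothetical helper, for illustration only.)
    """
    # Leave PyTorch untouched if either package is missing.
    if importlib.util.find_spec("torch") is None:
        return False
    if importlib.util.find_spec("rmm") is None:
        return False

    import torch

    if not torch.cuda.is_available():
        return False

    try:
        from rmm.allocators.torch import rmm_torch_allocator
    except ImportError:
        # Older RMM releases may not ship this module.
        return False

    if rmm_torch_allocator is None:
        # RMM could not build the pluggable allocator for this PyTorch.
        return False

    try:
        # Must run before the first CUDA allocation in the process.
        torch.cuda.memory.change_current_allocator(rmm_torch_allocator)
    except RuntimeError:
        # PyTorch refuses to swap once its allocator is already in use.
        return False
    return True
```

Calling `use_rmm_for_torch()` once at program start, before any tensor is moved to the GPU, is the intended usage; after that, PyTorch goes through `CUDAPluggableAllocator`, which is where the error below is raised.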

Exception has occurred: RuntimeError       (note: full exception trace is shown but execution is paused at: _run_module_as_main)
CUDAPluggableAllocator does not yet support cacheInfo. If you need it, please file an issue describing your use case.
  File "___prefix__/torch/nn/modules/conv.py", line 306, in _conv_forward
    return F.conv1d(input, weight, bias, self.stride,
  File "___prefix__/torch/nn/modules/conv.py", line 310, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "___prefix__/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "___prefix__/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "___prefix__/torch/nn/modules/container.py", line 215, in forward
    input = module(input)
  File "___prefix__/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "___prefix__/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "__project__/models/modules/autoencoderkl.py", line 555, in forward
    x = block(x)
  File "___prefix__/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "___prefix__/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "__project__/models/modules/AKLReadout.py", line 212, in encode
    h = self.encoder(x)
  File "__project__/models/modules/AKLReadout.py", line 282, in encode_stage_2_inputs
    z_mu, z_sigma = self.encode(x)
  File "__project__/models/modules/Readouts.py", line 117, in encode_stage_2_inputs
    return net.encode_stage_2_inputs(x)
  File "__project__/models/modules/UnrealVAE.py", line 3512, in forward
    vae_latent = self.readout_net.encode_stage_2_inputs(responses, session)
  File "__project__/models/UnrealVAENNv2.py", line 60, in forward
    return self.unreal_vae_former.forward(*args, **kwargs)
  File "__project__/models/UnrealVAENNv2.py", line 391, in training_step
    ) = self.forward(
  File "___prefix__/lightning/pytorch/strategies/strategy.py", line 382, in training_step
    return self.lightning_module.training_step(*args, **kwargs)
  File "___prefix__/lightning/pytorch/trainer/call.py", line 309, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "___prefix__/lightning/pytorch/loops/optimization/manual.py", line 112, in advance
    training_step_output = call._call_strategy_hook(trainer, "training_step", *kwargs.values())
  File "___prefix__/lightning/pytorch/loops/optimization/manual.py", line 92, in run
    self.advance(kwargs)
  File "___prefix__/lightning/pytorch/loops/training_epoch_loop.py", line 242, in advance
    batch_output = self.manual_optimization.run(kwargs)
  File "___prefix__/lightning/pytorch/loops/training_epoch_loop.py", line 136, in run
    self.advance(data_fetcher)
  File "___prefix__/lightning/pytorch/loops/fit_loop.py", line 359, in advance
    self.epoch_loop.run(self._data_fetcher)
  File "___prefix__/lightning/pytorch/loops/fit_loop.py", line 202, in run
    self.advance()
  File "___prefix__/lightning/pytorch/trainer/trainer.py", line 1036, in _run_stage
    self.fit_loop.run()
  File "___prefix__/lightning/pytorch/trainer/trainer.py", line 990, in _run
    results = self._run_stage()
  File "___prefix__/lightning/pytorch/trainer/trainer.py", line 581, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "___prefix__/lightning/pytorch/trainer/call.py", line 44, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "___prefix__/lightning/pytorch/trainer/trainer.py", line 545, in fit
    call._call_and_handle_interrupt(
  File "__project__/trainer/task.py", line 85, in train_task
    trainer.fit(
  File "__project__/utils/utils.py", line 75, in wrap
    raise ex
  File "__project__/utils/utils.py", line 75, in wrap
    raise ex
  File "__project__/trainer/Trainer.py", line 36, in train
    metric_dict, _ = train_task(cfg)
  File "___prefix__/hydra/core/utils.py", line 186, in run_job
    ret.return_value = task_function(task_cfg)
  File "___prefix__/hydra/core/utils.py", line 260, in return_value
    raise self._return_value
  File "___prefix__/hydra/_internal/hydra.py", line 132, in run
    _ = ret.return_value
  File "___prefix__/hydra/_internal/utils.py", line 458, in <lambda>
    lambda: hydra.run(
  File "___prefix__/hydra/_internal/utils.py", line 223, in run_and_report
    raise ex
  File "___prefix__/hydra/_internal/utils.py", line 223, in run_and_report
    raise ex
  File "___prefix__/hydra/_internal/utils.py", line 457, in _run_app
    run_and_report(
  File "___prefix__/hydra/_internal/utils.py", line 394, in _run_hydra
    _run_app(
  File "___prefix__/hydra/main.py", line 94, in decorated_main
    _run_hydra(
  File "__project__/Trainer.py", line 48, in main
    train()
  File "__project__/Trainer.py", line 52, in <module>
    main()
  File "___prefix__/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "___prefix__/python3.9/runpy.py", line 197, in _run_module_as_main (Current frame)
    return _run_code(code, main_globals, None,
RuntimeError: CUDAPluggableAllocator does not yet support cacheInfo. If you need it, please file an issue describing your use case.

The error seems to originate from this location in the PyTorch library: torch/csrc/cuda/CUDAPluggableAllocator.cpp#L174C30-L174C39

Steps/Code to reproduce bug

I have not yet found a simpler way to trigger this error deliberately.
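A possible starting point for a smaller reproduction is sketched below. It is untested against this exact environment. The trace ends in `F.conv1d`, so the sketch runs a single `Conv1d` under autocast with RMM installed as the allocator. Enabling `torch.backends.cudnn.benchmark` is an unconfirmed assumption about what leads PyTorch to query `cacheInfo`; the function name `try_reproduce_cache_info_error` and the tensor shapes are hypothetical.

```python
import importlib.util
from typing import Optional


def try_reproduce_cache_info_error() -> Optional[str]:
    """Return the RuntimeError message if the conv call fails, else None."""
    # Without both packages there is nothing to test.
    for name in ("torch", "rmm"):
        if importlib.util.find_spec(name) is None:
            return None

    import torch

    if not torch.cuda.is_available():
        return None

    try:
        from rmm.allocators.torch import rmm_torch_allocator
    except ImportError:
        return None
    if rmm_torch_allocator is None:
        return None

    try:
        # Swap the allocator before any CUDA memory is allocated.
        torch.cuda.memory.change_current_allocator(rmm_torch_allocator)

        # ASSUMPTION (unconfirmed): cuDNN benchmark mode may ask the
        # allocator for cache information when sizing its workspace.
        torch.backends.cudnn.benchmark = True

        conv = torch.nn.Conv1d(4, 8, kernel_size=3).cuda()
        x = torch.randn(2, 4, 32, device="cuda")

        # Mixed precision, mirroring the training setup described above.
        with torch.autocast(device_type="cuda", dtype=torch.float16):
            conv(x)
        torch.cuda.synchronize()
    except RuntimeError as err:
        return str(err)
    return None
```

If the returned string mentions `cacheInfo`, the sketch reproduces the failure; if it returns `None` on a GPU machine, the trigger lies elsewhere in the project.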

Expected behavior

With RMM in place, the cacheInfo function appears to be called inadvertently, and it is not currently supported.

Environment details (please complete the following information):

***git***
commit e932418aa170f34a4a95a110ee48fde7f77e1e86 (HEAD -> main, origin/main)
Author: Daniel Wu <wurining@gmail.com>
Date:   Sat Oct 14 07:49:09 2023 +0100

    Update:
    - Add cut 0

***OS Information***
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.2 LTS"
PRETTY_NAME="Ubuntu 22.04.2 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.2 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
Linux readlineint-server-a100 5.19.0-40-generic #41~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Fri Mar 31 16:00:14 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

***GPU Information***
Sun Oct 15 16:31:27 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100 80G...  Off  | 00000000:31:00.0 Off |                    0 |
| N/A   45C    P0    73W / 300W |  42053MiB / 81920MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A100 80G...  Off  | 00000000:4B:00.0 Off |                    0 |
| N/A   43C    P0    65W / 300W |  42389MiB / 81920MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      3676      G   /usr/lib/xorg/Xorg                  4MiB |
|    1   N/A  N/A      3676      G   /usr/lib/xorg/Xorg                  4MiB |
+-----------------------------------------------------------------------------+

***CPU***
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Address sizes:                   46 bits physical, 57 bits virtual
Byte Order:                      Little Endian
CPU(s):                          96
On-line CPU(s) list:             0-95
Vendor ID:                       GenuineIntel
Model name:                      Intel(R) Xeon(R) Gold 6336Y CPU @ 2.40GHz
CPU family:                      6
Model:                           106
Thread(s) per core:              2
Core(s) per socket:              24
Socket(s):                       2
Stepping:                        6
CPU max MHz:                     3600.0000
CPU min MHz:                     800.0000
BogoMIPS:                        4800.00
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 invpcid_single ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb intel_pt avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local split_lock_detect wbnoinvd dtherm ida arat pln pts avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg tme avx512_vpopcntdq la57 rdpid fsrm md_clear pconfig flush_l1d arch_capabilities
Virtualisation:                  VT-x
L1d cache:                       2.3 MiB (48 instances)
L1i cache:                       1.5 MiB (48 instances)
L2 cache:                        60 MiB (48 instances)
L3 cache:                        72 MiB (2 instances)
NUMA node(s):                    2
NUMA node0 CPU(s):               0-23,48-71
NUMA node1 CPU(s):               24-47,72-95
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Mmio stale data:   Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Retbleed:          Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected

***CMake***
/data/runtime/conda_envs/envs/sensorium/bin/cmake
cmake version 3.26.4

CMake suite maintained and supported by Kitware (kitware.com/cmake).

***g++***
/usr/bin/g++
g++ (Ubuntu 9.5.0-1ubuntu1~22.04) 9.5.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

***nvcc***
/data/runtime/conda_envs/envs/sensorium/bin/nvcc
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Jul_11_02:20:44_PDT_2023
Cuda compilation tools, release 12.2, V12.2.128
Build cuda_12.2.r12.2/compiler.33053471_0

***Python***
/data/runtime/conda_envs/envs/sensorium/bin/python
Python 3.9.16

***Environment Variables***
PATH                            : /data/runtime/conda_envs/envs/sensorium/bin:/home/wurining/miniconda3/condabin:/home/wurining/.vscode-server/bin/f1b07bd25dfad64b0167beb15359ae573aecd2cc/bin/remote-cli:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
LD_LIBRARY_PATH                 : 
NUMBAPRO_NVVM                   : 
NUMBAPRO_LIBDEVICE              : 
CONDA_PREFIX                    : /data/runtime/conda_envs/envs/sensorium
PYTHON_PATH                     : 

***conda packages***
/home/wurining/miniconda3/condabin/conda
# packages in environment at /data/runtime/conda_envs/envs/sensorium:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main  
_openmp_mutex             5.1                       1_gnu  
absl-py                   1.4.0                    pypi_0    pypi
aioboto3                  11.2.0                   pypi_0    pypi
aiobotocore               2.5.0                    pypi_0    pypi
aioitertools              0.11.0                   pypi_0    pypi
alabaster                 0.7.13                   pypi_0    pypi
alembic                   1.12.0                   pypi_0    pypi
annotated-types           0.5.0                    pypi_0    pypi
anyio                     3.7.1                    pypi_0    pypi
appdirs                   1.4.4                    pypi_0    pypi
argon2-cffi               21.3.0                   pypi_0    pypi
argon2-cffi-bindings      21.2.0                   pypi_0    pypi
arrow                     1.2.3                    pypi_0    pypi
asciitree                 0.3.3                    pypi_0    pypi
asttokens                 2.2.1                    pypi_0    pypi
attrs                     23.1.0                   pypi_0    pypi
babel                     2.12.1                   pypi_0    pypi
backcall                  0.2.0                    pypi_0    pypi
backoff                   2.2.1                    pypi_0    pypi
backports-functools-lru-cache 1.6.6                    pypi_0    pypi
beautifulsoup4            4.12.2                   pypi_0    pypi
black                     23.9.0                   pypi_0    pypi
bleach                    6.0.0                    pypi_0    pypi
blessed                   1.20.0                   pypi_0    pypi
blinker                   1.6.2                    pypi_0    pypi
boto3                     1.26.76                  pypi_0    pypi
botocore                  1.29.76                  pypi_0    pypi
ca-certificates           2023.08.22           h06a4308_0  
cachetools                5.3.1                    pypi_0    pypi
cebra                     0.2.0                    pypi_0    pypi
certifi                   2023.5.7                 pypi_0    pypi
cffi                      1.15.1                   pypi_0    pypi
charset-normalizer        2.1.1                    pypi_0    pypi
click                     8.1.3                    pypi_0    pypi
cloudpickle               2.2.1                    pypi_0    pypi
cmaes                     0.10.0                   pypi_0    pypi
cmake                     3.26.4                   pypi_0    pypi
colorlog                  6.7.0                    pypi_0    pypi
comm                      0.1.3                    pypi_0    pypi
commonmark                0.9.1                    pypi_0    pypi
contourpy                 1.1.0                    pypi_0    pypi
cpflows                   0.1.2                    pypi_0    pypi
croniter                  1.4.1                    pypi_0    pypi
cryptography              41.0.1                   pypi_0    pypi
cuda                      12.2.1                        0    nvidia/label/cuda-12.2.1
cuda-cccl                 12.2.128                      0    nvidia/label/cuda-12.2.1
cuda-command-line-tools   12.2.1                        0    nvidia/label/cuda-12.2.1
cuda-compiler             12.2.1                        0    nvidia/label/cuda-12.2.1
cuda-cudart               12.2.128                      0    nvidia/label/cuda-12.2.1
cuda-cudart-dev           12.2.128                      0    nvidia/label/cuda-12.2.1
cuda-cudart-static        12.2.128                      0    nvidia/label/cuda-12.2.1
cuda-cuobjdump            12.2.128                      0    nvidia/label/cuda-12.2.1
cuda-cupti                12.2.131                      0    nvidia/label/cuda-12.2.1
cuda-cupti-static         12.2.131                      0    nvidia/label/cuda-12.2.1
cuda-cuxxfilt             12.2.128                      0    nvidia/label/cuda-12.2.1
cuda-demo-suite           12.2.128                      0    nvidia/label/cuda-12.2.1
cuda-documentation        12.2.128                      0    nvidia/label/cuda-12.2.1
cuda-driver-dev           12.2.128                      0    nvidia/label/cuda-12.2.1
cuda-gdb                  12.2.128                      0    nvidia/label/cuda-12.2.1
cuda-libraries            12.2.1                        0    nvidia/label/cuda-12.2.1
cuda-libraries-dev        12.2.1                        0    nvidia/label/cuda-12.2.1
cuda-libraries-static     12.2.1                        0    nvidia/label/cuda-12.2.1
cuda-nsight               12.2.128                      0    nvidia/label/cuda-12.2.1
cuda-nsight-compute       12.2.1                        0    nvidia/label/cuda-12.2.1
cuda-nvcc                 12.2.128                      0    nvidia/label/cuda-12.2.1
cuda-nvdisasm             12.2.128                      0    nvidia/label/cuda-12.2.1
cuda-nvml-dev             12.2.128                      0    nvidia/label/cuda-12.2.1
cuda-nvprof               12.2.131                      0    nvidia/label/cuda-12.2.1
cuda-nvprune              12.2.128                      0    nvidia/label/cuda-12.2.1
cuda-nvrtc                12.2.128                      0    nvidia/label/cuda-12.2.1
cuda-nvrtc-dev            12.2.128                      0    nvidia/label/cuda-12.2.1
cuda-nvrtc-static         12.2.128                      0    nvidia/label/cuda-12.2.1
cuda-nvtx                 12.2.128                      0    nvidia/label/cuda-12.2.1
cuda-nvvp                 12.2.131                      0    nvidia/label/cuda-12.2.1
cuda-opencl               12.2.128                      0    nvidia/label/cuda-12.2.1
cuda-opencl-dev           12.2.128                      0    nvidia/label/cuda-12.2.1
cuda-profiler-api         12.2.128                      0    nvidia/label/cuda-12.2.1
cuda-python               12.2.0                   pypi_0    pypi
cuda-runtime              12.2.1                        0    nvidia/label/cuda-12.2.1
cuda-sanitizer-api        12.2.128                      0    nvidia/label/cuda-12.2.1
cuda-toolkit              12.2.1                        0    nvidia/label/cuda-12.2.1
cuda-tools                12.2.1                        0    nvidia/label/cuda-12.2.1
cuda-visual-tools         12.2.1                        0    nvidia/label/cuda-12.2.1
cudf-cu12                 23.10.0                  pypi_0    pypi
cuml-cu12                 23.10.0                  pypi_0    pypi
cupy                      12.2.0                   pypi_0    pypi
cycler                    0.11.0                   pypi_0    pypi
cython                    3.0.3                    pypi_0    pypi
dask                      2023.9.2                 pypi_0    pypi
dask-cuda                 23.10.0                  pypi_0    pypi
dask-cudf-cu12            23.10.0                  pypi_0    pypi
datajoint                 0.14.1                   pypi_0    pypi
dateutils                 0.6.12                   pypi_0    pypi
debugpy                   1.6.7                    pypi_0    pypi
decorator                 5.1.1                    pypi_0    pypi
deepdiff                  6.3.1                    pypi_0    pypi
deeplake                  3.6.8                    pypi_0    pypi
defusedxml                0.7.1                    pypi_0    pypi
diffusers                 0.20.2                   pypi_0    pypi
dill                      0.3.6                    pypi_0    pypi
distributed               2023.9.2                 pypi_0    pypi
docker-pycreds            0.4.0                    pypi_0    pypi
docutils                  0.20.1                   pypi_0    pypi
dropout-layer-norm        0.1                      pypi_0    pypi
einops                    0.6.1                    pypi_0    pypi
entrypoints               0.4                      pypi_0    pypi
et-xmlfile                1.1.0                    pypi_0    pypi
executing                 1.2.0                    pypi_0    pypi
fastapi                   0.101.1                  pypi_0    pypi
fasteners                 0.18                     pypi_0    pypi
fastjsonschema            2.17.1                   pypi_0    pypi
fastrlock                 0.8.2                    pypi_0    pypi
filelock                  3.12.2                   pypi_0    pypi
flash-attn                2.3.2                    pypi_0    pypi
flask                     2.3.2                    pypi_0    pypi
fonttools                 4.40.0                   pypi_0    pypi
fqdn                      1.5.1                    pypi_0    pypi
fused-dense-lib           0.0.0                    pypi_0    pypi
fused-softmax-lib         0.0.0                    pypi_0    pypi
future                    0.18.3                   pypi_0    pypi
gds-tools                 1.7.1.12                      0    nvidia/label/cuda-12.2.1
gitdb                     4.0.10                   pypi_0    pypi
gitpython                 3.1.31                   pypi_0    pypi
google-auth               2.22.0                   pypi_0    pypi
google-auth-oauthlib      1.0.0                    pypi_0    pypi
greenlet                  3.0.0                    pypi_0    pypi
grpcio                    1.57.0                   pypi_0    pypi
h11                       0.14.0                   pypi_0    pypi
h5py                      3.9.0                    pypi_0    pypi
hdf5storage               0.1.19                   pypi_0    pypi
huggingface-hub           0.16.4                   pypi_0    pypi
humbug                    0.3.1                    pypi_0    pypi
hydra-colorlog            1.2.0                    pypi_0    pypi
hydra-core                1.3.2                    pypi_0    pypi
idna                      3.4                      pypi_0    pypi
imageio                   2.31.1                   pypi_0    pypi
imagesize                 1.4.1                    pypi_0    pypi
importlib-metadata        6.7.0                    pypi_0    pypi
importlib-resources       5.12.0                   pypi_0    pypi
inquirer                  3.1.3                    pypi_0    pypi
ipykernel                 6.23.3                   pypi_0    pypi
ipython                   8.14.0                   pypi_0    pypi
ipython-genutils          0.2.0                    pypi_0    pypi
ipywidgets                8.0.7                    pypi_0    pypi
isoduration               20.11.0                  pypi_0    pypi
itsdangerous              2.1.2                    pypi_0    pypi
jedi                      0.18.2                   pypi_0    pypi
jinja2                    3.1.2                    pypi_0    pypi
jmespath                  1.0.1                    pypi_0    pypi
joblib                    1.3.1                    pypi_0    pypi
jsonpointer               2.4                      pypi_0    pypi
jsonschema                4.17.3                   pypi_0    pypi
jupyter                   1.0.0                    pypi_0    pypi
jupyter-client            8.3.0                    pypi_0    pypi
jupyter-console           6.6.3                    pypi_0    pypi
jupyter-core              5.3.1                    pypi_0    pypi
jupyter-events            0.6.3                    pypi_0    pypi
jupyter-server            2.7.0                    pypi_0    pypi
jupyter-server-terminals  0.4.4                    pypi_0    pypi
jupyterlab-pygments       0.2.2                    pypi_0    pypi
jupyterlab-widgets        3.0.8                    pypi_0    pypi
kiwisolver                1.4.4                    pypi_0    pypi
lazy-loader               0.3                      pypi_0    pypi
ld_impl_linux-64          2.38                 h1181459_1  
libcublas                 12.2.4.5                      0    nvidia/label/cuda-12.2.1
libcublas-dev             12.2.4.5                      0    nvidia/label/cuda-12.2.1
libcublas-static          12.2.4.5                      0    nvidia/label/cuda-12.2.1
libcufft                  11.0.8.91                     0    nvidia/label/cuda-12.2.1
libcufft-dev              11.0.8.91                     0    nvidia/label/cuda-12.2.1
libcufft-static           11.0.8.91                     0    nvidia/label/cuda-12.2.1
libcufile                 1.7.1.12                      0    nvidia/label/cuda-12.2.1
libcufile-dev             1.7.1.12                      0    nvidia/label/cuda-12.2.1
libcufile-static          1.7.1.12                      0    nvidia/label/cuda-12.2.1
libcurand                 10.3.3.129                    0    nvidia/label/cuda-12.2.1
libcurand-dev             10.3.3.129                    0    nvidia/label/cuda-12.2.1
libcurand-static          10.3.3.129                    0    nvidia/label/cuda-12.2.1
libcusolver               11.5.1.129                    0    nvidia/label/cuda-12.2.1
libcusolver-dev           11.5.1.129                    0    nvidia/label/cuda-12.2.1
libcusolver-static        11.5.1.129                    0    nvidia/label/cuda-12.2.1
libcusparse               12.1.2.129                    0    nvidia/label/cuda-12.2.1
libcusparse-dev           12.1.2.129                    0    nvidia/label/cuda-12.2.1
libcusparse-static        12.1.2.129                    0    nvidia/label/cuda-12.2.1
libffi                    3.4.4                h6a678d5_0  
libgcc-ng                 11.2.0               h1234567_1  
libgomp                   11.2.0               h1234567_1  
libnpp                    12.2.0.5                      0    nvidia/label/cuda-12.2.1
libnpp-dev                12.2.0.5                      0    nvidia/label/cuda-12.2.1
libnpp-static             12.2.0.5                      0    nvidia/label/cuda-12.2.1
libnvjitlink              12.2.128                      0    nvidia/label/cuda-12.2.1
libnvjitlink-dev          12.2.128                      0    nvidia/label/cuda-12.2.1
libnvjpeg                 12.2.1.2                      0    nvidia/label/cuda-12.2.1
libnvjpeg-dev             12.2.1.2                      0    nvidia/label/cuda-12.2.1
libnvjpeg-static          12.2.1.2                      0    nvidia/label/cuda-12.2.1
libstdcxx-ng              11.2.0               h1234567_1  
lightning                 2.1.0                    pypi_0    pypi
lightning-cloud           0.5.37                   pypi_0    pypi
lightning-utilities       0.9.0                    pypi_0    pypi
line-profiler             4.1.1                    pypi_0    pypi
lit                       16.0.6                   pypi_0    pypi
literate-dataclasses      0.0.6                    pypi_0    pypi
llvmlite                  0.40.1                   pypi_0    pypi
locket                    1.0.0                    pypi_0    pypi
lpips                     0.1.4                    pypi_0    pypi
mako                      1.2.4                    pypi_0    pypi
markdown                  3.4.4                    pypi_0    pypi
markdown-it-py            3.0.0                    pypi_0    pypi
markupsafe                2.1.3                    pypi_0    pypi
matplotlib                3.7.1                    pypi_0    pypi
matplotlib-inline         0.1.6                    pypi_0    pypi
mdurl                     0.1.2                    pypi_0    pypi
minio                     7.1.15                   pypi_0    pypi
mistune                   3.0.1                    pypi_0    pypi
monai                     1.2.0                    pypi_0    pypi
mpmath                    1.3.0                    pypi_0    pypi
msgpack                   1.0.7                    pypi_0    pypi
multiprocess              0.70.14                  pypi_0    pypi
mypy-extensions           1.0.0                    pypi_0    pypi
nbclassic                 1.0.0                    pypi_0    pypi
nbclient                  0.8.0                    pypi_0    pypi
nbconvert                 7.6.0                    pypi_0    pypi
nbformat                  5.9.0                    pypi_0    pypi
ncurses                   6.4                  h6a678d5_0  
nest-asyncio              1.5.6                    pypi_0    pypi
networkx                  3.1                      pypi_0    pypi
neuralpredictors          0.3.0                    pypi_0    pypi
ninja                     1.11.1                   pypi_0    pypi
nnfabrik                  0.2.1                    pypi_0    pypi
notebook                  6.5.4                    pypi_0    pypi
notebook-shim             0.2.3                    pypi_0    pypi
nsight-compute            2023.2.1.3                    0    nvidia/label/cuda-12.2.1
numba                     0.57.1                   pypi_0    pypi
numcodecs                 0.11.0                   pypi_0    pypi
numpy                     1.24.4                   pypi_0    pypi
nvidia-cublas-cu11        11.10.3.66               pypi_0    pypi
nvidia-cublas-cu12        12.1.3.1                 pypi_0    pypi
nvidia-cuda-cupti-cu11    11.7.101                 pypi_0    pypi
nvidia-cuda-cupti-cu12    12.1.105                 pypi_0    pypi
nvidia-cuda-nvrtc-cu11    11.7.99                  pypi_0    pypi
nvidia-cuda-nvrtc-cu12    12.1.105                 pypi_0    pypi
nvidia-cuda-runtime-cu11  11.7.99                  pypi_0    pypi
nvidia-cuda-runtime-cu12  12.1.105                 pypi_0    pypi
nvidia-cudnn-cu11         8.5.0.96                 pypi_0    pypi
nvidia-cudnn-cu12         8.9.2.26                 pypi_0    pypi
nvidia-cufft-cu11         10.9.0.58                pypi_0    pypi
nvidia-cufft-cu12         11.0.2.54                pypi_0    pypi
nvidia-curand-cu11        10.2.10.91               pypi_0    pypi
nvidia-curand-cu12        10.3.2.106               pypi_0    pypi
nvidia-cusolver-cu11      11.4.0.1                 pypi_0    pypi
nvidia-cusolver-cu12      11.4.5.107               pypi_0    pypi
nvidia-cusparse-cu11      11.7.4.91                pypi_0    pypi
nvidia-cusparse-cu12      12.1.0.106               pypi_0    pypi
nvidia-nccl-cu11          2.14.3                   pypi_0    pypi
nvidia-nccl-cu12          2.18.1                   pypi_0    pypi
nvidia-nvjitlink-cu12     12.2.140                 pypi_0    pypi
nvidia-nvtx-cu11          11.7.91                  pypi_0    pypi
nvidia-nvtx-cu12          12.1.105                 pypi_0    pypi
nvtx                      0.2.8                    pypi_0    pypi
oauthlib                  3.2.2                    pypi_0    pypi
openpyxl                  3.1.2                    pypi_0    pypi
openssl                   3.0.11               h7f8727e_2  
optuna                    3.3.0                    pypi_0    pypi
ordered-set               4.1.0                    pypi_0    pypi
otumat                    0.3.1                    pypi_0    pypi
overrides                 7.3.1                    pypi_0    pypi
packaging                 23.1                     pypi_0    pypi
pandas                    1.5.3                    pypi_0    pypi
pandocfilters             1.5.0                    pypi_0    pypi
parquet                   1.3.1                    pypi_0    pypi
parso                     0.8.3                    pypi_0    pypi
partd                     1.4.1                    pypi_0    pypi
pathos                    0.3.0                    pypi_0    pypi
pathspec                  0.11.2                   pypi_0    pypi
pathtools                 0.1.2                    pypi_0    pypi
patsy                     0.5.3                    pypi_0    pypi
pexpect                   4.8.0                    pypi_0    pypi
pickleshare               0.7.5                    pypi_0    pypi
pillow                    9.5.0                    pypi_0    pypi
pip                       23.2.1                   pypi_0    pypi
platformdirs              3.8.0                    pypi_0    pypi
plotly                    5.15.0                   pypi_0    pypi
ply                       3.11                     pypi_0    pypi
pox                       0.3.2                    pypi_0    pypi
ppft                      1.7.6.6                  pypi_0    pypi
prometheus-client         0.17.0                   pypi_0    pypi
prompt-toolkit            3.0.38                   pypi_0    pypi
protobuf                  4.24.4                   pypi_0    pypi
psutil                    5.9.5                    pypi_0    pypi
ptyprocess                0.7.0                    pypi_0    pypi
pure-eval                 0.2.2                    pypi_0    pypi
pyarrow                   12.0.1                   pypi_0    pypi
pyasn1                    0.5.0                    pypi_0    pypi
pyasn1-modules            0.3.0                    pypi_0    pypi
pycparser                 2.21                     pypi_0    pypi
pydantic                  2.1.1                    pypi_0    pypi
pydantic-core             2.4.0                    pypi_0    pypi
pydot                     1.4.2                    pypi_0    pypi
pygments                  2.15.1                   pypi_0    pypi
pyjwt                     2.7.0                    pypi_0    pypi
pylibraft-cu12            23.10.0                  pypi_0    pypi
pymysql                   1.1.0                    pypi_0    pypi
pynvml                    11.4.1                   pypi_0    pypi
pyparsing                 3.1.0                    pypi_0    pypi
pyrsistent                0.19.3                   pypi_0    pypi
python                    3.9.16               h955ad1f_3  
python-dotenv             1.0.0                    pypi_0    pypi
python-editor             1.0.4                    pypi_0    pypi
python-json-logger        2.0.7                    pypi_0    pypi
python-multipart          0.0.6                    pypi_0    pypi
pytorch-forecasting       1.0.0                    pypi_0    pypi
pytorch-lightning         2.0.9.post0              pypi_0    pypi
pytorch-optimizer         2.12.0                   pypi_0    pypi
pytorch-sphinx-theme      0.0.19                   pypi_0    pypi
pywavelets                1.4.1                    pypi_0    pypi
pyyaml                    6.0                      pypi_0    pypi
pyzmq                     25.1.0                   pypi_0    pypi
qtconsole                 5.4.3                    pypi_0    pypi
qtpy                      2.3.1                    pypi_0    pypi
raft-dask-cu12            23.10.0                  pypi_0    pypi
readchar                  4.0.5                    pypi_0    pypi
readline                  8.2                  h5eee18b_0  
recommonmark              0.7.1                    pypi_0    pypi
requests                  2.31.0                   pypi_0    pypi
requests-oauthlib         1.3.1                    pypi_0    pypi
rfc3339-validator         0.1.4                    pypi_0    pypi
rfc3986-validator         0.1.1                    pypi_0    pypi
rich                      13.5.2                   pypi_0    pypi
rmm-cu12                  23.10.0                  pypi_0    pypi
rootutils                 1.0.7                    pypi_0    pypi
rsa                       4.9                      pypi_0    pypi
s3transfer                0.6.1                    pypi_0    pypi
safetensors               0.3.3                    pypi_0    pypi
scikit-image              0.21.0                   pypi_0    pypi
scikit-learn              1.3.0                    pypi_0    pypi
scipy                     1.11.0                   pypi_0    pypi
seaborn                   0.12.2                   pypi_0    pypi
send2trash                1.8.2                    pypi_0    pypi
sentry-sdk                1.26.0                   pypi_0    pypi
setproctitle              1.3.2                    pypi_0    pypi
setuptools                68.2.2                   pypi_0    pypi
six                       1.16.0                   pypi_0    pypi
smmap                     5.0.0                    pypi_0    pypi
sniffio                   1.3.0                    pypi_0    pypi
snowballstemmer           2.2.0                    pypi_0    pypi
sortedcontainers          2.4.0                    pypi_0    pypi
soupsieve                 2.4.1                    pypi_0    pypi
sphinx                    7.0.1                    pypi_0    pypi
sphinxcontrib-applehelp   1.0.4                    pypi_0    pypi
sphinxcontrib-devhelp     1.0.2                    pypi_0    pypi
sphinxcontrib-htmlhelp    2.0.1                    pypi_0    pypi
sphinxcontrib-jsmath      1.0.1                    pypi_0    pypi
sphinxcontrib-qthelp      1.0.3                    pypi_0    pypi
sphinxcontrib-serializinghtml 1.1.5                    pypi_0    pypi
sqlalchemy                2.0.22                   pypi_0    pypi
sqlite                    3.41.2               h5eee18b_0  
stack-data                0.6.2                    pypi_0    pypi
starlette                 0.27.0                   pypi_0    pypi
starsessions              1.3.0                    pypi_0    pypi
statsmodels               0.14.0                   pypi_0    pypi
subprocess32              3.5.4                    pypi_0    pypi
sympy                     1.12                     pypi_0    pypi
tblib                     2.0.0                    pypi_0    pypi
tenacity                  8.2.2                    pypi_0    pypi
tensorboard               2.14.0                   pypi_0    pypi
tensorboard-data-server   0.7.1                    pypi_0    pypi
terminado                 0.17.1                   pypi_0    pypi
threadpoolctl             3.1.0                    pypi_0    pypi
thriftpy2                 0.4.16                   pypi_0    pypi
tifffile                  2023.7.4                 pypi_0    pypi
tinycss2                  1.2.1                    pypi_0    pypi
tk                        8.6.12               h1ccaba5_0  
toolz                     0.12.0                   pypi_0    pypi
torch                     2.1.0                    pypi_0    pypi
torch-tb-profiler         0.4.1                    pypi_0    pypi
torchaudio                2.1.0                    pypi_0    pypi
torchinfo                 1.8.0                    pypi_0    pypi
torchmetrics              1.2.0                    pypi_0    pypi
torchvision               0.16.0                   pypi_0    pypi
tornado                   6.3.2                    pypi_0    pypi
tqdm                      4.65.0                   pypi_0    pypi
traitlets                 5.9.0                    pypi_0    pypi
treelite                  3.9.1                    pypi_0    pypi
treelite-runtime          3.9.1                    pypi_0    pypi
triton                    2.1.0                    pypi_0    pypi
typing-extensions         4.6.3                    pypi_0    pypi
tzdata                    2023.3                   pypi_0    pypi
ucx-py-cu12               0.34.0                   pypi_0    pypi
uri-template              1.3.0                    pypi_0    pypi
urllib3                   1.26.16                  pypi_0    pypi
uvicorn                   0.23.2                   pypi_0    pypi
wandb                     0.15.4                   pypi_0    pypi
watchdog                  3.0.0                    pypi_0    pypi
wcwidth                   0.2.6                    pypi_0    pypi
webcolors                 1.13                     pypi_0    pypi
webencodings              0.5.1                    pypi_0    pypi
websocket-client          1.6.1                    pypi_0    pypi
websockets                11.0.3                   pypi_0    pypi
werkzeug                  2.3.6                    pypi_0    pypi
wheel                     0.41.2           py39h06a4308_0  
widgetsnbextension        4.0.8                    pypi_0    pypi
wrapt                     1.15.0                   pypi_0    pypi
xz                        5.4.2                h5eee18b_0  
zarr                      2.16.0                   pypi_0    pypi
zict                      3.0.0                    pypi_0    pypi
zipp                      3.15.0                   pypi_0    pypi
zlib                      1.2.13               h5eee18b_0  


wurining commented 1 year ago

The project uses the PyTorch Lightning framework.

I have noticed that the framework successfully moves the model to the correct device, cuda:0 in my case. (I checked each layer's weights at every step before running F.conv1d(input, weight, bias, self.stride, self.padding, self.dilation, self.groups).)

Everything is fine until F.conv1d(input, weight, bias, self.stride, self.padding, self.dilation, self.groups) runs, and then the magic appears!

I get an error👌

I guess it is caused by the CUDA memory cache or something. I don't really know the details of CUDA, so I didn't dig deeper.

wence- commented 1 year ago

Thanks for the report. This looks to be an issue that is entirely within pytorch. RMM doesn't reference the cacheInfo symbol at all; all we do is tell pytorch that it should use RMM calls to allocate and deallocate memory (see https://github.com/rapidsai/rmm/blob/branch-23.12/python/rmm/allocators/torch.py).

Unfortunately, it appears that some pytorch algorithms require that the memory allocator in pytorch implement the cacheInfo interface. This interface is provided in pytorch, but there is no way for an external allocator (like RMM) to implement it. I think the reason you don't see the error until the convolution is that the request for the cacheInfo information only happens in convolutional layers: https://github.com/pytorch/pytorch/blob/9af82fa2b86fb71df503082b1960c9392f9dc66d/aten/src/ATen/native/cudnn/Conv_v7.cpp#L212

So I recommend you report an issue to pytorch, since it looks like they don't provide an interface that allows external allocators to work with pytorch programs in all cases.
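
For readers following along, here is a minimal sketch of what "telling pytorch to use RMM" typically looks like on the user side. The helper name `use_rmm_for_torch` is hypothetical and exists only for illustration; `rmm_torch_allocator` and `torch.cuda.memory.change_current_allocator` are the names used by the RMM and PyTorch documentation. The sketch assumes it runs before any CUDA memory has been allocated by PyTorch, and it simply returns `False` when RMM, PyTorch, or a CUDA device is unavailable.

```python
def use_rmm_for_torch() -> bool:
    """Try to route PyTorch CUDA allocations through RMM.

    Hypothetical helper: returns True if the allocator was swapped,
    False if RMM, PyTorch, or a CUDA device is not available.
    """
    try:
        import torch
        from rmm.allocators.torch import rmm_torch_allocator
    except ImportError:
        # Either torch or rmm is not installed in this environment.
        return False

    # RMM may expose None here when the installed torch has no pluggable allocator.
    if rmm_torch_allocator is None:
        return False

    if not torch.cuda.is_available():
        return False

    # Must be called before PyTorch has allocated any CUDA memory.
    torch.cuda.memory.change_current_allocator(rmm_torch_allocator)
    return True


if __name__ == "__main__":
    print("RMM allocator active:", use_rmm_for_torch())
```

Nothing in this setup gives the external allocator a way to answer a cacheInfo query, which is consistent with the error appearing only once a code path inside pytorch asks for it.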

wurining commented 1 year ago

Cool, it seems this happens in the cuDNN library.

Because I set torch.backends.cudnn.benchmark = True to speed things up, PyTorch lets cuDNN benchmark several convolution algorithms and pick the fastest one. It seems cacheInfo is required there.

I set the flag to False, and then everything went back to normal.😊

Thank you very much.
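
The workaround described above can be sketched as follows. The function names and the `using_pluggable_allocator` parameter are hypothetical and only illustrate the idea; the one real setting involved is `torch.backends.cudnn.benchmark`. The sketch assumes the caller knows whether a pluggable allocator such as RMM has been installed, and it skips the PyTorch assignment if torch is not importable.

```python
import importlib.util


def cudnn_benchmark_allowed(using_pluggable_allocator: bool,
                            want_benchmark: bool = True) -> bool:
    """Benchmark mode is only kept on when no pluggable allocator is in use."""
    return want_benchmark and not using_pluggable_allocator


def apply_cudnn_benchmark_setting(using_pluggable_allocator: bool,
                                  want_benchmark: bool = True) -> bool:
    """Set torch.backends.cudnn.benchmark accordingly and return the value used."""
    enabled = cudnn_benchmark_allowed(using_pluggable_allocator, want_benchmark)
    if importlib.util.find_spec("torch") is not None:
        import torch
        # With benchmark mode off, the cuDNN autotuning path that raised the
        # cacheInfo error in this issue was no longer hit.
        torch.backends.cudnn.benchmark = enabled
    return enabled


if __name__ == "__main__":
    # With RMM installed as the allocator, benchmark mode is turned off.
    print(apply_cudnn_benchmark_setting(using_pluggable_allocator=True))
```

Turning benchmark mode off may cost some convolution performance, so it is a workaround rather than a fix for the missing cacheInfo support.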