rusty1s / pytorch_sparse

PyTorch Extension Library of Optimized Autograd Sparse Matrix Operations
MIT License

RuntimeError: CUDA error while importing torch_sparse #278

Closed TaylorHere closed 2 years ago

TaylorHere commented 2 years ago
  File "/opt/conda/lib/python3.8/site-packages/fire/core.py", line 466, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/opt/conda/lib/python3.8/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "xfraud/data_extract.py", line 82, in main
    init_distributed()
  File "/app/xfraud/distributed_utils.py", line 75, in init_distributed
    dist.barrier()
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 2792, in barrier
    work = default_pg.barrier(opts=opts)
RuntimeError: CUDA error: the provided PTX was compiled with an unsupported toolchain.

When I add the following import to my script, this exception is thrown:

from torch_sparse import SparseTensor, set_diag
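
The "unsupported toolchain" message suggests the extension's PTX was built with a CUDA toolkit newer than what the installed driver supports, so a quick sanity check is to compare the two; a minimal diagnostic sketch (the nvidia-smi query flags are just one way to read the driver version):

import subprocess
import torch

# CUDA toolkit version this PyTorch build (and ideally every compiled
# extension such as torch_sparse) was built against.
print("torch.version.cuda:", torch.version.cuda)

# Driver version reported by nvidia-smi; the driver has to support the
# toolkit version the PTX was compiled with.
driver = subprocess.run(
    ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
    capture_output=True, text=True,
).stdout.strip()
print("driver version:", driver)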

I guess it is due to a torch_sparse build problem. Here is my Dockerfile:

FROM nvcr.io/nvidia/pytorch:22.08-py3
# https://docs.nvidia.com/deeplearning/frameworks/support-matrix/index.html

RUN apt update && DEBIAN_FRONTEND=noninteractive apt-get install -yq --no-install-recommends \
        build-essential \
        cmake \
        git \
        curl \
        vim \
        wget \
        htop \
        ca-certificates \
        openssh-client \
        openssh-server \
        &&\
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

RUN pip3 install -i https://mirrors.aliyun.com/pypi/simple/ setuptools pip -U && \
    pip3 install -i https://mirrors.aliyun.com/pypi/simple/ \
    astunparse\
    numpy\
    ninja\
    pyyaml\
    cffi\
    typing_extensions\
    future\
    six\
    requests\
    dataclasses

WORKDIR /app
ENV PYTHONPATH /app:$PYTHONPATH

ENV FORCE_CUDA="1" 
# build image in a ci/cd environment

RUN pip3 install torch-cluster==1.5.9 torch-scatter==2.0.8 torch-sparse==0.6.11 torch-spline-conv==1.2.1

ADD requirements.txt requirements.txt
RUN pip3 install -r requirements.txt  -i http://mirrors.aliyun.com/pypi/simple --trusted-host mirrors.aliyun.com
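
A small smoke test run on a GPU node against the freshly built image would surface this failure before the distributed job does; a sketch, assuming a tiny CUDA allocation plus one SparseTensor construction is enough to exercise the compiled extension:

import torch
from torch_sparse import SparseTensor

# Creating a CUDA tensor forces CUDA context creation; in this report that is
# exactly where the "unsupported toolchain" error shows up once torch_sparse
# has been imported.
torch.empty((1,), device="cuda")

# Build a tiny SparseTensor on the GPU to touch the extension's kernels.
print(SparseTensor.from_dense(torch.eye(2, device="cuda")))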

Here is my device and driver:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:08.0 Off |                    0 |
| N/A   46C    P0    27W /  70W |  14540MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

Here is my TORCH_CUDA_ARCH_LIST:

TORCH_CUDA_ARCH_LIST=5.2 6.0 6.1 7.0 7.5 8.0 8.6+PTX

And here is a minimal script that reproduces the error:
import torch.distributed as dist

# uncomment below will cause the issue
# from torch_sparse import SparseTensor, set_diag

dist.init_process_group(backend="nccl")
dist.barrier()

It looks like torch.empty triggers this issue once torch_sparse has been imported:

import torch
# uncomment below will cause the issue
# from torch_sparse import SparseTensor, set_diag
empty = torch.empty((1,), device="cuda")
Traceback (most recent call last):
  File "xfraud/test_barrier.py", line 7, in <module>
    empty = torch.empty((1,), device="cuda")
RuntimeError: CUDA error: the provided PTX was compiled with an unsupported toolchain.
TaylorHere commented 2 years ago

Fixed by switching to the image nvcr.io/nvidia/pytorch:21.07-py3.
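
Presumably this works because the 21.07 image ships a CUDA toolkit that the R470 driver (CUDA 11.4) still supports, whereas 22.08 ships a newer one; a quick check one could run inside the fixed image, as a sketch:

import torch
from torch_sparse import SparseTensor, set_diag  # the import that previously triggered the failure

# Expectation rather than a guarantee: torch.version.cuda should now report a
# toolkit the R470 driver supports, so the allocation below no longer raises.
print("torch.version.cuda:", torch.version.cuda)
torch.empty((1,), device="cuda")
print("ok")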