File "/opt/conda/lib/python3.8/site-packages/fire/core.py", line 466, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/opt/conda/lib/python3.8/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "xfraud/data_extract.py", line 82, in main
init_distributed()
File "/app/xfraud/distributed_utils.py", line 75, in init_distributed
dist.barrier()
File "/opt/conda/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 2792, in barrier
work = default_pg.barrier(opts=opts)
RuntimeError: CUDA error: the provided PTX was compiled with an unsupported toolchain.
When I add the following import to my script, the exception above is thrown:
from torch_sparse import SparseTensor, set_diag
I suspect this is caused by a problem with how torch_sparse was built. Here is my Dockerfile:
import torch.distributed as dist
# Uncommenting the import below triggers the error
# from torch_sparse import SparseTensor, set_diag
dist.init_process_group(backend="nccl")
dist.barrier()
It looks like a plain torch.empty call on the GPU is enough to trigger the issue:
import torch
# Uncommenting the import below triggers the error
# from torch_sparse import SparseTensor, set_diag
empty = torch.empty((1,), device="cuda")
Traceback (most recent call last):
File "xfraud/test_barrier.py", line 7, in <module>
empty = torch.empty((1,), device="cuda")
RuntimeError: CUDA error: the provided PTX was compiled with an unsupported toolchain.
Here is my device and driver:
Here is my TORCH_CUDA_ARCH_LIST:
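This error often points to PTX produced by a CUDA toolkit newer than the installed driver supports, so it can help to compare the versions involved. Below is a minimal sketch for collecting them. The helper name `collect_env` and the dictionary keys are made up for illustration and are not part of xfraud, PyTorch, or torch_sparse. It deliberately does not import torch_sparse, since that import is what triggers the failure here.

```python
import os
import shutil
import subprocess


def collect_env():
    """Gather version details relevant to a PTX/toolchain mismatch.

    Hypothetical helper: every field stays None if it cannot be determined.
    """
    info = {
        "TORCH_CUDA_ARCH_LIST": os.environ.get("TORCH_CUDA_ARCH_LIST"),
        "torch": None,
        "torch_cuda": None,
        "torch_arch_list": None,
        "device": None,
        "capability": None,
        "driver": None,
    }

    # torch may be missing on the machine where this is run.
    try:
        import torch
    except ImportError:
        torch = None

    if torch is not None:
        info["torch"] = torch.__version__
        # CUDA toolkit version PyTorch itself was built against.
        info["torch_cuda"] = torch.version.cuda
        try:
            if torch.cuda.is_available():
                info["torch_arch_list"] = torch.cuda.get_arch_list()
                info["device"] = torch.cuda.get_device_name(0)
                info["capability"] = torch.cuda.get_device_capability(0)
        except RuntimeError as exc:
            # Record the failure instead of aborting the whole report.
            info["device"] = f"CUDA query failed: {exc}"

    # Driver version as reported by nvidia-smi, if the tool is on PATH.
    if shutil.which("nvidia-smi"):
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            check=False,
        )
        if result.returncode == 0:
            info["driver"] = result.stdout.strip() or None

    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```

Running this inside the same container as the failing script, and comparing `torch_cuda` and the driver version against the CUDA toolkit used to build torch_sparse in the Dockerfile, should show whether the extension was compiled with a newer toolchain than the driver can load.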