farleylai closed this issue 2 years ago
It seems that, for some reason, the default nvcc compilation does not generate code for the GTX 1080 Ti when building on a Titan X machine. In that case, one can explicitly add the architecture and gencode options in the setup.py. Here is an example for mmdet/ops/nms/setup.py:
```python
nvcc_ARCH = ['-arch=sm_52']
nvcc_ARCH += ["-gencode=arch=compute_75,code=\"compute_75\""]
nvcc_ARCH += ["-gencode=arch=compute_75,code=\"sm_75\""]
nvcc_ARCH += ["-gencode=arch=compute_70,code=\"sm_70\""]
nvcc_ARCH += ["-gencode=arch=compute_61,code=\"sm_61\""]
nvcc_ARCH += ["-gencode=arch=compute_52,code=\"sm_52\""]
extra_compile_args = {
    'cxx': ['-Wno-unused-function', '-Wno-write-strings'],
    'nvcc': nvcc_ARCH,
}
```
Then pass the extra_compile_args composed above to setup():
```python
setup(
    name='nms_cuda',
    ext_modules=[
        CUDAExtension(
            'nms_cuda',
            ['src/nms_cuda.cpp', 'src/nms_kernel.cu'],
            extra_compile_args=extra_compile_args,
        ),
        CUDAExtension('nms_cpu', ['src/nms_cpu.cpp']),
    ],
    cmdclass={'build_ext': BuildExtension})
```
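To avoid hand-writing one `-gencode` line per architecture, the flag list above could also be generated from a list of compute capabilities. A minimal sketch (the `gencode_flags` helper is hypothetical, not part of mmdet):

```python
def gencode_flags(capabilities):
    """Build nvcc -gencode flags from compute capabilities like '6.1'.

    Emits a SASS target (code=sm_XX) per capability, plus a PTX target
    (code=compute_XX) for the newest one, so that future GPUs can still
    JIT-compile the embedded PTX.
    """
    caps = sorted(capabilities)
    flags = []
    for cap in caps:
        num = cap.replace('.', '')
        flags.append(f'-gencode=arch=compute_{num},code=sm_{num}')
    newest = caps[-1].replace('.', '')
    flags.append(f'-gencode=arch=compute_{newest},code=compute_{newest}')
    return flags

# Covers Maxwell (5.2), Titan X / 1080 Ti (6.1), Titan V (7.0), RTX Titan (7.5)
nvcc_ARCH = gencode_flags(['5.2', '6.1', '7.0', '7.5'])
```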
After rebuilding this particular nms module on the Titan X machine and reinstalling mmdet, it now works on the GTX 1080 Ti too. Nonetheless, the real cause likely lies elsewhere. Since there is a setup.py per CUDA module, it would be tedious to change them all. Any better suggestions or clarifications?
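One alternative to editing every per-module setup.py: PyTorch's `torch.utils.cpp_extension` reads the `TORCH_CUDA_ARCH_LIST` environment variable and derives the arch/gencode flags from it, so setting it once in the build environment should cover all the extension modules. A sketch, assuming the mmdet build goes through `cpp_extension` (the capability list here is an example, not the project default):

```python
import os

# Target Maxwell/Pascal/Volta/Turing; '+PTX' additionally embeds PTX
# for the newest arch, for forward compatibility with newer GPUs.
os.environ['TORCH_CUDA_ARCH_LIST'] = '5.2;6.1;7.0;7.5+PTX'

# ...then run the usual build (e.g. `pip install -v -e .` or
# `python setup.py develop`) from the same shell/environment.
```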
Update: it should be Titan X, not V.
The failure case is quite weird. It seems to be neither a forward nor a backward compatibility issue between architectures. (Isn't the TITAN V the Volta arch?) Manually specifying the target arch can be a workaround, and maybe we can wait for someone to figure this out.
I have now compared the output of cuobjdump -ptx on the shared libraries nms_cuda*.so built on different machines, and found that only the one produced on the Titan X differs from those built on the RTX Titan and the 1080 Ti.
The cuobjdump results are attached FYI. Though the arch and target are the same sm_30 by default, the PTX version produced on the Titan X is higher (6.4), with very different align offsets.
PS: the nvcc installation (through conda, shared over NFS) is identical across the different machines:
```
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
```
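To confirm programmatically that the compilers really match across machines, one could parse the release line out of `nvcc --version` output and compare. A small sketch (the helper name is made up), using the output above as sample input:

```python
import re

def nvcc_release(text):
    """Extract the CUDA release (e.g. '10.0') from `nvcc --version` output."""
    m = re.search(r'release (\d+\.\d+)', text)
    return m.group(1) if m else None

sample = """\
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
"""
print(nvcc_release(sample))  # '10.0'

# On a live machine, feed it the real output, e.g. via
# subprocess.run(['nvcc', '--version'], capture_output=True, text=True).stdout
```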
Any insights?
Update: it should be Titan X, not V.
I am also encountering this issue and need some help.
```
inds = nms_cuda.nms(dets_th, iou_thr)
RuntimeError: CUDA error: a PTX JIT compilation failed (launch_kernel at /opt/conda/conda-bld/pytorch_1565272269120/work/aten/src/ATen/native/cuda/Loops.cuh:102)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x47 (0x7f889750ae37 in /home/dengweijian/.conda/envs/mmlab/lib/python3.5/site-packages/torch/lib/libc10.so)
frame #1: void at::native::gpu_index_kernel<nv_dl_wrapper_t<nv_dl_tag<void (*)(at::TensorIterator&, c10::ArrayRef
```
> It seems like for some reason, the default nvcc compilation does not generate code support for GTX 1080 Ti on a Titan X machine. In that case, one may explicitly add the architecture and gencode options in the setup.py. Here is the example of mmdet/ops/nms/setup.py:
> nvcc_ARCH = ['-arch=sm_52'] nvcc_ARCH += ["-gencode=arch=compute_75,code=\"compute_75\""] nvcc_ARCH += ["-gencode=arch=compute_75,code=\"sm_75\""] ...

I am also struggling with this problem. Could you please clarify how to apply it? Should this go into a fresh mmdet/ops/nms/setup.py? A fresh file would require imports and perhaps more. Should it then be run separately from the mmdetection/mmdet/ops/nms/ subfolder?
I was able to run the code by copying mmdetection/setup.py into mmdetection/mmdet/ops/nms/ and replacing the code after `if __name__ == '__main__':`, but the error still remains, and I don't know how to check whether this brought me any closer to a fix.
P.S. Unexpectedly, setting up a new, otherwise identical conda environment that used a newer nvcc fixed the problem (9.1 is installed in /usr/bin and 10.1 in /usr/local/cuda-10.1; I switched to the latter by adding it to PATH). So it looks like a build/rebuild problem, since everything else is essentially the same.
I had this same problem on a GCloud Tesla P100-PCIE-16GB with PyTorch 1.3.1 / torchvision 0.4.2 / cudatoolkit 10.1.243 / CUDA driver version 10020. I couldn't solve it, so I moved to Docker and it works:

```
ARG PYTORCH="1.1.0"
ARG CUDA="10.0"
ARG CUDNN="7.5"
```

So it is not a GPU problem but something to do with software versions. Thanks.
The error is thrown when mmdet is compiled on a user machine with a Titan V (Pascal) and executed on a cluster worker machine with a newer GTX 1080 Ti. Both machines have CUDA 10 installed. However, if compiled on a machine with an RTX Titan, execution on both the Titan V and GTX 1080 Ti machines is fine. After some cross-testing, this seems to be the only failure case:
Any idea to address this failure case?
Update: it should be Titan X, not V.
Here is the minimal code to reproduce with GPU nms():
The error:
PyTorch 1.1 from Miniconda 3