[Bug] PointPillars training fails on KITTI after 2 epochs (ERROR: numba.cuda.cudadrv.driver.LinkerError and ptxas application ptx input, line 9; fatal : Unsupported .version 7.6; current version is '7.4') #2720
Running
python tools/train.py configs/pointpillars/pointpillars_hv_secfpn_8xb6-160e_kitti-3d-car.py
on the KITTI dataset, I get the following error during evaluation once the first epoch ends:
Converting 3D prediction to KITTI format
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 50/50, 895.5 task/s, elapsed: 0s, ETA: 0s
Result is saved to /tmp/tmpqdiad9q5/results/pred_instances_3d.pkl.
Traceback (most recent call last):
File "/home/lw/miniconda3/envs/openmmlab/lib/python3.8/site-packages/numba/cuda/cudadrv/driver.py", line 2705, in add_ptx
driver.cuLinkAddData(self.handle, enums.CU_JIT_INPUT_PTX,
File "/home/lw/miniconda3/envs/openmmlab/lib/python3.8/site-packages/numba/cuda/cudadrv/driver.py", line 320, in safe_cuda_api_call
self._check_ctypes_error(fname, retcode)
File "/home/lw/miniconda3/envs/openmmlab/lib/python3.8/site-packages/numba/cuda/cudadrv/driver.py", line 388, in _check_ctypes_error
raise CudaAPIError(retcode, msg)
numba.cuda.cudadrv.driver.CudaAPIError: [222] Call to cuLinkAddData results in UNKNOWN_CUDA_ERROR
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "tool/train.py", line 135, in <module>
main()
File "tool/train.py", line 131, in main
runner.train()
File "/home/lw/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1745, in train
model = self.train_loop.run() # type: ignore
File "/home/lw/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/loops.py", line 102, in run
self.runner.val_loop.run()
File "/home/lw/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/loops.py", line 366, in run
metrics = self.evaluator.evaluate(len(self.dataloader.dataset))
File "/home/lw/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/evaluator/evaluator.py", line 79, in evaluate
_results = metric.evaluate(size)
File "/home/lw/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/evaluator/metric.py", line 133, in evaluate
_metrics = self.compute_metrics(results) # type: ignore
File "/home/lw/LW/mmdetection3d/mmdet3d/evaluation/metrics/kitti_metric.py", line 205, in compute_metrics
ap_dict = self.kitti_evaluate(
File "/home/lw/LW/mmdetection3d/mmdet3d/evaluation/metrics/kitti_metric.py", line 244, in kitti_evaluate
ap_result_str, apdict = kitti_eval(
File "/home/lw/LW/mmdetection3d/mmdet3d/evaluation/functional/kitti_utils/eval.py", line 725, in kitti_eval
mAP40_3d, mAP40_aos = do_eval(gt_annos, dt_annos,
File "/home/lw/LW/mmdetection3d/mmdet3d/evaluation/functional/kitti_utils/eval.py", line 626, in do_eval
ret = eval_class(gt_annos, dt_annos, current_classes, difficultys, 1,
File "/home/lw/LW/mmdetection3d/mmdet3d/evaluation/functional/kitti_utils/eval.py", line 480, in eval_class
rets = calculate_iou_partly(dt_annos, gt_annos, metric, num_parts)
File "/home/lw/LW/mmdetection3d/mmdet3d/evaluation/functional/kitti_utils/eval.py", line 384, in calculate_iou_partly
overlap_part = bev_box_overlap(dt_boxes,
File "/home/lw/LW/mmdetection3d/mmdet3d/evaluation/functional/kitti_utils/eval.py", line 118, in bev_box_overlap
from .rotate_iou import rotate_iou_gpu_eval
File "/home/lw/LW/mmdetection3d/mmdet3d/evaluation/functional/kitti_utils/rotate_iou.py", line 283, in <module>
def rotate_iou_kernel_eval(N,
File "/home/lw/miniconda3/envs/openmmlab/lib/python3.8/site-packages/numba/cuda/decorators.py", line 115, in _jit
disp.compile(argtypes)
File "/home/lw/miniconda3/envs/openmmlab/lib/python3.8/site-packages/numba/cuda/dispatcher.py", line 796, in compile
kernel.bind()
File "/home/lw/miniconda3/envs/openmmlab/lib/python3.8/site-packages/numba/cuda/dispatcher.py", line 178, in bind
self._codelibrary.get_cufunc()
File "/home/lw/miniconda3/envs/openmmlab/lib/python3.8/site-packages/numba/cuda/codegen.py", line 208, in get_cufunc
cubin = self.get_cubin(cc=device.compute_capability)
File "/home/lw/miniconda3/envs/openmmlab/lib/python3.8/site-packages/numba/cuda/codegen.py", line 181, in get_cubin
linker.add_ptx(ptx.encode())
File "/home/lw/miniconda3/envs/openmmlab/lib/python3.8/site-packages/numba/cuda/cudadrv/driver.py", line 2708, in add_ptx
raise LinkerError("%s\n%s" % (e, self.error_log))
numba.cuda.cudadrv.driver.LinkerError: [222] Call to cuLinkAddData results in UNKNOWN_CUDA_ERROR
ptxas application ptx input, line 9; fatal : Unsupported .version 7.6; current version is '7.4'
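The traceback suggests why the failure only shows up once evaluation starts: `bev_box_overlap` imports `rotate_iou` inside its function body, and the `@cuda.jit` decorator on `rotate_iou_kernel_eval` compiles the kernel at import time (`decorators.py` calls `disp.compile(argtypes)`), so the CUDA linker is not exercised until the first validation run. The sketch below reproduces that deferred-import pattern in plain Python. `fake_rotate_iou`, `bev_box_overlap_sketch` and `run_epochs` are hypothetical stand-ins: the fake module simply raises instead of compiling anything, and `validate_every` only approximates mmengine's `val_interval` setting.

```python
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for rotate_iou.py: its top-level code fails on import,
# the way an eager @cuda.jit compile fails when ptxas rejects the PTX.
module_dir = Path(tempfile.mkdtemp())
(module_dir / "fake_rotate_iou.py").write_text(
    "raise RuntimeError(\"Unsupported .version 7.6; current version is '7.4'\")\n"
)
sys.path.insert(0, str(module_dir))


def bev_box_overlap_sketch():
    # Deferred import, mirroring the import inside bev_box_overlap:
    # the module's top-level code only runs when this function is called.
    from fake_rotate_iou import rotate_iou_gpu_eval  # noqa: F401
    return "ok"


def run_epochs(num_epochs, validate_every):
    """Return the first epoch whose validation step fails, or None."""
    for epoch in range(1, num_epochs + 1):
        # ... training for this epoch never imports the evaluation module ...
        if epoch % validate_every == 0:
            try:
                bev_box_overlap_sketch()
            except RuntimeError:
                return epoch
    return None
```

Here `run_epochs(4, 2)` returns `2`: the training epochs themselves run cleanly and the error surfaces at the first epoch that triggers validation. Because Python does not keep a module whose import raised in `sys.modules`, every later validation would hit the same error.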
Prerequisite
Task
I'm using the official example scripts/configs for the officially supported tasks/models/datasets.
Branch
main branch https://github.com/open-mmlab/mmdetection3d
Environment
sys.platform: linux
Python: 3.8.17 (default, Jul 5 2023, 21:04:15) [GCC 11.2.0]
CUDA available: True
numpy_random_seed: 2147483648
GPU 0: NVIDIA GeForce GTX 1070 Ti
CUDA_HOME: /home/lw/miniconda3/envs/openmmlab
NVCC: Cuda compilation tools, release 11.6, V11.6.124
GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
PyTorch: 1.13.1
PyTorch compiling details: PyTorch built with:
TorchVision: 0.14.1
OpenCV: 4.8.0
MMEngine: 0.8.4
MMDetection: 3.1.0
MMDetection3D: 1.2.0+c04831c
spconv2.0: False
Numba: 0.56.4
Numpy: 1.19.5
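The two version numbers in the ptxas message can be read against this environment. Assuming the CUDA 11.x convention that PTX ISA 7.N shipped with CUDA 11.N, the kernel's PTX (`.version 7.6`) is consistent with the CUDA 11.6 toolkit reported under NVCC, while the driver's JIT linker only accepts PTX up to 7.4 (roughly CUDA 11.4 level). The helper names below are hypothetical, and the mapping is deliberately limited to PTX ISA 7.x; it is a reading aid for the log, not a diagnostic tool from numba or mmdet3d.

```python
import re

# ptxas reports: "Unsupported .version <emitted>; current version is '<supported>'"
PTXAS_VERSION_RE = re.compile(
    r"Unsupported \.version (\d+)\.(\d+); current version is '(\d+)\.(\d+)'"
)
# The environment report prints e.g. "NVCC: Cuda compilation tools, release 11.6, V11.6.124"
NVCC_RELEASE_RE = re.compile(r"release (\d+)\.(\d+)")


def ptx_to_cuda_release(major, minor):
    """Assumed CUDA 11.x mapping: PTX ISA 7.N shipped with CUDA 11.N."""
    if major != 7:
        raise ValueError("this sketch only covers PTX ISA 7.x (CUDA 11.x)")
    return f"11.{minor}"


def nvcc_release(nvcc_line):
    """Extract the toolkit release, e.g. '11.6', from an NVCC version line."""
    match = NVCC_RELEASE_RE.search(nvcc_line)
    if match is None:
        raise ValueError("no 'release X.Y' found in NVCC line")
    return "{}.{}".format(*match.groups())


def diagnose_ptx_mismatch(ptxas_message, nvcc_line):
    """Map both PTX versions in the ptxas message to CUDA 11.x levels."""
    match = PTXAS_VERSION_RE.search(ptxas_message)
    if match is None:
        raise ValueError("no ptxas .version mismatch found in message")
    emitted_major, emitted_minor, driver_major, driver_minor = map(int, match.groups())
    return {
        "ptx_emitted_for_cuda": ptx_to_cuda_release(emitted_major, emitted_minor),
        "driver_accepts_up_to_cuda": ptx_to_cuda_release(driver_major, driver_minor),
        "toolkit_release": nvcc_release(nvcc_line),
    }


if __name__ == "__main__":
    print(diagnose_ptx_mismatch(
        "ptxas application ptx input, line 9; fatal : "
        "Unsupported .version 7.6; current version is '7.4'",
        "NVCC: Cuda compilation tools, release 11.6, V11.6.124",
    ))
```

With the strings from this report, the result is `{'ptx_emitted_for_cuda': '11.6', 'driver_accepts_up_to_cuda': '11.4', 'toolkit_release': '11.6'}`: the PTX was generated for a newer CUDA level than the installed driver can JIT-link.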
Reproduces the problem - code sample
none
Reproduces the problem - command or script
python tools/train.py configs/pointpillars/pointpillars_hv_secfpn_8xb6-160e_kitti-3d-car.py
Additional information
No response