pangsu0613 / CLOCs

CLOCs: Camera-LiDAR Object Candidates Fusion for 3D Object Detection
MIT License

CUDA error #10

Closed yinggo closed 3 years ago

yinggo commented 3 years ago

I followed the instructions but still get errors; please help. I got `RuntimeError: cuda runtime error (77)`. What are the correct versions of CUDA and cuDNN?

My environment:

$ conda list
# packages in environment at /home/ds1/anaconda3/envs/clocs:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
asn1crypto                0.22.0                   py36_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
blas                      1.0                         mkl    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
boost                     1.61.0                   py36_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
bzip2                     1.0.6                         3    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
ca-certificates           2021.1.19            h06a4308_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
certifi                   2016.2.28                py36_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
cffi                      1.10.0                   py36_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
cmake                     3.6.3                         0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
conda                     4.5.13                   py36_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
conda-env                 2.6.0                         0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
cryptography              1.8.1                    py36_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
cudatoolkit               9.0                  h13b8566_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
cudnn                     7.6.5                 cuda9.0_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
curl                      7.54.1                        0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
expat                     2.1.0                         0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
fire                      0.4.0                    pypi_0    pypi
freetype                  2.5.5                         2    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
icu                       54.1                          0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
idna                      2.6                      py36_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
imageio                   2.9.0                    pypi_0    pypi
intel-openmp              2020.2                      254    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
jbig                      2.1                           0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
jpeg                      9b                            0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
krb5                      1.13.2                        0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
libffi                    3.2.1                         1    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
libgcc-ng                 9.1.0                hdf63c60_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libpng                    1.6.30                        1    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
libssh2                   1.8.0                         0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
libstdcxx-ng              9.1.0                hdf63c60_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libtiff                   4.0.6                         3    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
llvmlite                  0.35.0                   pypi_0    pypi
mkl                       2020.2                      256    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
mkl-service               2.3.0            py36he8ac12f_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
mkl_fft                   1.2.0            py36h23d657b_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
mkl_random                1.1.1            py36h0573a6f_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
ncurses                   5.9                          10    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
networkx                  2.5                      pypi_0    pypi
ninja                     1.7.2                         0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
numba                     0.52.0                   pypi_0    pypi
numpy                     1.19.2           py36h54aff64_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
numpy-base                1.19.2           py36hfa32c7d_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
olefile                   0.44                     py36_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
openssl                   1.0.2l                        0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
packaging                 16.8                     py36_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
pillow                    4.2.1                    py36_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
pip                       9.0.1                    py36_1    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
protobuf                  3.14.0                   pypi_0    pypi
pybind11                  2.6.2                    pypi_0    pypi
pycosat                   0.6.3            py36h27cfd23_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
pycparser                 2.18                     py36_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
pyopenssl                 17.0.0                   py36_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
pyparsing                 2.2.0                    py36_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
python                    3.6.2                         0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
pytorch                   1.1.0           cuda90py36h8b0c50b_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
pywavelets                1.1.1                    pypi_0    pypi
readline                  6.2                           2    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
requests                  2.14.2                   py36_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
ruamel_yaml               0.11.14                  py36_1    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
scikit-image              0.17.2                   pypi_0    pypi
scipy                     1.5.4                    pypi_0    pypi
setuptools                36.4.0                   py36_1    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
shapely                   1.7.1                    pypi_0    pypi
six                       1.10.0                   py36_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
spconv                    1.0                      pypi_0    pypi
sqlite                    3.13.0                        0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
tensorboardx              2.1                      pypi_0    pypi
termcolor                 1.1.0                    pypi_0    pypi
tifffile                  2020.9.3                 pypi_0    pypi
tk                        8.5.18                        0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
torchvision               0.3.0           cuda90py36h6edc907_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
wheel                     0.29.0                   py36_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
xz                        5.2.3                         0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
yaml                      0.1.6                         0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
zlib                      1.2.11                        0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free

Here is the error in detail:

/home/ds1/anaconda3/envs/clocs/CLOCs/second/pytorch/train.py(107)train()
-> torch.manual_seed(3)
(Pdb) c
2d detection path: /home/ds1/anaconda3/envs/clocs/CLOCs/d2_detection_data/data
sparse_shape: [  41 1600 1408]
num_class is : 1
load existing model
{'Car': 5}
[-1]
load 14357 Car database infos
load 2207 Pedestrian database infos
load 734 Cyclist database infos
load 1297 Van database infos
load 56 Person_sitting database infos
load 488 Truck database infos
load 224 Tram database infos
load 337 Misc database infos
After filter database:
load 10520 Car database infos
load 2104 Pedestrian database infos
load 594 Cyclist database infos
load 826 Van database infos
load 53 Person_sitting database infos
load 321 Truck database infos
load 199 Tram database infos
load 259 Misc database infos
remain number of infos: 3712
remain number of infos: 3769
WORKER 0 seed: 1612422291
WORKER 1 seed: 1612422292
WORKER 2 seed: 1612422293
/home/ds1/anaconda3/envs/clocs/CLOCs/second/core/geometry.py:146: NumbaWarning: 
Compilation is falling back to object mode WITH looplifting enabled because Function "points_in_convex_polygon_jit" failed type inference due to: No implementation of function Function(<built-in function getitem>) found for signature:

 >>> getitem(array(float32, 3d, C), Tuple(slice<a:b>, list(int64)<iv=None>, slice<a:b>))

There are 22 candidate implementations:
   - Of which 20 did not match due to:
   Overload of function 'getitem': File: <numerous>: Line N/A.
     With argument(s): '(array(float32, 3d, C), Tuple(slice<a:b>, list(int64)<iv=None>, slice<a:b>))':
    No match.
   - Of which 2 did not match due to:
   Overload in function 'GetItemBuffer.generic': File: numba/core/typing/arraydecl.py: Line 162.
     With argument(s): '(array(float32, 3d, C), Tuple(slice<a:b>, list(int64)<iv=None>, slice<a:b>))':
    Rejected as the implementation raised a specific error:
      TypeError: unsupported array index type list(int64)<iv=None> in Tuple(slice<a:b>, list(int64)<iv=None>, slice<a:b>)
  raised from /home/ds1/anaconda3/envs/clocs/lib/python3.6/site-packages/numba/core/typing/arraydecl.py:69

During: typing of intrinsic-call at /home/ds1/anaconda3/envs/clocs/CLOCs/second/core/geometry.py (162)

File "second/core/geometry.py", line 162:
def points_in_convex_polygon_jit(points, polygon, clockwise=True):
    <source elided>
        vec1 = polygon - polygon[:, [num_points_of_polygon - 1] +
                                 list(range(num_points_of_polygon - 1)), :]
                                 ^

  @numba.jit
/home/ds1/anaconda3/envs/clocs/CLOCs/second/core/geometry.py:146: NumbaWarning: 
Compilation is falling back to object mode WITH looplifting enabled because Function "points_in_convex_polygon_jit" failed type inference due to: No implementation of function Function(<built-in function getitem>) found for signature:

 >>> getitem(array(float32, 3d, C), Tuple(slice<a:b>, list(int64)<iv=None>, slice<a:b>))

There are 22 candidate implementations:
   - Of which 20 did not match due to:
   Overload of function 'getitem': File: <numerous>: Line N/A.
     With argument(s): '(array(float32, 3d, C), Tuple(slice<a:b>, list(int64)<iv=None>, slice<a:b>))':
    No match.
   - Of which 2 did not match due to:
   Overload in function 'GetItemBuffer.generic': File: numba/core/typing/arraydecl.py: Line 162.
     With argument(s): '(array(float32, 3d, C), Tuple(slice<a:b>, list(int64)<iv=None>, slice<a:b>))':
    Rejected as the implementation raised a specific error:
      TypeError: unsupported array index type list(int64)<iv=None> in Tuple(slice<a:b>, list(int64)<iv=None>, slice<a:b>)
  raised from /home/ds1/anaconda3/envs/clocs/lib/python3.6/site-packages/numba/core/typing/arraydecl.py:69

During: typing of intrinsic-call at /home/ds1/anaconda3/envs/clocs/CLOCs/second/core/geometry.py (162)

File "second/core/geometry.py", line 162:
def points_in_convex_polygon_jit(points, polygon, clockwise=True):
    <source elided>
        vec1 = polygon - polygon[:, [num_points_of_polygon - 1] +
                                 list(range(num_points_of_polygon - 1)), :]
                                 ^

  @numba.jit
/home/ds1/anaconda3/envs/clocs/CLOCs/second/core/geometry.py:146: NumbaWarning: 
Compilation is falling back to object mode WITH looplifting enabled because Function "points_in_convex_polygon_jit" failed type inference due to: No implementation of function Function(<built-in function getitem>) found for signature:

 >>> getitem(array(float32, 3d, C), Tuple(slice<a:b>, list(int64)<iv=None>, slice<a:b>))

There are 22 candidate implementations:
   - Of which 20 did not match due to:
   Overload of function 'getitem': File: <numerous>: Line N/A.
     With argument(s): '(array(float32, 3d, C), Tuple(slice<a:b>, list(int64)<iv=None>, slice<a:b>))':
    No match.
   - Of which 2 did not match due to:
   Overload in function 'GetItemBuffer.generic': File: numba/core/typing/arraydecl.py: Line 162.
     With argument(s): '(array(float32, 3d, C), Tuple(slice<a:b>, list(int64)<iv=None>, slice<a:b>))':
    Rejected as the implementation raised a specific error:
      TypeError: unsupported array index type list(int64)<iv=None> in Tuple(slice<a:b>, list(int64)<iv=None>, slice<a:b>)
  raised from /home/ds1/anaconda3/envs/clocs/lib/python3.6/site-packages/numba/core/typing/arraydecl.py:69

During: typing of intrinsic-call at /home/ds1/anaconda3/envs/clocs/CLOCs/second/core/geometry.py (162)

File "second/core/geometry.py", line 162:
def points_in_convex_polygon_jit(points, polygon, clockwise=True):
    <source elided>
        vec1 = polygon - polygon[:, [num_points_of_polygon - 1] +
                                 list(range(num_points_of_polygon - 1)), :]
                                 ^

  @numba.jit
/home/ds1/anaconda3/envs/clocs/CLOCs/second/core/geometry.py:146: NumbaWarning: 
Compilation is falling back to object mode WITHOUT looplifting enabled because Function "points_in_convex_polygon_jit" failed type inference due to: Cannot determine Numba type of <class 'numba.core.dispatcher.LiftedLoop'>

File "second/core/geometry.py", line 170:
def points_in_convex_polygon_jit(points, polygon, clockwise=True):
    <source elided>
    cross = 0.0
    for i in range(num_points):
    ^

  @numba.jit
/home/ds1/anaconda3/envs/clocs/CLOCs/second/core/geometry.py:146: NumbaWarning: 
Compilation is falling back to object mode WITHOUT looplifting enabled because Function "points_in_convex_polygon_jit" failed type inference due to: Cannot determine Numba type of <class 'numba.core.dispatcher.LiftedLoop'>

File "second/core/geometry.py", line 170:
def points_in_convex_polygon_jit(points, polygon, clockwise=True):
    <source elided>
    cross = 0.0
    for i in range(num_points):
    ^

  @numba.jit
/home/ds1/anaconda3/envs/clocs/CLOCs/second/core/geometry.py:146: NumbaWarning: 
Compilation is falling back to object mode WITHOUT looplifting enabled because Function "points_in_convex_polygon_jit" failed type inference due to: Cannot determine Numba type of <class 'numba.core.dispatcher.LiftedLoop'>

File "second/core/geometry.py", line 170:
def points_in_convex_polygon_jit(points, polygon, clockwise=True):
    <source elided>
    cross = 0.0
    for i in range(num_points):
    ^

  @numba.jit
/home/ds1/anaconda3/envs/clocs/lib/python3.6/site-packages/numba/core/object_mode_passes.py:152: NumbaWarning: Function "points_in_convex_polygon_jit" was compiled in object mode without forceobj=True, but has lifted loops.

File "second/core/geometry.py", line 157:
def points_in_convex_polygon_jit(points, polygon, clockwise=True):
    <source elided>
    # first convert polygon to directed lines
    num_points_of_polygon = polygon.shape[1]
    ^

  state.func_ir.loc))
/home/ds1/anaconda3/envs/clocs/lib/python3.6/site-packages/numba/core/object_mode_passes.py:162: NumbaDeprecationWarning: 
Fall-back from the nopython compilation path to the object mode compilation path has been detected, this is deprecated behaviour.

For more information visit https://numba.pydata.org/numba-doc/latest/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit

File "second/core/geometry.py", line 157:
def points_in_convex_polygon_jit(points, polygon, clockwise=True):
    <source elided>
    # first convert polygon to directed lines
    num_points_of_polygon = polygon.shape[1]
    ^

  state.func_ir.loc))
/home/ds1/anaconda3/envs/clocs/lib/python3.6/site-packages/numba/core/object_mode_passes.py:152: NumbaWarning: Function "points_in_convex_polygon_jit" was compiled in object mode without forceobj=True, but has lifted loops.

File "second/core/geometry.py", line 157:
def points_in_convex_polygon_jit(points, polygon, clockwise=True):
    <source elided>
    # first convert polygon to directed lines
    num_points_of_polygon = polygon.shape[1]
    ^

  state.func_ir.loc))
/home/ds1/anaconda3/envs/clocs/lib/python3.6/site-packages/numba/core/object_mode_passes.py:162: NumbaDeprecationWarning: 
Fall-back from the nopython compilation path to the object mode compilation path has been detected, this is deprecated behaviour.

For more information visit https://numba.pydata.org/numba-doc/latest/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit

File "second/core/geometry.py", line 157:
def points_in_convex_polygon_jit(points, polygon, clockwise=True):
    <source elided>
    # first convert polygon to directed lines
    num_points_of_polygon = polygon.shape[1]
    ^

  state.func_ir.loc))
/home/ds1/anaconda3/envs/clocs/lib/python3.6/site-packages/numba/core/object_mode_passes.py:152: NumbaWarning: Function "points_in_convex_polygon_jit" was compiled in object mode without forceobj=True, but has lifted loops.

File "second/core/geometry.py", line 157:
def points_in_convex_polygon_jit(points, polygon, clockwise=True):
    <source elided>
    # first convert polygon to directed lines
    num_points_of_polygon = polygon.shape[1]
    ^

  state.func_ir.loc))
/home/ds1/anaconda3/envs/clocs/lib/python3.6/site-packages/numba/core/object_mode_passes.py:162: NumbaDeprecationWarning: 
Fall-back from the nopython compilation path to the object mode compilation path has been detected, this is deprecated behaviour.

For more information visit https://numba.pydata.org/numba-doc/latest/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit

File "second/core/geometry.py", line 157:
def points_in_convex_polygon_jit(points, polygon, clockwise=True):
    <source elided>
    # first convert polygon to directed lines
    num_points_of_polygon = polygon.shape[1]
    ^

  state.func_ir.loc))
THCudaCheck FAIL file=../torch/csrc/generic/serialization.cpp line=23 error=77 : an illegal memory access was encountered
Traceback (most recent call last):
  File "./second/pytorch/train.py", line 107, in train
    torch.manual_seed(3)
  File "/home/ds1/anaconda3/envs/clocs/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ds1/anaconda3/envs/clocs/CLOCs/second/pytorch/models/voxelnet.py", line 304, in forward
    voxel_features, coors, batch_size_dev)
  File "/home/ds1/anaconda3/envs/clocs/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ds1/anaconda3/envs/clocs/CLOCs/second/pytorch/models/middle.py", line 545, in forward
    ret = self.middle_conv(ret)
  File "/home/ds1/anaconda3/envs/clocs/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ds1/anaconda3/envs/clocs/lib/python3.6/site-packages/spconv/modules.py", line 123, in forward
    input = module(input)
  File "/home/ds1/anaconda3/envs/clocs/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ds1/anaconda3/envs/clocs/lib/python3.6/site-packages/spconv/conv.py", line 157, in forward
    outids.shape[0])
  File "/home/ds1/anaconda3/envs/clocs/lib/python3.6/site-packages/spconv/functional.py", line 83, in forward
    return ops.indice_conv(features, filters, indice_pairs, indice_pair_num, num_activate_out, False, True)
  File "/home/ds1/anaconda3/envs/clocs/lib/python3.6/site-packages/spconv/ops.py", line 112, in indice_conv
    int(inverse), int(subm))
RuntimeError: CUDA error: an illegal memory access was encountered (copy_to_cpu at /tmp/pip-req-build-9xcrj8au/aten/src/ATen/native/cuda/Copy.cu:199)
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x6a (0x7fcf259c82da in /home/ds1/anaconda3/envs/clocs/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: (anonymous namespace)::copy_to_cpu(at::Tensor&, at::Tensor const&) + 0x39d (0x7fcf2c40939d in /home/ds1/anaconda3/envs/clocs/lib/python3.6/site-packages/torch/lib/libcaffe2_gpu.so)
frame #2: void (anonymous namespace)::_copy__cuda<int>(at::Tensor&, at::Tensor const&, bool) + 0x914 (0x7fcf2c4cbd64 in /home/ds1/anaconda3/envs/clocs/lib/python3.6/site-packages/torch/lib/libcaffe2_gpu.so)
frame #3: at::native::_s_copy__cuda(at::Tensor&, at::Tensor const&, bool) + 0xdf (0x7fcf2c409aef in /home/ds1/anaconda3/envs/clocs/lib/python3.6/site-packages/torch/lib/libcaffe2_gpu.so)
frame #4: at::native::_s_copy_from_cuda(at::Tensor const&, at::Tensor const&, bool) + 0x42 (0x7fcf2c409c72 in /home/ds1/anaconda3/envs/clocs/lib/python3.6/site-packages/torch/lib/libcaffe2_gpu.so)
frame #5: at::CUDAType::_s_copy_from(at::Tensor const&, at::Tensor const&, bool) const + 0x11b (0x7fcf2cd52a8b in /home/ds1/anaconda3/envs/clocs/lib/python3.6/site-packages/torch/lib/libcaffe2_gpu.so)
frame #6: at::native::_s_copy__cpu(at::Tensor&, at::Tensor const&, bool) + 0x7e (0x7fcf263f05ce in /home/ds1/anaconda3/envs/clocs/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #7: <unknown function> + 0x91b9df (0x7fcf267199df in /home/ds1/anaconda3/envs/clocs/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #8: at::native::copy_(at::Tensor&, at::Tensor const&, bool) + 0x673 (0x7fcf263f2343 in /home/ds1/anaconda3/envs/clocs/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #9: torch::autograd::VariableType::copy_(at::Tensor&, at::Tensor const&, bool) const + 0x478 (0x7fcf29e299e8 in /home/ds1/anaconda3/envs/clocs/lib/python3.6/site-packages/torch/lib/libtorch.so.1)
frame #10: at::native::to(at::Tensor const&, c10::TensorOptions const&, bool, bool) + 0xa6b (0x7fcf265a7ffb in /home/ds1/anaconda3/envs/clocs/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #11: at::TypeDefault::to(at::Tensor const&, c10::TensorOptions const&, bool, bool) const + 0x2b (0x7fcf268309fb in /home/ds1/anaconda3/envs/clocs/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #12: torch::autograd::VariableType::to(at::Tensor const&, c10::TensorOptions const&, bool, bool) const + 0x33f (0x7fcf29c1286f in /home/ds1/anaconda3/envs/clocs/lib/python3.6/site-packages/torch/lib/libtorch.so.1)
frame #13: at::Tensor spconv::indiceConv<float>(at::Tensor, at::Tensor, at::Tensor, at::Tensor, long, long, long) + 0x187 (0x7fcec5bd3a37 in /home/ds1/anaconda3/envs/clocs/lib/python3.6/site-packages/spconv/libspconv.so)
frame #14: void torch::jit::detail::callOperatorWithTuple<at::Tensor (* const)(at::Tensor, at::Tensor, at::Tensor, at::Tensor, long, long, long), at::Tensor, at::Tensor, at::Tensor, at::Tensor, long, long, long, 0ul, 1ul, 2ul, 3ul, 4ul, 5ul, 6ul>(c10::FunctionSchema const&, at::Tensor (* const&&)(at::Tensor, at::Tensor, at::Tensor, at::Tensor, long, long, long), std::vector<c10::IValue, std::allocator<c10::IValue> >&, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, long, long, long>&, torch::Indices<0ul, 1ul, 2ul, 3ul, 4ul, 5ul, 6ul>) + 0x2bc (0x7fcec5bd7e0c in /home/ds1/anaconda3/envs/clocs/lib/python3.6/site-packages/spconv/libspconv.so)
frame #15: std::_Function_handler<int (std::vector<c10::IValue, std::allocator<c10::IValue> >&), torch::jit::createOperator<at::Tensor (*)(at::Tensor, at::Tensor, at::Tensor, at::Tensor, long, long, long)>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, at::Tensor (*&&)(at::Tensor, at::Tensor, at::Tensor, at::Tensor, long, long, long))::{lambda(std::vector<c10::IValue, std::allocator<c10::IValue> >&)#1}>::_M_invoke(std::_Any_data const&, std::vector<c10::IValue, std::allocator<c10::IValue> >&) + 0x61 (0x7fcec5bd8081 in /home/ds1/anaconda3/envs/clocs/lib/python3.6/site-packages/spconv/libspconv.so)
frame #16: <unknown function> + 0x3991c0 (0x7fcf50c761c0 in /home/ds1/anaconda3/envs/clocs/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #17: <unknown function> + 0x36f445 (0x7fcf50c4c445 in /home/ds1/anaconda3/envs/clocs/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #18: <unknown function> + 0x11f43d (0x7fcf509fc43d in /home/ds1/anaconda3/envs/clocs/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
<omitting python frames>
frame #29: THPFunction_apply(_object*, _object*) + 0x747 (0x7fcf50c23347 in /home/ds1/anaconda3/envs/clocs/lib/python3.6/site-packages/torch/lib/libtorch_python.so)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./second/pytorch/train.py", line 920, in <module>
    fire.Fire()
  File "/home/ds1/anaconda3/envs/clocs/lib/python3.6/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/ds1/anaconda3/envs/clocs/lib/python3.6/site-packages/fire/core.py", line 471, in _Fire
    target=component.__name__)
  File "/home/ds1/anaconda3/envs/clocs/lib/python3.6/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "./second/pytorch/train.py", line 107, in train
    torch.manual_seed(3)
  File "/home/ds1/anaconda3/envs/clocs/CLOCs/torchplus/train/checkpoint.py", line 173, in save_models
    save(model_dir, model, name, global_step, max_to_keep, keep_latest)
  File "/home/ds1/anaconda3/envs/clocs/CLOCs/torchplus/train/checkpoint.py", line 90, in save
    torch.save(model.state_dict(), ckpt_path)
  File "/home/ds1/anaconda3/envs/clocs/lib/python3.6/site-packages/torch/serialization.py", line 224, in save
    return _with_file_like(f, "wb", lambda f: _save(obj, f, pickle_module, pickle_protocol))
  File "/home/ds1/anaconda3/envs/clocs/lib/python3.6/site-packages/torch/serialization.py", line 149, in _with_file_like
    return body(f)
  File "/home/ds1/anaconda3/envs/clocs/lib/python3.6/site-packages/torch/serialization.py", line 224, in <lambda>
    return _with_file_like(f, "wb", lambda f: _save(obj, f, pickle_module, pickle_protocol))
  File "/home/ds1/anaconda3/envs/clocs/lib/python3.6/site-packages/torch/serialization.py", line 303, in _save
    serialized_storages[key]._write_file(f, _should_read_directly(f))
RuntimeError: cuda runtime error (77) : an illegal memory access was encountered at ../torch/csrc/generic/serialization.cpp:23

pangsu0613 commented 3 years ago

I have tested on CUDA 9, 10, and even 11, and I have not seen this error before. It looks like spconv is not installed properly. Can you run SECOND successfully?
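For anyone comparing their own setup against the versions discussed here, the following is a minimal sketch of how the relevant versions could be read from inside the environment. The helper name `report_versions` is hypothetical and not part of CLOCs or SECOND; `torch.version.cuda`, `torch.backends.cudnn.version()` and `torch.cuda.is_available()` are standard PyTorch calls. Imports are guarded so the script still runs when a package is missing or fails to load.

```python
import importlib


def report_versions():
    """Collect the versions that matter for a CUDA/spconv mismatch.

    Hypothetical helper for illustration only.
    """
    info = {}
    try:
        import torch
    except ImportError as exc:
        info["torch"] = f"not importable: {exc}"
    else:
        info["torch"] = torch.__version__
        # CUDA version PyTorch was built against (None for CPU-only builds).
        info["torch built for CUDA"] = torch.version.cuda
        try:
            info["cuDNN"] = torch.backends.cudnn.version()
        except Exception as exc:  # be defensive across PyTorch versions
            info["cuDNN"] = f"unavailable: {exc}"
        info["GPU visible"] = torch.cuda.is_available()

    try:
        spconv = importlib.import_module("spconv")
    except Exception as exc:  # a broken build may fail while loading its shared library
        info["spconv"] = f"not importable: {exc}"
    else:
        info["spconv"] = getattr(spconv, "__file__", "imported")
    return info


for name, value in report_versions().items():
    print(f"{name}: {value}")
```

If spconv imports cleanly here but the error persists, running plain SECOND, as suggested above, is the more direct check that the spconv build actually works on the GPU.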

BTW, you can add the following at the beginning of the code to suppress these Numba warnings:

from numba.core.errors import NumbaDeprecationWarning, NumbaPendingDeprecationWarning, NumbaPerformanceWarning, NumbaWarning
import warnings
warnings.simplefilter('ignore', category=NumbaDeprecationWarning)
warnings.simplefilter('ignore', category=NumbaPendingDeprecationWarning)
warnings.simplefilter('ignore', category=NumbaPerformanceWarning)
warnings.simplefilter('ignore', category=NumbaWarning)
warnings.simplefilter('ignore')
warnings.filterwarnings('ignore')
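Note that the final two calls in the snippet above ignore every warning, not only Numba's. If hiding unrelated warnings is undesirable, a category-scoped filter may be enough. The sketch below shows the difference with the standard `warnings` module; the fallback `NumbaWarning` class is a stand-in (an assumption) so the example runs even where Numba is not installed, and `emit_and_collect` is a hypothetical helper name.

```python
import warnings

try:
    from numba.core.errors import NumbaWarning
except ImportError:
    # Stand-in so the sketch runs without Numba installed (assumption).
    class NumbaWarning(Warning):
        pass


def emit_and_collect():
    """Return the names of warning categories that survive a NumbaWarning-only filter."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # start from a known state
        warnings.simplefilter("ignore", category=NumbaWarning)  # hide only Numba's
        warnings.warn("object-mode fallback", NumbaWarning)
        warnings.warn("something unrelated", UserWarning)
    return [w.category.__name__ for w in caught]


print(emit_and_collect())  # prints ['UserWarning']
```

Only the unrelated `UserWarning` is recorded, so other libraries' warnings stay visible while the Numba noise is dropped.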

yinggo commented 3 years ago

Thanks for your kind reply. That solved it.

sunflowvor commented 3 years ago

Thanks for your kind reply. It solved.

Hello, I ran into the same problem. How did you solve this issue?
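For anyone hitting the same error, one general way to narrow down an "illegal memory access" is to make CUDA report errors synchronously. Kernel launches are asynchronous by default, so the Python traceback may point at a later call (in the log above, a device-to-host copy and then `torch.save`) rather than at the kernel that actually faulted. A minimal sketch, assuming the variable is set before CUDA is initialized; the helper name is hypothetical:

```python
import os


def enable_synchronous_cuda_errors():
    """Ask CUDA to run kernels synchronously so errors surface at the failing call.

    Hypothetical helper; call it before the first `import torch`.
    """
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
    return os.environ["CUDA_LAUNCH_BLOCKING"]


enable_synchronous_cuda_errors()
# Import torch (and spconv) only after the variable is set, then rerun training.
```

This slows training down, so it is only meant for a debugging run. The same effect can be had by exporting `CUDA_LAUNCH_BLOCKING=1` in the shell before launching `train.py`.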