RuntimeError: CUDA error: misaligned address

Thank you for your research and for releasing the source code. I am trying to run your repo, but I have a problem with CUDA. I ran the following command on a single 4090Ti GPU (24GB):

python launch_train.py \
--dataset semantic_kitti \
--path_dataset ./dataset/point_cloud/SemanticKITTI \
--log_path ./pretrained_models/WaffleIron-48-256__kitti/ \
--config ./configs/WaffleIron-48-256__kitti.yaml \
--fp16 \
--restart \
--eval

The resulting errors are shown in the log below.

Trainer on gpu: None. World size:1.
/home/vuonghn/research/code/open-source/WaffleIron/utils/trainer.py:250: UserWarning: Optimizer state not available
  warnings.warn("Optimizer state not available")
/home/vuonghn/research/code/open-source/WaffleIron/utils/trainer.py:255: UserWarning: Scheduler state not available
  warnings.warn("Scheduler state not available")
/home/vuonghn/research/code/open-source/WaffleIron/utils/trainer.py:260: UserWarning: Scaler state not available
  warnings.warn("Scaler state not available")
Checkpoint loaded on cuda:0 (cuda:0): ./pretrained_models/WaffleIron-48-256__kitti/

Validation: 0/45 epochs
       0%|                                                  | 0/1018 [00:02<?, ?it/s]
Traceback (most recent call last):
  File "launch_train.py", line 375, in <module>
    main(args, config)
  File "launch_train.py", line 309, in main
    distributed_training(args.gpu, ngpus_per_node, args, config)
  File "launch_train.py", line 253, in distributed_training
    mng.one_epoch(training=False)
  File "/home/vuonghn/research/code/open-source/WaffleIron/utils/trainer.py", line 169, in one_epoch
    out = net(*net_inputs)
  File "/home/vuonghn/anaconda3/envs/wi/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/vuonghn/anaconda3/envs/wi/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 169, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/vuonghn/anaconda3/envs/wi/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/vuonghn/research/code/open-source/WaffleIron/waffleiron/segmenter.py", line 45, in forward
    tokens = self.waffleiron(tokens, cell_ind, occupied_cell)
  File "/home/vuonghn/anaconda3/envs/wi/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/vuonghn/research/code/open-source/WaffleIron/waffleiron/backbone.py", line 234, in forward
    sp_mat = [
  File "/home/vuonghn/research/code/open-source/WaffleIron/waffleiron/backbone.py", line 235, in <listcomp>
    build_proj_matrix(
  File "/home/vuonghn/research/code/open-source/WaffleIron/waffleiron/backbone.py", line 41, in build_proj_matrix
    weight_per_point = 1.0 / (num_points_per_cells.reshape(-1) + 1e-6)
  File "/home/vuonghn/anaconda3/envs/wi/lib/python3.8/site-packages/torch/_tensor.py", line 40, in wrapped
    return f(*args, **kwargs)
  File "/home/vuonghn/anaconda3/envs/wi/lib/python3.8/site-packages/torch/_tensor.py", line 852, in __rdiv__
    return self.reciprocal() * other
RuntimeError: CUDA error: misaligned address
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

terminate called after throwing an instance of 'c10::Error'
  what():  CUDA error: misaligned address
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:44 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7ff58c1114d7 in /home/vuonghn/anaconda3/envs/wi/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7ff58c0db36b in /home/vuonghn/anaconda3/envs/wi/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7ff561d3fb58 in /home/vuonghn/anaconda3/envs/wi/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #3: <unknown function> + 0x1251d5e (0x7ff4f5651d5e in /home/vuonghn/anaconda3/envs/wi/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so)
frame #4: <unknown function> + 0x4d56c6 (0x7ff55aed56c6 in /home/vuonghn/anaconda3/envs/wi/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #5: <unknown function> + 0x3ee77 (0x7ff58c0f6e77 in /home/vuonghn/anaconda3/envs/wi/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #6: c10::TensorImpl::~TensorImpl() + 0x1be (0x7ff58c0ef69e in /home/vuonghn/anaconda3/envs/wi/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #7: c10::TensorImpl::~TensorImpl() + 0x9 (0x7ff58c0ef7b9 in /home/vuonghn/anaconda3/envs/wi/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #8: <unknown function> + 0x75acc8 (0x7ff55b15acc8 in /home/vuonghn/anaconda3/envs/wi/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #9: THPVariable_subclass_dealloc(_object*) + 0x325 (0x7ff55b15b075 in /home/vuonghn/anaconda3/envs/wi/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #10: python() [0x4d39ff]
frame #11: python() [0x4e0970]
frame #12: python() [0x4f1828]
frame #13: python() [0x4f1811]
frame #14: python() [0x4f1811]
frame #15: python() [0x4f1811]
frame #16: python() [0x4c9310]
<omitting python frames>
frame #22: <unknown function> + 0x29d90 (0x7ff58d829d90 in /lib/x86_64-linux-gnu/libc.so.6)
frame #23: __libc_start_main + 0x80 (0x7ff58d829e40 in /lib/x86_64-linux-gnu/libc.so.6)
frame #24: python() [0x579d3d]

./run.sh: line 8: 706387 Aborted                 (core dumped) python launch_train.py --dataset semantic_kitti --path_dataset /home/vuonghn/research/dataset/point_cloud/SemanticKITTI --log_path ./pretrained_models/WaffleIron-48-256__kitti/ --config ./configs/WaffleIron-48-256__kitti.yaml --fp16 --restart --eval
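The log warns that CUDA errors are reported asynchronously, so the division in `build_proj_matrix` (backbone.py line 41) may not be the real culprit. One way to narrow this down is to evaluate that expression on its own with `CUDA_LAUNCH_BLOCKING=1`. This is a hypothetical sketch: `weight_per_point` and `run_isolated_check` are illustrative names, and the random counts only stand in for `num_points_per_cells`, whose real shape and dtype are not visible in the log.

```python
import os

# Set before CUDA is initialised so kernel errors surface at the call that
# actually caused them, as the log itself suggests.
os.environ.setdefault("CUDA_LAUNCH_BLOCKING", "1")

import torch


def weight_per_point(num_points_per_cells: torch.Tensor) -> torch.Tensor:
    # Same expression as backbone.py line 41 in build_proj_matrix.
    return 1.0 / (num_points_per_cells.reshape(-1) + 1e-6)


def run_isolated_check(shape=(4, 1000)):
    """Evaluate the failing expression alone on a stand-in tensor.

    Hypothetical input: random per-cell counts in [1, 9]; the real shape and
    dtype of num_points_per_cells are not visible in the log.
    """
    device = "cuda" if torch.cuda.is_available() else "cpu"
    # fp16 is only exercised on the GPU, matching the --fp16 run.
    dtypes = [torch.float32, torch.float16] if device == "cuda" else [torch.float32]
    counts = torch.randint(1, 10, shape, device=device)
    results = {}
    for dtype in dtypes:
        out = weight_per_point(counts.to(dtype))
        if device == "cuda":
            # Force any pending kernel error to be raised here.
            torch.cuda.synchronize()
        results[dtype] = out
    return results


if __name__ == "__main__":
    for dtype, out in run_isolated_check().items():
        print(dtype, out.dtype, tuple(out.shape))
```

If this runs cleanly in both float32 and float16 on the same GPU, that would suggest the misaligned-address fault comes from an earlier kernel, and rerunning the full `launch_train.py` command with `CUDA_LAUNCH_BLOCKING=1` set should give a more accurate stack trace.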

valeoai/WaffleIron, issue #4