Closed selmadeac closed 3 years ago
Hi, @selmadeac
The _Voxelization class is called by the Voxelization class in the same file, voxelize.py. Its max_points is set by the max_num_points argument of Voxelization, so you can set max_points by setting max_num_points in the config.
For example, the SECOND config sets max_num_points to 5, which is why that dimension of the tensors you see is 5.
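As a rough illustration, the relevant part of a SECOND-style config might look like the fragment below. Only max_num_points=5 comes from this thread; the other values (point_cloud_range, voxel_size, max_voxels) are assumed placeholders, so check them against the config you actually use.

```python
# Fragment of a SECOND-style config. Only max_num_points=5 is taken from
# this thread; every other value is an illustrative assumption.
voxel_size = [0.05, 0.05, 0.1]  # assumed (x, y, z) voxel size

model = dict(
    type='VoxelNet',
    voxel_layer=dict(
        max_num_points=5,  # forwarded to Voxelization, becomes max_points
        point_cloud_range=[0, -40, -3, 70.4, 40, 1],  # assumed
        voxel_size=voxel_size,
        max_voxels=(16000, 40000),  # assumed (training, testing) limits
    ),
)
```

Changing max_num_points here, rather than the default in the function signature, is what changes the second dimension of the voxel tensor.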
Thank you very much for your answer! I understand now.
To better understand voxelization in the mmdet3d framework, I tried to implement a Python function that voxelizes the space using NumPy arrays as the basic data structure. In a personal project the voxels are built correctly, so I tried using my function in place of hard_voxelize. I convert the tensors to NumPy arrays, process them, and convert them back to tensors, keeping the required output dimensions and types. It appears to work until I get a segmentation fault in voxelnet.py, in extract_feat(), at this line: x = self.middle_encoder(voxel_features, coors, batch_size).
The shapes and types of the output are the same as before, so why might this occur?
Did you move them back to the GPU? Did you convert the tensor to float?
Hi @Wuziyi616 !
What do you mean by putting them back on the GPU? Isn't the conversion to tensors enough?
At the beginning I detach the points tensor, move it to the CPU and convert it to NumPy so that I can access the points:
points = points.cpu().detach().numpy()
And at the end I only convert my arrays back to tensors:
voxels_out = torch.from_numpy(voxels)
coors_out = torch.from_numpy(coordinates)
num_points_per_voxel_out = torch.from_numpy(num_points_per_voxel)
My data variables are originally declared as:
coordinates = coordinates.astype(np.int32)
num_points_per_voxel = np.zeros((M),dtype = np.int32)
voxels = np.zeros((M,max_points,3),dtype=float)
Is it not enough?
Thank you for your reply! I moved them back to the GPU by calling .to('cuda:0') after converting them to tensors, but it still does not work:
Traceback (most recent call last):
File "./tools/train.py", line 222, in <module>
main()
File "./tools/train.py", line 218, in main
meta=meta)
File "/home/selma/workspace/mmdetection3d/mmdet3d/apis/train.py", line 34, in train_model
meta=meta)
File "/home/selma/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmdet/apis/train.py", line 170, in train_detector
runner.run(data_loaders, cfg.workflow)
File "/home/selma/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 125, in run
epoch_runner(data_loaders[i], **kwargs)
File "/home/selma/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
self.run_iter(data_batch, train_mode=True, **kwargs)
File "/home/selma/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 30, in run_iter
**kwargs)
File "/home/selma/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/parallel/data_parallel.py", line 67, in train_step
return self.module.train_step(*inputs[0], **kwargs[0])
File "/home/selma/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmdet/models/detectors/base.py", line 247, in train_step
losses = self(**data)
File "/home/selma/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/selma/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 95, in new_func
return old_func(*args, **kwargs)
File "/home/selma/workspace/mmdetection3d/mmdet3d/models/detectors/base.py", line 58, in forward
return self.forward_train(**kwargs)
File "/home/selma/workspace/mmdetection3d/mmdet3d/models/detectors/voxelnet.py", line 124, in forward_train
x = self.extract_feat(points, img_metas)
File "/home/selma/workspace/mmdetection3d/mmdet3d/models/detectors/voxelnet.py", line 45, in extract_feat
x = self.middle_encoder(voxel_features, coors, batch_size)
File "/home/selma/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/selma/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 95, in new_func
return old_func(*args, **kwargs)
File "/home/selma/workspace/mmdetection3d/mmdet3d/models/middle_encoders/sparse_encoder.py", line 112, in forward
x = self.conv_input(input_sp_tensor)
File "/home/selma/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/selma/workspace/mmdetection3d/mmdet3d/ops/spconv/modules.py", line 130, in forward
input = module(input)
File "/home/selma/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/selma/workspace/mmdetection3d/mmdet3d/ops/spconv/conv.py", line 186, in forward
outids.shape[0])
File "/home/selma/workspace/mmdetection3d/mmdet3d/ops/spconv/functional.py", line 65, in forward
indice_pair_num, num_activate_out, False, True)
File "/home/selma/workspace/mmdetection3d/mmdet3d/ops/spconv/ops.py", line 119, in indice_conv
int(subm))
RuntimeError: CUDA error: an illegal memory access was encountered
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: an illegal memory access was encountered
Exception raised from create_event_internal at /opt/conda/conda-bld/pytorch_1614378098133/work/c10/cuda/CUDACachingAllocator.cpp:733 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f0d03e3b2f2 in /home/selma/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x5b (0x7f0d03e3867b in /home/selma/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #2: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x809 (0x7f0d1c254219 in /home/selma/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10::TensorImpl::release_resources() + 0x54 (0x7f0d03e233a4 in /home/selma/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #4: <unknown function> + 0x6e6a3a (0x7f0c682b8a3a in /home/selma/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #5: <unknown function> + 0x6e6ae1 (0x7f0c682b8ae1 in /home/selma/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
<omitting python frames>
frame #26: __libc_start_main + 0xf3 (0x7f0d443bf0b3 in /lib/x86_64-linux-gnu/libc.so.6)
Aborted (core dumped)
Sorry for the late reply. Let me first make sure: are you using multi-GPU distributed training? If so, you can't simply use cuda:0.
You need to first get the current device of this process with device = points.device, then call tensor.to(device).
Thank you for your reply, @Wuziyi616 ! I took the device as you suggested, converted all data structures to tensors and then moved them to the GPU:
points = torch.from_numpy(points2.copy())
voxels_out = torch.from_numpy(voxels.copy())
coors_out = torch.from_numpy(coordinates.copy())
num_points_per_voxel_out = torch.from_numpy(num_points_per_voxel.copy())
points = points.to(device)
voxels_out = voxels_out.to(device)
coors_out = coors_out.to(device)
num_points_per_voxel_out = num_points_per_voxel_out.to(device)
The dtypes of voxels_out, coors_out and num_points_per_voxel_out are torch.float32, torch.int32 and torch.int32.
But the error persists:
File "/home/selma/workspace/mmdetection3d/mmdet3d/ops/spconv/ops.py", line 119, in indice_conv
int(subm))
RuntimeError: CUDA error: an illegal memory access was encountered
Did you remove the voxels that contain no points (fewer than 1)?
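A few quick checks on the voxelization output can help narrow down an illegal memory access like the one above, since out-of-range indices passed to a CUDA kernel are a common cause. The sketch below is a hypothetical helper (check_voxel_outputs is not part of mmdet3d). It assumes coors has one row per voxel and that grid_shape lists the grid size in the same axis order as the columns of coors.

```python
import numpy as np

def check_voxel_outputs(voxels, coors, num_points, grid_shape):
    """Return a list of problems found in hand-made voxelization outputs."""
    problems = []
    grid_shape = np.asarray(grid_shape)
    if not (len(voxels) == len(coors) == len(num_points)):
        problems.append("voxels, coors and num_points differ in length")
    if coors.ndim != 2 or coors.shape[1] != len(grid_shape):
        problems.append("coors does not have one column per grid axis")
        return problems  # the range check below would not be meaningful
    if np.any(num_points < 1):
        problems.append("some voxels contain no points")
    if np.any(num_points > voxels.shape[1]):
        problems.append("some voxels claim more points than max_points")
    if np.any(coors < 0) or np.any(coors >= grid_shape):
        problems.append("some coordinates fall outside the grid")
    return problems

# Example with one empty voxel and one coordinate outside a (2, 4, 4) grid.
voxels = np.zeros((2, 5, 3), dtype=np.float32)
coors = np.array([[1, 3, 0], [3, 1, 0]], dtype=np.int32)
num_points = np.array([2, 0], dtype=np.int32)
for problem in check_voxel_outputs(voxels, coors, num_points, (2, 4, 4)):
    print(problem)
```

Running such checks on the CPU before moving the arrays to the GPU turns a hard-to-read CUDA crash into a plain message.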
Solved! I was using (x, y, z) order for the voxel coordinates instead of (z, y, x).
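For anyone reimplementing voxelization in NumPy, the fix above can be sketched as follows. This is a minimal, unoptimized illustration and not mmdet3d's hard_voxelize: the function name voxelize_zyx and its arguments are hypothetical. It assumes points has x, y, z in its first three columns, that point_cloud_range is [x_min, y_min, z_min, x_max, y_max, z_max], and that the coordinates should be returned in (z, y, x) order as this thread concluded.

```python
import numpy as np

def voxelize_zyx(points, voxel_size, point_cloud_range, max_points, max_voxels):
    """Group points into voxels; coordinates are returned as (z, y, x)."""
    voxel_size = np.asarray(voxel_size, dtype=np.float32)
    pc_range = np.asarray(point_cloud_range, dtype=np.float32)
    # Number of voxels along x, y and z.
    grid_size = np.round((pc_range[3:] - pc_range[:3]) / voxel_size).astype(np.int32)

    voxels = np.zeros((max_voxels, max_points, points.shape[1]), dtype=np.float32)
    coors = np.zeros((max_voxels, 3), dtype=np.int32)
    num_points = np.zeros(max_voxels, dtype=np.int32)
    index_of = {}  # maps an (x, y, z) grid index to a row of the outputs

    for point in points:
        xyz = np.floor((point[:3] - pc_range[:3]) / voxel_size).astype(np.int32)
        if np.any(xyz < 0) or np.any(xyz >= grid_size):
            continue  # point lies outside the configured range
        key = tuple(xyz.tolist())
        row = index_of.get(key)
        if row is None:
            if len(index_of) >= max_voxels:
                continue  # voxel budget exhausted, drop the point
            row = len(index_of)
            index_of[key] = row
            coors[row] = xyz[::-1]  # store as (z, y, x), not (x, y, z)
        if num_points[row] < max_points:
            voxels[row, num_points[row]] = point
            num_points[row] += 1

    n = len(index_of)  # only rows that received at least one point
    return voxels[:n], coors[:n], num_points[:n]

# Tiny example: a 4 x 3 x 2 grid (x, y, z) of unit voxels.
pts = np.array([[3.5, 0.5, 1.5], [3.6, 0.4, 1.2], [0.5, 2.5, 0.5], [5.0, 0.0, 0.0]])
v, c, n = voxelize_zyx(pts, [1, 1, 1], [0, 0, 0, 4, 3, 2], max_points=2, max_voxels=10)
print(c)
```

Because a voxel row is only created when a point lands in it, the returned arrays contain no empty voxels.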
Hi! I am trying to understand how voxelization is performed inside the SECOND network. From the call history I noticed that it is performed in voxelize.py, in the forward method. The description of the method:
The function is called in voxelnet.py, in the mmdet3d.models.detectors folder. My question is: where do I set the number max_points? In the function's arguments it is set to 35, but when I run the system and check the shapes of the output tensors, the max points value appears to be 5. Where do I set the maximum number of points per voxel?