open-mmlab / mmdetection3d

OpenMMLab's next-generation platform for general 3D object detection.
https://mmdetection3d.readthedocs.io/en/latest/
Apache License 2.0

Question about voxelnet's voxelize module #625

Closed: selmadeac closed this issue 3 years ago

selmadeac commented 3 years ago

Hi! I am trying to understand how voxelization is performed inside the SECOND network. From the call stack, I noticed that it happens in the forward method in voxelize.py.

The description of the method:

"""convert kitti points(N, >=3) to voxels.

        Args:
            points: [N, ndim] float tensor. points[:, :3] contain xyz points
                and points[:, 3:] contain other information like reflectivity
            voxel_size: [3] list/tuple or array, float. xyz, indicate voxel
                size
            coors_range: [6] list/tuple or array, float. indicate voxel
                range. format: xyzxyz, minmax
            max_points: int. indicate maximum points contained in a voxel. if
                max_points=-1, it means using dynamic_voxelize
            max_voxels: int. indicate maximum voxels this function create.
                for second, 20000 is a good choice. Users should shuffle points
                before call this function because max_voxels may drop points.

        Returns:
            voxels: [M, max_points, ndim] float tensor. only contain points
                    and returned when max_points != -1.
            coordinates: [M, 3] int32 tensor, always returned.
            num_points_per_voxel: [M] int32 tensor. Only returned when
                max_points != -1.
        """

The function is called in voxelnet.py, in the mmdet3d.models.detectors folder. My question is: where do I set max_points? In the function's arguments it is set to 35, but when I run the system and check the shapes of the output tensors, the max points value appears to be 5. Where do I set the maximum number of points per voxel?

wHao-Wu commented 3 years ago

Hi, @selmadeac

The _Voxelization class is called by the Voxelization class in the same file, voxelize.py. max_points comes from the max_num_points argument of Voxelization, so you can set max_points by setting max_num_points in the config.

wHao-Wu commented 3 years ago

For example, the SECOND config sets max_num_points to 5, which is why that dimension of the tensors you inspected is 5.
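For reference, here is a sketch of the relevant part of a SECOND-style config. It follows the layout of mmdet3d's KITTI SECOND base model config as I recall it, so verify the numbers against the config file in your version. VoxelNet unpacks the `voxel_layer` dict into `Voxelization(**voxel_layer)`, which is how `max_num_points` ends up as the `max_points` that `_Voxelization` receives. The small grid computation at the end only illustrates that the middle encoder's `sparse_shape` is listed in (z, y, x) order; in this config its z entry (41) is one larger than the computed 40.

```python
# Sketch of a SECOND-style model config fragment (mmdet3d configs are Python files).
# Values follow the layout of the KITTI SECOND base config; verify against your version.
voxel_size = [0.05, 0.05, 0.1]
point_cloud_range = [0, -40, -3, 70.4, 40, 1]  # x_min, y_min, z_min, x_max, y_max, z_max

model = dict(
    type='VoxelNet',
    voxel_layer=dict(
        max_num_points=5,  # becomes max_points: the 2nd dimension of `voxels`
        point_cloud_range=point_cloud_range,
        voxel_size=voxel_size,
        max_voxels=(16000, 40000)),  # (training, testing)
    voxel_encoder=dict(type='HardSimpleVFE'),
    middle_encoder=dict(
        type='SparseEncoder',
        in_channels=4,
        sparse_shape=[41, 1600, 1408],  # grid size in (z, y, x) order
        order=('conv', 'norm', 'act')),
)

# Voxel grid size per axis, in x, y, z order: (max - min) / voxel_size.
grid_xyz = [round((point_cloud_range[i + 3] - point_cloud_range[i]) / voxel_size[i])
            for i in range(3)]
print(grid_xyz)  # [1408, 1600, 40]
```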

selmadeac commented 3 years ago

Thank you very much for your answer! I understand now.

To better understand voxelization in the mmdet3d framework, I implemented a function that voxelizes the space in Python, using numpy arrays as the basic structure. In a personal project the voxels are built correctly, so I tried using my function instead of the hard_voxelize function. I converted the tensors to numpy arrays, processed them, and then converted them back to tensors, keeping the required output dimensions and types. It appears to work until I get a segmentation fault in voxelnet.py, in extract_feat(), at this line: x = self.middle_encoder(voxel_features, coors, batch_size).
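For comparison, a minimal NumPy sketch of hard voxelization that follows the output contract in the docstring above. `hard_voxelize_np` is a hypothetical name, and the details are assumptions rather than mmdet3d's code: points outside `coors_range` are ignored, points beyond `max_points` in a voxel are discarded, points are dropped once `max_voxels` voxels exist, every input feature column is kept (so `ndim` matches the input), and `coordinates` are stored in (z, y, x) order, which matches how the sparse middle encoder's `sparse_shape` is laid out.

```python
import numpy as np


def hard_voxelize_np(points, voxel_size, coors_range, max_points=5, max_voxels=20000):
    """NumPy sketch of hard voxelization (hypothetical helper, not mmdet3d code).

    Returns voxels [M, max_points, ndim] float32, coordinates [M, 3] int32 in
    (z, y, x) order, and num_points_per_voxel [M] int32.
    """
    points = np.asarray(points, dtype=np.float32)
    voxel_size = np.asarray(voxel_size, dtype=np.float32)
    coors_range = np.asarray(coors_range, dtype=np.float32)
    grid_size = np.round((coors_range[3:] - coors_range[:3]) / voxel_size).astype(np.int32)

    # Integer voxel index of every point, in (x, y, z) order.
    idx_xyz = np.floor((points[:, :3] - coors_range[:3]) / voxel_size).astype(np.int32)
    # Ignore points that fall outside coors_range.
    inside = np.all((idx_xyz >= 0) & (idx_xyz < grid_size), axis=1)
    points, idx_xyz = points[inside], idx_xyz[inside]

    ndim = points.shape[1]  # keep all features (x, y, z, reflectivity, ...)
    voxels = np.zeros((max_voxels, max_points, ndim), dtype=np.float32)
    coordinates = np.zeros((max_voxels, 3), dtype=np.int32)
    num_points_per_voxel = np.zeros((max_voxels,), dtype=np.int32)
    slot_of = {}  # (x, y, z) voxel index -> row in the output arrays

    for point, key in zip(points, map(tuple, idx_xyz)):
        slot = slot_of.get(key)
        if slot is None:
            if len(slot_of) >= max_voxels:
                continue  # voxel budget used up: this point is dropped
            slot = len(slot_of)
            slot_of[key] = slot
            coordinates[slot] = key[::-1]  # store as (z, y, x)
        if num_points_per_voxel[slot] < max_points:
            voxels[slot, num_points_per_voxel[slot]] = point
            num_points_per_voxel[slot] += 1

    m = len(slot_of)  # trim the unused, pre-allocated rows
    return voxels[:m], coordinates[:m], num_points_per_voxel[:m]
```

The Python loop is slow on a full KITTI frame; it is meant for checking shapes, dtypes, and coordinate order against the built-in op, not for training speed.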

The shapes and types of the outputs are the same as before, so why might this happen?

Wuziyi616 commented 3 years ago

Did you move them back to the GPU? Did you convert the tensor to float?

selmadeac commented 3 years ago

Hi @Wuziyi616 !

What do you mean by moving them back to the GPU? Isn't converting them to tensors enough?

At the start, I detach the points tensor, move it to the CPU, and convert it to numpy so that I can access the points:

points = points.cpu().detach().numpy()

At the end, I only convert my arrays back to tensors:

voxels_out = torch.from_numpy(voxels)
coors_out = torch.from_numpy(coordinates)
num_points_per_voxel_out = torch.from_numpy(num_points_per_voxel)

My data variables are originally declared as:

coordinates = coordinates.astype(np.int32)
num_points_per_voxel = np.zeros((M,), dtype=np.int32)
voxels = np.zeros((M, max_points, 3), dtype=float)

Is it not enough?
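Two details in these declarations are worth checking. `dtype=float` creates 64-bit arrays, and `torch.from_numpy` keeps that dtype, so the voxels would reach the network as torch.float64 instead of the float32 it normally runs in. Also, the last dimension is hard-coded to 3, while the docstring describes voxels as [M, max_points, ndim] with extra features such as reflectivity after xyz (KITTI points carry x, y, z, reflectivity). A small sketch that derives both from the input points; the random `points` array is a stand-in, not real data:

```python
import numpy as np

M, max_points = 3, 5

# dtype=float is Python's float, i.e. 64-bit; torch.from_numpy keeps that dtype.
voxels_f64 = np.zeros((M, max_points, 3), dtype=float)
print(voxels_f64.dtype)  # float64

# Alternative: take dtype and feature count from the input points
# (a random stand-in for a KITTI cloud with x, y, z, reflectivity).
points = np.random.rand(100, 4).astype(np.float32)
voxels = np.zeros((M, max_points, points.shape[1]), dtype=points.dtype)
print(voxels.dtype, voxels.shape)  # float32 (3, 5, 4)
```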

selmadeac commented 3 years ago

Thank you for your reply! I moved them back to the GPU by calling .to('cuda:0') after converting them to tensors, but it is still not working:


Traceback (most recent call last):
  File "./tools/train.py", line 222, in <module>
    main()
  File "./tools/train.py", line 218, in main
    meta=meta)
  File "/home/selma/workspace/mmdetection3d/mmdet3d/apis/train.py", line 34, in train_model
    meta=meta)
  File "/home/selma/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmdet/apis/train.py", line 170, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/home/selma/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 125, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/selma/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
    self.run_iter(data_batch, train_mode=True, **kwargs)
  File "/home/selma/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 30, in run_iter
    **kwargs)
  File "/home/selma/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/parallel/data_parallel.py", line 67, in train_step
    return self.module.train_step(*inputs[0], **kwargs[0])
  File "/home/selma/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmdet/models/detectors/base.py", line 247, in train_step
    losses = self(**data)
  File "/home/selma/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/selma/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 95, in new_func
    return old_func(*args, **kwargs)
  File "/home/selma/workspace/mmdetection3d/mmdet3d/models/detectors/base.py", line 58, in forward
    return self.forward_train(**kwargs)
  File "/home/selma/workspace/mmdetection3d/mmdet3d/models/detectors/voxelnet.py", line 124, in forward_train
    x = self.extract_feat(points, img_metas)
  File "/home/selma/workspace/mmdetection3d/mmdet3d/models/detectors/voxelnet.py", line 45, in extract_feat
    x = self.middle_encoder(voxel_features, coors, batch_size)
  File "/home/selma/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/selma/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 95, in new_func
    return old_func(*args, **kwargs)
  File "/home/selma/workspace/mmdetection3d/mmdet3d/models/middle_encoders/sparse_encoder.py", line 112, in forward
    x = self.conv_input(input_sp_tensor)
  File "/home/selma/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/selma/workspace/mmdetection3d/mmdet3d/ops/spconv/modules.py", line 130, in forward
    input = module(input)
  File "/home/selma/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/selma/workspace/mmdetection3d/mmdet3d/ops/spconv/conv.py", line 186, in forward
    outids.shape[0])
  File "/home/selma/workspace/mmdetection3d/mmdet3d/ops/spconv/functional.py", line 65, in forward
    indice_pair_num, num_activate_out, False, True)
  File "/home/selma/workspace/mmdetection3d/mmdet3d/ops/spconv/ops.py", line 119, in indice_conv
    int(subm))
RuntimeError: CUDA error: an illegal memory access was encountered
terminate called after throwing an instance of 'c10::Error'
  what():  CUDA error: an illegal memory access was encountered
Exception raised from create_event_internal at /opt/conda/conda-bld/pytorch_1614378098133/work/c10/cuda/CUDACachingAllocator.cpp:733 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f0d03e3b2f2 in /home/selma/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x5b (0x7f0d03e3867b in /home/selma/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #2: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x809 (0x7f0d1c254219 in /home/selma/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10::TensorImpl::release_resources() + 0x54 (0x7f0d03e233a4 in /home/selma/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #4: <unknown function> + 0x6e6a3a (0x7f0c682b8a3a in /home/selma/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #5: <unknown function> + 0x6e6ae1 (0x7f0c682b8ae1 in /home/selma/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
<omitting python frames>
frame #26: __libc_start_main + 0xf3 (0x7f0d443bf0b3 in /lib/x86_64-linux-gnu/libc.so.6)

Aborted (core dumped)
Wuziyi616 commented 3 years ago

Sorry for the late reply. First, let me check: are you using multi-GPU distributed training? If so, you can't simply hard-code cuda:0.

You first need to get the current device of this process with device = points.device, then call tensor.to(device).
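A minimal sketch of that advice, wrapped in a hypothetical helper `to_model_tensors`. It takes the device from a tensor that is already in the right place (such as the incoming `points`) instead of hard-coding 'cuda:0', so each process in a distributed run stays on its own GPU, and it casts to the float32/int32 dtypes listed in the docstring. `np.ascontiguousarray` is included because `torch.from_numpy` rejects arrays with negative strides (for example the result of `arr[:, ::-1]`), which `.copy()` also avoids.

```python
import numpy as np
import torch


def to_model_tensors(voxels, coordinates, num_points_per_voxel, like):
    """Convert NumPy voxelization outputs to tensors on the same device as `like`.

    `like` is any tensor already on the right device (e.g. the input points),
    so this works on CPU, a single GPU, or any rank of a distributed job.
    """
    device = like.device
    return (
        torch.from_numpy(np.ascontiguousarray(voxels)).to(device=device, dtype=torch.float32),
        torch.from_numpy(np.ascontiguousarray(coordinates)).to(device=device, dtype=torch.int32),
        torch.from_numpy(np.ascontiguousarray(num_points_per_voxel)).to(
            device=device, dtype=torch.int32),
    )
```

Inside the model you would call it with the original points tensor as `like`, before anything has been moved to the CPU.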

selmadeac commented 3 years ago

Thank you for your reply, @Wuziyi616! I took the device as you suggested, converted all data structures to tensors, and then moved them to the GPU:

points = torch.from_numpy(points2.copy())
voxels_out = torch.from_numpy(voxels.copy())
coors_out = torch.from_numpy(coordinates.copy())
num_points_per_voxel_out = torch.from_numpy(num_points_per_voxel.copy())

points = points.to(device)
voxels_out = voxels_out.to(device)
coors_out = coors_out.to(device)
num_points_per_voxel_out = num_points_per_voxel_out.to(device)

The dtypes of voxels_out, coors_out, and num_points_per_voxel_out are torch.float32, torch.int32, and torch.int32.

But the error persists:


File "/home/selma/workspace/mmdetection3d/mmdet3d/ops/spconv/ops.py", line 119, in indice_conv
    int(subm))
RuntimeError: CUDA error: an illegal memory access was encountered

Do you eliminate voxels with fewer than one point?
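A quick way to apply that check, assuming the output arrays were pre-allocated and some rows may be unused. Such rows keep num_points_per_voxel = 0 and coordinate (0, 0, 0), so several rows can share one index, whereas hard voxelization only returns voxels that actually received points. The helper name and the example data are hypothetical:

```python
import numpy as np


def drop_empty_voxels(voxels, coordinates, num_points_per_voxel):
    """Keep only the voxels that contain at least one point."""
    keep = num_points_per_voxel > 0
    return voxels[keep], coordinates[keep], num_points_per_voxel[keep]


# Pre-allocated output with one unused row left at zero (hypothetical example data).
voxels = np.zeros((3, 5, 4), dtype=np.float32)
coordinates = np.array([[0, 2, 7], [0, 0, 0], [1, 4, 4]], dtype=np.int32)
num_points_per_voxel = np.array([2, 0, 1], dtype=np.int32)

voxels, coordinates, num_points_per_voxel = drop_empty_voxels(
    voxels, coordinates, num_points_per_voxel)
print(num_points_per_voxel.tolist())  # [2, 1]
```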

selmadeac commented 3 years ago

Solved! I was using voxel coordinates in x, y, z order instead of z, y, x.
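Why the order matters: the sparse middle encoder reads each coordinate row as (z, y, x) against its sparse_shape (for the SECOND KITTI config, [41, 1600, 1408]); VoxelNet also prepends a batch index before the middle encoder. An x index such as 1000 placed in the z slot lies far outside a 41-voxel z extent, so out-of-range indices reaching the CUDA kernels are a plausible cause of the illegal memory access above. A hypothetical sanity check to run on hand-written voxelization output before it reaches the middle encoder:

```python
import numpy as np


def count_out_of_bounds(coors, sparse_shape):
    """Count voxel coordinates that fall outside the (z, y, x) sparse_shape grid."""
    coors = np.asarray(coors)
    upper = np.asarray(sparse_shape)
    bad = np.any((coors < 0) | (coors >= upper), axis=1)
    return int(bad.sum())


sparse_shape = [41, 1600, 1408]  # SECOND/KITTI middle encoder grid, (z, y, x)
coors_xyz = np.array([[1000, 800, 20]], dtype=np.int32)  # voxel index as (x, y, z)
coors_zyx = np.ascontiguousarray(coors_xyz[:, ::-1])  # reordered to (z, y, x)

print(count_out_of_bounds(coors_xyz, sparse_shape))  # 1: x=1000 is read as z (size 41)
print(count_out_of_bounds(coors_zyx, sparse_shape))  # 0
```

The reordered array is made contiguous so it can be passed straight to `torch.from_numpy`, which does not accept the negative strides produced by `[:, ::-1]`.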