xinge008 / Cylinder3D

Rank 1st in the leaderboard of SemanticKITTI semantic segmentation (both single-scan and multi-scan) (Nov. 2020) (CVPR2021 Oral)
Apache License 2.0

Does Spconv work with CPU? #50

Closed FrancescoMandru closed 3 years ago

FrancescoMandru commented 3 years ago

I'm building a small network based on your code, as follows:

for i_iter, (_, train_vox_label, train_grid, _, train_pt_fea) in enumerate(train_dataset_loader):

    train_pt_fea_ten = [torch.from_numpy(i).type(torch.FloatTensor).to(pytorch_device) for i in return_fea]
    # train_grid_ten = [torch.from_numpy(i[:,:2]).to(pytorch_device) for i in train_grid]
    train_vox_ten = [torch.from_numpy(i).to(pytorch_device) for i in grid_ind]
    point_label_tensor = torch.Tensor(processed_label).type(torch.LongTensor).to(pytorch_device)

    ret = spconv.SparseConvTensor(train_pt_fea_ten,
                                  train_vox_ten,
                                  grid_size.shape,
                                  1)

    model = TestModel(16, 32)

where TestModel is built as follows:

class TestModel(nn.Module):
    def __init__(self, in_filters, out_filters):
        super(TestModel, self).__init__()
        self.mod = spconv.SubMConv3d(in_filters, out_filters, kernel_size=(3, 3, 3))

    def forward(self, x):
        mod_out = self.mod(x)
        return mod_out

However, when I test this code I get a strange error from the spconv module:

Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/working_dir/mymodel/data_processing.py", line 176, in <module>
    out = model(ret)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/working_dir/mymodel/testmodel.py", line 11, in forward
    mod_out = self.mod(x)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/spconv-1.2.1-py3.8-linux-x86_64.egg/spconv/conv.py", line 129, in forward
    device = features.device
AttributeError: 'list' object has no attribute 'device'

Anyway, I think it is related to the input data format. I need to build a network using only these modules. In particular, looking at the forward function of the spconv module:

def forward(self, input):
    assert isinstance(input, spconv.SparseConvTensor)
    features = input.features
    device = features.device
    indices = input.indices
    spatial_shape = input.spatial_shape
    batch_size = input.batch_size

xinge008 commented 3 years ago

From the spconv README (https://github.com/traveller59/spconv), it seems that spconv requires the CUDA library for its CUDA hash implementation.

That said, I have not tried running it on CPU only.

FrancescoMandru commented 3 years ago

Hi @xinge008, I thought the GPU was an option for speeding things up, not a strict requirement. Modules usually also work on the CPU, just much more slowly. You can check this with a simple example on a fake dataset, which works perfectly. So no, I don't think spconv works only in GPU mode; I think the problem is that we are passing a data structure that is not suitable for this module. train_pt_fea_ten and train_grid_ten are two lists, and lists certainly don't have a device attribute.

My only aim is to understand in detail how to prepare the input point cloud data for spconv modules. I'm struggling with this and, honestly, I'm close to giving up on it.
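
The traceback above points at `features.device`, which fails because a Python list was passed where a single tensor is expected. Below is a minimal sketch of one way to merge per-sample lists into one feature tensor and one index tensor. It assumes the spconv 1.x convention that indices are an int32 tensor of shape [N, 4] with the batch index in column 0; the helper name `batch_sparse_inputs` and the toy shapes are hypothetical and not taken from the Cylinder3D code.

```python
import torch


def batch_sparse_inputs(feature_list, index_list, device):
    """Merge per-sample tensors into one feature tensor and one index tensor.

    feature_list: list of [N_i, C] float tensors, one per sample
    index_list:   list of [N_i, 3] integer voxel coordinates, one per sample
    (hypothetical helper, for illustration only)
    """
    # One [sum(N_i), C] float tensor instead of a list of tensors.
    features = torch.cat(feature_list, dim=0).float().to(device)

    batched = []
    for b, idx in enumerate(index_list):
        # Column 0 holds the sample (batch) number for every row of this sample.
        batch_col = torch.full((idx.shape[0], 1), b, dtype=torch.int32, device=idx.device)
        batched.append(torch.cat([batch_col, idx.to(torch.int32)], dim=1))
    indices = torch.cat(batched, dim=0).to(device)

    return features, indices, len(feature_list)


# Toy data: two samples with 5 and 3 points, 16 features each.
feats = [torch.rand(5, 16), torch.rand(3, 16)]
coords = [torch.randint(0, 10, (5, 3)), torch.randint(0, 10, (3, 3))]

features, indices, batch_size = batch_sparse_inputs(feats, coords, torch.device("cpu"))
print(features.shape, indices.shape, batch_size)

# Assumed hand-off (not executed here; spatial_shape is the voxel grid size per axis):
# ret = spconv.SparseConvTensor(features, indices, spatial_shape, batch_size)
```

Note that this sketch only concatenates. If several points can fall into the same voxel, the duplicate coordinates would presumably need to be pooled into one row per voxel before building the sparse tensor.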

vaydingul commented 3 years ago

Hi @FrancescoMandru,

The whole codebase can actually run on the CPU but, as you can imagine, it takes a tremendous amount of time to complete even one epoch (126 hours in my case 😄).

To run on the CPU, you should install all the dependencies in CPU mode. torch and torch-scatter can be installed easily this way. spconv can also be installed for CPU; it automatically detects your system information and adapts its build procedure accordingly.

Additionally, you need to specify the CPU as the main device in the code. To do that, change the following line:

pytorch_device = torch.device('cuda:0')

to:

pytorch_device = torch.device("cpu")
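
As a variation on the hard-coded device above, the choice can be made at runtime so that the same script runs on machines with and without a GPU. This is only a sketch; the helper name `pick_device` is hypothetical and not part of the repository.

```python
import torch


def pick_device(prefer_cuda=True):
    """Return cuda:0 when requested and available, otherwise the CPU.

    (hypothetical helper, for illustration only)
    """
    if prefer_cuda and torch.cuda.is_available():
        return torch.device("cuda:0")
    return torch.device("cpu")


pytorch_device = pick_device()

# Tensors moved with .to(pytorch_device) end up on whichever device was chosen.
x = torch.zeros(2, 3).to(pytorch_device)
print(pytorch_device, x.device)
```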

Also, in the cylinder_fea_generator.py file, change the line:

shuffled_ind = torch.randperm(pt_num, device=cur_dev)

to:

shuffled_ind = torch.randperm(pt_num)
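
A device-agnostic alternative to dropping the `device` argument is sketched below. It assumes that `cur_dev` in the original line is an integer index obtained from `Tensor.get_device()`, which returns -1 for CPU tensors and is therefore not a usable device on CPU; that assumption is not confirmed by this thread. The helper name `shuffled_indices_like` is hypothetical.

```python
import torch


def shuffled_indices_like(reference, pt_num):
    """Random permutation of range(pt_num) on the same device as `reference`.

    Using reference.device (a torch.device) works for both CPU and CUDA tensors.
    (hypothetical helper, for illustration only)
    """
    return torch.randperm(pt_num, device=reference.device)


pts = torch.rand(6, 3)  # toy point features on the CPU
shuffled_ind = shuffled_indices_like(pts, pts.shape[0])
shuffled_pts = pts[shuffled_ind]  # same rows, in shuffled order
print(shuffled_ind.device, shuffled_pts.shape)
```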

After all of this, you should be able to run the code on the CPU.