mit-han-lab / spvnas

[ECCV 2020] Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution
http://spvnas.mit.edu/
MIT License

Error on eval code: feature size and kernel size mismatch #48

Closed martinrebane closed 3 years ago

martinrebane commented 3 years ago

Hi! An earlier version of evaluate.py worked fine with SemanticKITTI_val_SPVNAS@65GMACs when I tested it a few months ago, but the latest code fails with ValueError: Input feature size and kernel size mismatch, raised inside torchsparse (probably this line).

Environment: torchsparse 1.2, PyTorch 1.7.1, CUDA 11.0.
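Since the question hinges on which library versions are installed, a small helper for collecting them can make reports like this easier to reproduce. The following is only a sketch: the function name `report_versions` and the default package list are assumptions for illustration, not part of spvnas or torchsparse, and it relies on `importlib.metadata`, which needs Python 3.8 or newer.

```python
from importlib import metadata


def report_versions(packages=("torch", "torchsparse", "torchpack")):
    """Return a dict mapping each package name to its installed version.

    Packages that are not installed map to the string "not installed".
    The default package list is an assumption chosen for this thread.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions
```

Calling `report_versions()` and pasting the result into a bug report covers the Python packages. The CUDA version that PyTorch was built against is available separately as `torch.version.cuda`.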

Is this a compatibility issue between torchsparse and spvnas (e.g., the models were trained with an older version), or something else? I noticed that an earlier, similar problem was solved by downgrading torchsparse, but the latest version of SPVNAS seems to have been updated to use 1.2?

Thanks!

martin@pytorch18-vm:~/spvnas$ torchpack dist-run -np 1 python evaluate.py configs/semantic_kitti/default.yaml --name SemanticKITTI_val_SPVNAS@65GMACs
  File "/code/spvnas/model_zoo.py", line 50, in spvnas_specialized
    model = model.determinize()
  File "/code/spvnas/core/models/semantic_kitti/spvnas.py", line 311, in determinize
    x = self.forward(x)
  File "/code/spvnas/core/models/semantic_kitti/spvnas.py", line 343, in forward
    x1 = self.downsample[0](x1)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/code/spvnas/core/modules/modules.py", line 82, in forward
    x = self.layers[k](x)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/code/spvnas/core/modules/layers.py", line 499, in forward
    out = self.relu(self.net(x) + self.downsample(x))
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/code/spvnas/core/modules/layers.py", line 339, in forward
    out = self.net(x)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/code/spvnas/core/modules/dynamic_sparseop.py", line 98, in forward
    return spf.conv3d(inputs, cur_kernel, self.ks, self.s, self.d, self.t)
  File "/opt/conda/lib/python3.7/site-packages/torchsparse/nn/functional/conv.py", line 149, in conv3d
    idx_query[1], sizes, transpose)
  File "/opt/conda/lib/python3.7/site-packages/torchsparse/nn/functional/conv.py", line 40, in forward
    neighbor_offset, transpose)
ValueError: Input feature size and kernel size mismatch
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[37270,1],0]
  Exit code:    1
--------------------------------------------------------------------------
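
For readers unfamiliar with the final ValueError in the traceback above: as its wording suggests, it indicates that the channel dimension of the input features does not match the input-channel dimension of the convolution kernel. The plain-Python sketch below illustrates that kind of consistency check. The function name `check_conv_shapes` and the assumed layouts (features as `(num_points, in_channels)`, kernel as `(kernel_volume, in_channels, out_channels)`) are hypothetical and chosen for illustration; this is not torchsparse's actual code.

```python
def check_conv_shapes(feature_shape, kernel_shape):
    """Validate feature/kernel shapes for a sparse convolution (illustrative only).

    Assumed layouts (hypothetical, not taken from torchsparse):
      feature_shape: (num_points, in_channels)
      kernel_shape:  (kernel_volume, in_channels, out_channels)
    Returns the output feature shape, assuming stride 1 so that the
    number of points is unchanged.
    """
    if len(feature_shape) != 2 or len(kernel_shape) != 3:
        raise ValueError("expected 2-D features and a 3-D kernel")
    num_points, in_channels = feature_shape
    _, kernel_in_channels, out_channels = kernel_shape
    if in_channels != kernel_in_channels:
        # Same wording as the error reported in this issue.
        raise ValueError("Input feature size and kernel size mismatch")
    return (num_points, out_channels)
```

For example, `check_conv_shapes((100, 32), (27, 32, 64))` passes, while `check_conv_shapes((100, 32), (27, 16, 64))` raises the error. Under these assumptions, a kernel stored or sliced in a different layout than the one the library expects would trip the same check.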
zhijian-liu commented 3 years ago

Thanks for reporting the issue! I will take a look shortly.

zhijian-liu commented 3 years ago

This should be fixed now.

martinrebane commented 3 years ago

Thank you for the quick response and fix! Everything works now.