mit-han-lab / spvnas

[ECCV 2020] Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution
http://spvnas.mit.edu/
MIT License

SPVCNN vs MinkowskiNet at same value of `cr` #28

Closed chaitjo closed 3 years ago

chaitjo commented 3 years ago

Hi, thank you for the insightful work! I had a (potentially dumb) question regarding the comparison of MinkowskiNet and SPVCNN (without the NAS): I see that you provide the `cr` parameter as a channel ratio to control the width of the networks. Am I correct in my understanding that, for this figure, when comparing MinkowskiNet and SPVCNN at the same MACs, the `cr` values for the two models are different?
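
For concreteness, a width multiplier like `cr` typically scales every layer's channel count by the same ratio. A minimal sketch (the function name and base widths are illustrative, not taken from the spvnas code):

```python
# Hypothetical sketch of a channel-ratio multiplier `cr`.
# Base widths below are illustrative, not the repo's actual configuration.
def scale_channels(base_channels, cr):
    # Scale every width by cr; two networks built from the same scaled
    # widths are compared "at the same cr".
    return [max(1, int(c * cr)) for c in base_channels]

base = [32, 64, 128, 256]          # example encoder widths
print(scale_channels(base, 0.5))   # -> [16, 32, 64, 128]
```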

zhijian-liu commented 3 years ago

They are the same, as the computation overhead introduced by the point-based branch is not significant. That overhead can actually be observed in this figure (there is a small shift along the x-axis between MinkowskiNet and SPVCNN).
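
The reason the point-branch overhead stays small is that, per the thread below, voxelization/devoxelization happens only once per stage of several sparse-conv layers. A pure-Python structural stub (stub names, not the repo's actual API):

```python
# Structural stub: voxelize/devoxelize are called once per stage, so their
# cost is amortized over the sparse-conv layers inside that stage.
# All function names here are illustrative, not the spvnas API.
calls = []

def voxelize(points):
    calls.append("voxelize"); return points

def sparse_conv(vox):
    calls.append("sparse_conv"); return vox

def devoxelize(vox, points):
    calls.append("devoxelize"); return points

def spv_stage(points, num_layers=3):
    vox = voxelize(points)           # once per stage
    for _ in range(num_layers):
        vox = sparse_conv(vox)       # the bulk of the compute
    return devoxelize(vox, points)   # once per stage

spv_stage([1, 2, 3])
print(calls.count("voxelize"), calls.count("sparse_conv"))  # -> 1 3
```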

chaitjo commented 3 years ago

Thanks for the response. Indeed, I can see the small shift too!

  1. Based on Haotian's response in #19, I got the impression that the voxelization/devoxelization procedure in SPVCNN would have some impact on MACs and GPU latency compared to MinkowskiNet at the same `cr` (since the sparse convolution branches are exactly the same in both networks). Am I correct that the voxelization/devoxelization procedure does not actually penalize the model's inference time much, especially at small `cr`?

  2. This raises another question: at the same `cr`, e.g. `cr=1.0`, shouldn't the number of trainable parameters in MinkowskiNet be lower than in the corresponding SPVCNN? (SPVCNN uses point-transformation MLPs, whereas MinkowskiNet does not.) However, in the pre-trained models you released, both models seem to have the same number of parameters; am I missing something?

zhijian-liu commented 3 years ago

  1. I think your understanding is correct. Voxelization and devoxelization do introduce some overhead; however, since we perform them only once every several layers, the overhead is amortized.
  2. You are right. In https://github.com/mit-han-lab/e3d/blob/master/spvnas/core/models/semantic_kitti/minkunet.py#L149-L165, you can see that we also defined the point transforms for MinkowskiNet, although we do not actually use them. We will remove them to resolve the confusion.
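
This explains the matching checkpoint sizes: a submodule that is defined on a model but never called in the forward pass still contributes to the parameter count. A toy arithmetic sketch (all widths illustrative, not the repo's real sizes):

```python
# Toy parameter arithmetic: a defined-but-unused point-transform MLP still
# shows up in a checkpoint's parameter count. Widths are illustrative.
def linear_params(c_in, c_out, bias=True):
    # weight matrix (c_out x c_in) plus optional bias vector
    return c_in * c_out + (c_out if bias else 0)

point_mlp = linear_params(32, 64) + linear_params(64, 64)  # hypothetical MLP
backbone = 1_000_000  # placeholder for the shared sparse-conv parameters

# If MinkowskiNet also *defines* the point MLP, both totals agree:
print(backbone + point_mlp)  # -> 1006272
```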

chaitjo commented 3 years ago

I see, that clarifies it. Thanks!