Closed Jovendish closed 9 months ago
When applying a downsampling operation with a stride of 2, the coordinates are effectively halved. If you want to keep the original coordinate scale, you can simply multiply the coordinates by 2 afterwards.
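To make the halving concrete, here is a minimal NumPy sketch (not torchsparse code) of what a stride-2 downsample does to integer voxel coordinates, and how multiplying by 2 restores the original scale, snapped to even voxels:

```python
import numpy as np

# stride-2 downsampling maps each voxel coordinate c to floor(c / 2)
coords = np.array([[2, 4, 6],
                   [5, 3, 1]])
down = coords // 2      # halved coordinates: [[1, 2, 3], [2, 1, 0]]
restored = down * 2     # back on the original scale, quantized to even voxels
```

Note that the restoration is only up to quantization: odd coordinates land on the nearest even voxel below.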
thanks for your reply.
In my code, I have three layers of downsampling operations. I have tried to scale the coordinates back to their original scale by multiplying them by two after each downsampling layer. However, I have noticed that this operation only works for the first downsampling layer, as the subsequent downsampling layers yield the same results. I'm unsure if I made a mistake in my implementation or if I have encountered some specific mechanism in torchsparse.
Hi @Jovendish , in the 2nd and 3rd layers you are downsampling by stride=2 with coordinates that have already been multiplied by 2. Since those coordinates are all even, halving them produces no collisions, so the number of points remains the same as in the previous layer.
A potential solution might be:
def forward(self, x):
    out_0 = self.stack_0(x)
    out_1 = self.stack_1(out_0)
    out_2 = self.stack_2(out_1)
    # rescale coordinates back to the input resolution only after all
    # convolutions have run: each stride-2 stage halves the coordinates,
    # so the cumulative factors are 2, 4, and 8
    out_0.C[:, 1:] *= 2
    out_1.C[:, 1:] *= 4
    out_2.C[:, 1:] *= 8
    return [out_2, out_1, out_0]
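To see why rescaling the coordinates *before* the next stride-2 layer leaves the point count unchanged, here is a NumPy sketch using 1-D coordinates (illustrative only, not torchsparse code):

```python
import numpy as np

coords = np.array([0, 1, 2, 3, 4, 5])  # 1-D voxel coordinates for illustration
first = np.unique(coords // 2)         # stride-2 downsample: points merge -> [0, 1, 2]
rescaled = first * 2                   # scaled back to the original range: [0, 2, 4]
second = np.unique(rescaled // 2)      # all inputs are even, so no further merging
```

Here `second` has exactly as many points as `first`, which is why multiplying by 2 between layers makes every downsampling stage after the first a no-op in terms of point count.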
Thank you very much for your patience. Actually, I want to scale the coordinates back in the middle of each layer, because I need to do some extra work there. I am also wondering why torchsparse v2.1 changed the behavior of the downsampling layer. Were there any considerations behind this?
You can follow @ys-2020 's approach: clone the coordinate tensor and do the scaling in the middle. We changed this behavior to follow SpConv.
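A minimal sketch of that cloning approach, using NumPy arrays as a stand-in for the SparseTensor coordinate tensor (`copy()` here corresponds to `clone()` in PyTorch); `scaled_coords` is a hypothetical helper, not a torchsparse API:

```python
import numpy as np

def scaled_coords(coords, factor):
    """Return a rescaled copy of a (batch, x, y, z) coordinate array,
    leaving the original untouched for the subsequent conv layers."""
    c = coords.copy()            # PyTorch equivalent: coords.clone()
    c[:, 1:] = c[:, 1:] * factor  # scale only the spatial dims, not the batch index
    return c

coords = np.array([[0, 1, 2, 3],
                   [0, 4, 5, 6]])
scaled = scaled_coords(coords, 2)
```

Because the scaling happens on a copy, the mid-layer work can use `scaled` while the next downsampling layer still sees the unmodified coordinates.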
I'm using torchsparse's conv3d to perform a downsampling operation with stride 2, but I found that this operation not only reduces the size of the feature tensor but also scales down the coordinates, which is inconsistent with MinkowskiEngine's behavior. I was hoping to find a way to make torchsparse's conv3d downsampling consistent with MinkowskiEngine.
I checked the torchsparse documentation but didn't find a relevant solution.
Is there another parameter setting or a custom operation that can make torchsparse's conv3d downsampling consistent with MinkowskiEngine? I would be very grateful for any suggestions or guidance.
![WechatIMG10](https://github.com/mit-han-lab/torchsparse/assets/25397930/d321bdba-f61b-4ceb-a820-a1dfdab7566d) ![WechatIMG11](https://github.com/mit-han-lab/torchsparse/assets/25397930/d9d041e8-1903-4308-847c-b3987b67c739)