mit-han-lab / torchsparse

[MICRO'23, MLSys'22] TorchSparse: Efficient Training and Inference Framework for Sparse Convolution on GPUs.
https://torchsparse.mit.edu
MIT License

The problem of inconsistent effects between torchsparse's conv3d downsampling and minkowski's #242

Closed Jovendish closed 9 months ago

Jovendish commented 1 year ago

I'm using torchsparse's conv3d to perform a downsampling operation with stride 2, but found that this operation not only reduces the number of points in the feature tensor, it also scales down the coordinates, which is inconsistent with minkowski's behavior. I was hoping to find a way to make torchsparse's conv3d downsampling consistent with minkowski.

I checked the documentation of torchsparse but didn't find a relevant solution.

Is there any other parameter setting or custom operation method that can make torchsparse's conv3d downsampling operation consistent with minkowski? If you have any suggestions or guidance I would be very grateful.

Details ![WechatIMG10](https://github.com/mit-han-lab/torchsparse/assets/25397930/d321bdba-f61b-4ceb-a820-a1dfdab7566d) ![WechatIMG11](https://github.com/mit-han-lab/torchsparse/assets/25397930/d9d041e8-1903-4308-847c-b3987b67c739)
zhijian-liu commented 1 year ago

When applying a downsampling operation with a stride of 2, the coordinates are effectively halved. If you wish to maintain the original coordinate scale, you can easily achieve this by multiplying the coordinates by 2.
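The arithmetic behind this can be sketched without torchsparse at all. In a stride-2 sparse downsampling, each integer coordinate is floor-divided by 2 and duplicates are merged; multiplying the result by 2 restores the original scale (the surviving points land on even grid positions). A minimal NumPy illustration of that coordinate bookkeeping (not the library's internal implementation):

```python
import numpy as np

# Four input voxels on the original integer grid.
coords = np.array([[0, 0, 0],
                   [1, 1, 0],
                   [2, 2, 2],
                   [3, 3, 2]])

# Stride-2 downsample: halve the coordinates, merge duplicates.
down = np.unique(coords // 2, axis=0)

# Rescale back to the original coordinate scale, as suggested above.
restored = down * 2

print(down.tolist())      # → [[0, 0, 0], [1, 1, 1]]
print(restored.tolist())  # → [[0, 0, 0], [2, 2, 2]]
```

Note that the merged points now sit two units apart on the original grid, which is exactly the "halved coordinates" effect described above.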

Jovendish commented 1 year ago

Thanks for your reply.

In my code, I have three downsampling layers. I tried to scale the coordinates back to their original scale by multiplying them by two after each downsampling layer. However, I noticed that this only works for the first downsampling layer: the subsequent downsampling layers no longer reduce the number of points. I'm unsure whether I made a mistake in my implementation or ran into some specific mechanism in torchsparse.

Partial code

```python
F.set_conv_mode(2)
F.set_kmap_mode('hashmap')
F.set_downsample_mode('minkowski')

class Encoder(torch.nn.Module):
    def __init__(self, channels=[1, 16, 32, 64, 32, 8]):
        super().__init__()
        self.stack_0 = nn.Sequential(
            spnn.Conv3d(channels[0], channels[1], 3, 1, bias=True),
            spnn.ReLU(inplace=True),
            spnn.Conv3d(channels[1], channels[2], 2, 2, bias=True),  # DownScale
            spnn.ReLU(inplace=True),
        )
        self.stack_1 = nn.Sequential(
            spnn.Conv3d(channels[2], channels[2], 3, 1, bias=True),
            spnn.ReLU(inplace=True),
            spnn.Conv3d(channels[2], channels[3], 2, 2, bias=True),  # DownScale
            spnn.ReLU(inplace=True),
        )
        self.stack_2 = nn.Sequential(
            spnn.Conv3d(channels[3], channels[3], 3, 1, bias=True),
            spnn.ReLU(inplace=True),
            spnn.Conv3d(channels[3], channels[4], 2, 2, bias=True),  # DownScale
            spnn.ReLU(inplace=True),
        )

    def forward(self, x):
        out_0 = self.stack_0(x)
        out_0.C[:, 1:] *= 2
        out_1 = self.stack_1(out_0)
        out_1.C[:, 1:] *= 2
        out_2 = self.stack_2(out_1)
        out_2.C[:, 1:] *= 2
        return [out_2, out_1, out_0]
```
Result ![WechatIMG21893](https://github.com/mit-han-lab/torchsparse/assets/25397930/f4b0652c-7c48-45ba-ba1b-d0ddf9aa26f7) ![WechatIMG21894](https://github.com/mit-han-lab/torchsparse/assets/25397930/12c57c03-5fb4-499e-a8ba-c804fec93a52) ![WechatIMG21895](https://github.com/mit-han-lab/torchsparse/assets/25397930/af022bdb-d37a-4df6-a853-ca14432f7c47) ![WechatIMG21896](https://github.com/mit-han-lab/torchsparse/assets/25397930/5ea5c33f-8f65-44c6-9463-0e7fc43e0cb6)
ys-2020 commented 12 months ago

Hi @Jovendish, in the 2nd and 3rd layers you are downsampling with stride=2 on coordinates that have already been multiplied by 2, so the halving and the rescaling cancel out and the number of points remains the same as in the previous layer.

A potential solution might be:

```python
def forward(self, x):
    out_0 = self.stack_0(x)
    out_1 = self.stack_1(out_0)
    out_2 = self.stack_2(out_1)

    # Rescale only after all downsampling is done; each layer halves the
    # coordinates once, so the cumulative factors are 2, 4, and 8.
    out_0.C[:, 1:] *= 2
    out_1.C[:, 1:] *= 4
    out_2.C[:, 1:] *= 8

    return [out_2, out_1, out_0]
```
Jovendish commented 12 months ago

Thank you very much for your patience. However, I actually want to scale the coordinates back in the middle of each layer, because I need to do some extra work between layers. I am also wondering why torchsparse v2.1 changed the behavior of the downsampling layer. Were there any considerations behind this?

zhijian-liu commented 11 months ago

You can follow @ys-2020's approach: clone the coordinate tensor and do the scaling on the clone in the middle of each layer, leaving the tensor fed to the next layer untouched. We changed this behavior to follow SpConv.
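The clone-then-scale idea can be sketched with a plain NumPy stand-in for the coordinate tensor (the helper name `scaled_coords` is hypothetical, not a torchsparse API; column 0 is the batch index, as in the thread's code):

```python
import numpy as np

def scaled_coords(C, factor):
    """Return a scaled copy of the coordinates for side computations,
    leaving the original array (used by the next layer) untouched."""
    C_view = C.copy()          # clone so the network's coordinates stay intact
    C_view[:, 1:] *= factor    # skip column 0, the batch index
    return C_view

C = np.array([[0, 1, 2, 3]])       # [batch_idx, x, y, z]
side = scaled_coords(C, 2)

print(side.tolist())  # → [[0, 2, 4, 6]]
print(C.tolist())     # → [[0, 1, 2, 3]]  (unchanged)
```

With real torchsparse tensors the same pattern would apply: take a cloned, rescaled copy of `out.C` for the extra per-layer work, and pass the unmodified tensor on to the next downsampling stage.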