mit-han-lab / torchsparse

[MICRO'23, MLSys'22] TorchSparse: Efficient Training and Inference Framework for Sparse Convolution on GPUs.
https://torchsparse.mit.edu
MIT License

[BUG] Unexpected behaviour of strided convolution #282

Closed: angshine closed this issue 10 months ago

angshine commented 11 months ago


Current Behavior

According to #14, I expect TorchSparse to behave like SparseConvolution when stride > 1. This does not seem to be the case with (stride=2, kernel_size=2, padding=0).

The following code snippet builds 32^3 coordinate volumes whose points all lie on an xz plane at a given y position. The output of Conv3d(stride=2, kernel_size=2, padding=0) varies with the y position, while the two other Conv3d(stride=2) variants behave as expected:

import torch
import torchsparse
from torchsparse import nn as spnn

def build_sparse_plane(x=None, y=0, z=None, size=32):
    # Axes given as None span [0, size); fixed axes become a constant vector.
    _cs = []
    for c in [x, y, z]:
        if c is None:
            c = torch.arange(0, size, dtype=torch.int)
        else:
            c = torch.full((size,), c, dtype=torch.int)
        _cs.append(c)
    coords = torch.stack(
        [c.flatten() for c in torch.meshgrid(*_cs, indexing="ij")], dim=-1
    )
    # Prepend a zero batch index so coords are (batch, x, y, z).
    coords = torch.cat(
        [torch.zeros(size**3, 1, dtype=torch.int), coords], dim=-1
    )
    sparse_tensor = torchsparse.SparseTensor(
        coords=coords, feats=torch.ones(int(size**3), 1)
    )
    return sparse_tensor

sparse_conv_2x2_p0 = spnn.Conv3d(1, 2, kernel_size=2, stride=2, padding=0).cuda()
sparse_conv_2x2_p1 = spnn.Conv3d(1, 2, kernel_size=2, stride=2, padding=1).cuda()
sparse_conv_3x3 = spnn.Conv3d(1, 2, kernel_size=3, stride=2, padding=1).cuda()

for y in range(0, 8):
    # Rebuild the input before each conv so no kernel-map cache is shared.
    st_xz_yi = build_sparse_plane(y=y).cuda()
    st_out_2x2_p0 = sparse_conv_2x2_p0(st_xz_yi)
    st_xz_yi = build_sparse_plane(y=y).cuda()
    st_out_2x2_p1 = sparse_conv_2x2_p1(st_xz_yi)
    st_xz_yi = build_sparse_plane(y=y).cuda()
    st_out_3x3 = sparse_conv_3x3(st_xz_yi)
    print(f"y={y}, 2x2(p0): {st_out_2x2_p0.C.shape}, 2x2(p1): {st_out_2x2_p1.C.shape} 3x3: {st_out_3x3.C.shape}")

The output is:

y=0, 2x2(p0): torch.Size([0, 4]), 2x2(p1): torch.Size([289, 4]) 3x3: torch.Size([256, 4])
y=1, 2x2(p0): torch.Size([256, 4]), 2x2(p1): torch.Size([289, 4]) 3x3: torch.Size([256, 4])
y=2, 2x2(p0): torch.Size([0, 4]), 2x2(p1): torch.Size([289, 4]) 3x3: torch.Size([256, 4])
y=3, 2x2(p0): torch.Size([256, 4]), 2x2(p1): torch.Size([289, 4]) 3x3: torch.Size([256, 4])
y=4, 2x2(p0): torch.Size([0, 4]), 2x2(p1): torch.Size([289, 4]) 3x3: torch.Size([256, 4])
y=5, 2x2(p0): torch.Size([256, 4]), 2x2(p1): torch.Size([289, 4]) 3x3: torch.Size([256, 4])
y=6, 2x2(p0): torch.Size([0, 4]), 2x2(p1): torch.Size([289, 4]) 3x3: torch.Size([256, 4])
y=7, 2x2(p0): torch.Size([256, 4]), 2x2(p1): torch.Size([289, 4]) 3x3: torch.Size([256, 4])

Moreover, the kernel-map cache created by applying sparse_conv_2x2_p0 affects the output of sparse_conv_2x2_p1. In the following snippet, I reuse st_xz_yi instead of rebuilding it:

for y in range(0, 8):
    st_xz_yi = build_sparse_plane(y=y).cuda()
    st_out_2x2_p0 = sparse_conv_2x2_p0(st_xz_yi)
    # Reusing st_xz_yi (which now carries a cached kernel map) changes the
    # output of sparse_conv_2x2_p1.
    st_out_2x2_p1 = sparse_conv_2x2_p1(st_xz_yi)
    st_out_3x3 = sparse_conv_3x3(st_xz_yi)
    print(f"y={y}, 2x2(p0): {st_out_2x2_p0.C.shape}, 2x2(p1): {st_out_2x2_p1.C.shape} 3x3: {st_out_3x3.C.shape}")

The output is:

y=0, 2x2(p0): torch.Size([0, 4]), 2x2(p1): torch.Size([0, 4]) 3x3: torch.Size([256, 4])
y=1, 2x2(p0): torch.Size([256, 4]), 2x2(p1): torch.Size([256, 4]) 3x3: torch.Size([256, 4])
y=2, 2x2(p0): torch.Size([0, 4]), 2x2(p1): torch.Size([0, 4]) 3x3: torch.Size([256, 4])
y=3, 2x2(p0): torch.Size([256, 4]), 2x2(p1): torch.Size([256, 4]) 3x3: torch.Size([256, 4])
y=4, 2x2(p0): torch.Size([0, 4]), 2x2(p1): torch.Size([0, 4]) 3x3: torch.Size([256, 4])
y=5, 2x2(p0): torch.Size([256, 4]), 2x2(p1): torch.Size([256, 4]) 3x3: torch.Size([256, 4])
y=6, 2x2(p0): torch.Size([0, 4]), 2x2(p1): torch.Size([0, 4]) 3x3: torch.Size([256, 4])
y=7, 2x2(p0): torch.Size([256, 4]), 2x2(p1): torch.Size([256, 4]) 3x3: torch.Size([256, 4])

Expected Behavior

Conv3d(stride=2, kernel_size=2, padding=0) should behave like the other Conv3d(stride=2) variants.

Am I misusing TorchSparse in some way, or misunderstanding some aspect of it?

Environment

- GCC: 9.4.0
- NVCC: release 11.7, V11.7.99
- PyTorch: 1.13.1+cu117
- PyTorch CUDA: 11.7
- TorchSparse: 2.1.0+torch113cu117

Anything else?

No response

ys-2020 commented 10 months ago

Hi @angshine , I would suggest setting kmap_mode='hashmap' to get the outputs you expect.
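
For reference, a minimal sketch of that change (this assumes the v2.1 global configuration helper torchsparse.nn.functional.set_kmap_mode; check torchsparse.nn.functional in your installed build if the name differs):

import torchsparse.nn.functional as F

# Assumed v2.1 helper: switches the kernel-map build mode globally,
# so it takes effect for all subsequent sparse convolutions.
F.set_kmap_mode("hashmap")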

For the second question: when you reuse st_xz_yi, TorchSparse reuses the cached kernel maps for the downsampling, since both kernel_size and stride are the same, and thus the outputs are the same. This is not a usage pattern we expect. You need to rebuild st_xz_yi to run these two layers correctly, as sketched below.
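
In other words, a minimal sketch of the workaround, reusing build_sparse_plane and the conv layers from your first snippet:

st_a = build_sparse_plane(y=1).cuda()
out_p0 = sparse_conv_2x2_p0(st_a)

# A fresh tensor carries no cached kernel map, so the second conv (same
# kernel_size and stride) builds its own map instead of reusing st_a's.
st_b = build_sparse_plane(y=1).cuda()
out_p1 = sparse_conv_2x2_p1(st_b)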

However, I would like to mention that the results you got above are actually correct according to the definition of sparse convolution.

y=0, 2x2(p0): torch.Size([0, 4]), 2x2(p1): torch.Size([289, 4]) 3x3: torch.Size([256, 4])
y=1, 2x2(p0): torch.Size([256, 4]), 2x2(p1): torch.Size([289, 4]) 3x3: torch.Size([256, 4])
y=2, 2x2(p0): torch.Size([0, 4]), 2x2(p1): torch.Size([289, 4]) 3x3: torch.Size([256, 4])
y=3, 2x2(p0): torch.Size([256, 4]), 2x2(p1): torch.Size([289, 4]) 3x3: torch.Size([256, 4])
y=4, 2x2(p0): torch.Size([0, 4]), 2x2(p1): torch.Size([289, 4]) 3x3: torch.Size([256, 4])
y=5, 2x2(p0): torch.Size([256, 4]), 2x2(p1): torch.Size([289, 4]) 3x3: torch.Size([256, 4])
y=6, 2x2(p0): torch.Size([0, 4]), 2x2(p1): torch.Size([289, 4]) 3x3: torch.Size([256, 4])
y=7, 2x2(p0): torch.Size([256, 4]), 2x2(p1): torch.Size([289, 4]) 3x3: torch.Size([256, 4])

For example, when you set y=0, all the points have a y coordinate equal to 0, so the inferred spatial extent of the y dimension is 1. If you then apply a sparse convolution with kernel_size=2 and padding=0, it is expected that there will be no output points. (However, if you manually set the spatial_range of the SparseTensor at initialization, the outputs will match your expectation.)
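
A minimal sketch of that workaround (assuming spatial_range is given as (batch_size, x, y, z), matching the inferred shape described above):

import torch
import torchsparse

size = 32
# An xz plane at y=0, in (batch, x, y, z) coordinate layout.
xs, zs = torch.meshgrid(
    torch.arange(size, dtype=torch.int),
    torch.arange(size, dtype=torch.int),
    indexing="ij",
)
coords = torch.stack(
    [xs.flatten(), torch.zeros(size * size, dtype=torch.int), zs.flatten()], dim=-1
)
coords = torch.cat([torch.zeros(size * size, 1, dtype=torch.int), coords], dim=-1)

# Pin the dense extent to 32 in every spatial dimension instead of letting
# the y extent be inferred as 1 from the flat plane.
st = torchsparse.SparseTensor(
    coords=coords,
    feats=torch.ones(size * size, 1),
    spatial_range=(1, size, size, size),
)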

angshine commented 10 months ago

Thanks! Setting kmap_mode='hashmap' leads to correct results.

Regarding the correctness of Conv3d(kernel_size=2, padding=0, stride=2), I'm unsure about your point. My understanding is that the outcome should remain consistent irrespective of the y position of the xz plane. Alternatively, if this behavior is intentional, shouldn't it be consistent across different kmap_mode settings?

#214 might be related.

ys-2020 commented 10 months ago

@angshine No. Since you did not set spatial_range when initializing the SparseTensor, TorchSparse automatically infers the feature map shape. For example, if all points have coordinates with y=0, the spatial shape of the input SparseTensor will be [batch_size, x_shape, 1, z_shape]. If you then apply Conv3d(kernel_size=2, padding=0, stride=2), the conv kernel (size 2) cannot fit into the spatial range of the input tensor (the y extent is 1, which is smaller than the kernel size of 2), so there are no output points. Similarly, with y=2, 4, ..., the inferred y extent is y+1 and none of the valid stride-2 kernel windows (starting at y=0, 2, ...) cover the plane, so there are no output points unless you manually set spatial_range when initializing the SparseTensor.
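
To spell out the arithmetic, here is the standard dense output-size formula, floor((n + 2p - k) / s) + 1, applied to the inferred y extent of the y=0 plane:

def conv_out_extent(n, k, s, p):
    # floor((n + 2*p - k) / s) + 1: output size of a dense convolution
    return (n + 2 * p - k) // s + 1

# y=0 plane: the inferred y extent n is 1
print(conv_out_extent(1, k=2, s=2, p=0))  # 0 -> no output points (the p=0 variant)
print(conv_out_extent(1, k=2, s=2, p=1))  # 1 -> outputs exist (the p=1 variant)
print(conv_out_extent(1, k=3, s=2, p=1))  # 1 -> outputs exist (the 3x3 variant)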

ys-2020 commented 10 months ago

The difference between the two kmap_mode settings is not intentional. In fact, it is not recommended to initialize a SparseTensor without setting spatial_range. Also, your test case is not very common in sparse convolution, since all the points sit on the edge of the input feature map (from the perspective of the y axis).

I hope this resolves your questions.

angshine commented 10 months ago

Thank you for your detailed response! My concerns have now been addressed. However, it might be beneficial to emit a warning or document these edge cases with Conv3d; otherwise it can just "fail" silently. As for setting spatial_range, I would appreciate it being mentioned in the examples :)

ys-2020 commented 10 months ago

Thank you for the advice! We will make corresponding updates.