isl-org / Open3D-ML

An extension of Open3D to address 3D Machine Learning tasks

Cuda Error 700 #548

Closed antoniosilva9116 closed 2 years ago

antoniosilva9116 commented 2 years ago

Checklist

Describe the issue

I'm trying to implement SparseConvUNet for the PartA2 model, using practically the same code as can be found in https://github.com/open-mmlab/OpenPCDet/blob/master/pcdet/models/backbones_3d/spconv_unet.py. For some reason I'm getting an error on this line:

        def __init__(self, ...):
            (...)
            self.conv_input = spconv.SparseSequential(
                SubMConv3d(input_channels, 16, 3, padding=1, bias=False, indice_key='subm1'),
                norm_fn(16),
                nn.ReLU(),
            )

        def forward(self, ...):
            (...)
            input_sp_tensor = spconv.SparseConvTensor(
                features=voxel_features,
                indices=voxel_coords.int(),
                spatial_shape=self.sparse_shape,
                batch_size=batch_size
            )

            x = self.conv_input(input_sp_tensor) # **this line**

Running the original OpenPCDet it works fine, but in my implementation I'm getting the following error:

- [Exception|implicit_gemm]feat=torch.Size([16000, 4]),w=torch.Size([16, 3, 3, 3, 4]),pair=torch.Size([27, 16000]),act=16000,issubm=True,istrain=True
- RuntimeError: /io/build/temp.linux-x86_64-3.9/spconv/build/src/cumm/gemm/main/GemmMainUnitTest/GemmMainUnitTest_stream_synchronize.cc(11)
- CUDA error 700

The shapes of the voxel coords, voxel features and input sparse tensor are the following:

I'm using batch size 1 to test whether this value is the cause. The shapes of the aforementioned parameters are the same as in OpenPCDet. Can you help me solve this problem? I asked the same question in the spconv repo, but there has been no answer so far (2-3 weeks)!
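
For what it's worth, here is a minimal standalone sketch (my own test snippet, assuming spconv v2.x imported as `spconv.pytorch` and a CUDA device; the 16000 x 4 shapes mirror the error above) that feeds a single SubMConv3d with synthetic, guaranteed in-range indices, to check whether the layer itself runs outside my pipeline:

import torch
import spconv.pytorch as spconv

spatial_shape = [41, 1600, 1408]            # [z, y, x] for the KITTI range/voxel size used here
num_voxels, in_channels, batch_size = 16000, 4, 1

features = torch.randn(num_voxels, in_channels, device='cuda')

# Unique, guaranteed in-range [batch_idx, z, y, x] coordinates built from a linear index.
lin = torch.arange(num_voxels, device='cuda')
zz = lin // (spatial_shape[1] * spatial_shape[2])
yy = (lin // spatial_shape[2]) % spatial_shape[1]
xx = lin % spatial_shape[2]
indices = torch.stack([torch.zeros_like(lin), zz, yy, xx], dim=1).int()

x = spconv.SparseConvTensor(features, indices, spatial_shape, batch_size)
conv = spconv.SubMConv3d(in_channels, 16, 3, padding=1, bias=False, indice_key='subm1').cuda()
out = conv(x)
torch.cuda.synchronize()                    # force the kernel to run so errors surface here
print(out.features.shape)                   # expected: torch.Size([16000, 16])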

Steps to reproduce the bug

The parta2_kitti.yml to reproduce the error:

dataset:
  name: KITTI
  dataset_path: # path/to/your/dataset
  cache_dir: ./logs/cache
  steps_per_epoch_train:

model:
  name: SECONDHeadSingle
  ckpt_path: # path/to/your/checkpoint

  batcher: "ignore"

  point_cloud_range: [ 0, -40, -3, 70.4, 40, 1 ]
  classes: [ 'Pedestrian', 'Cyclist', 'Car' ]

  voxelize:
    max_num_points: 5
    voxel_size: &vsize
      [ 0.05, 0.05, 0.1 ]
    max_voxels: [ 16000, 40000 ]

  voxel_encoder:
    name: MeanVoxelFeatureNet
    in_channels: 4

  spunetmiddle:
    grid_size:
    voxel_size: [ 0.05, 0.05, 0.1 ]
    return_encoded_tensor: True
    input_channels: 4
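
For reference, the grid size implied by the point_cloud_range and voxel_size above works out as follows (a small worked example using the same formula as the calculate_grid_size method shown further down; the last line mirrors the sparse_shape computed in SparseConvUNet):

import numpy as np

point_cloud_range = np.array([0, -40, -3, 70.4, 40, 1], dtype=np.float32)
voxel_size = np.array([0.05, 0.05, 0.1], dtype=np.float32)

# (range_max - range_min) / voxel_size, rounded to integers
grid_size = np.round((point_cloud_range[3:6] - point_cloud_range[0:3]) / voxel_size).astype(np.int64)
print(grid_size)                    # [1408 1600   40] in [x, y, z] order
print(grid_size[::-1] + [1, 0, 0])  # sparse_shape in [z, y, x] order -> [  41 1600 1408]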

SECONDVoxelization:

import logging

import torch
from torch import nn
from torch.nn.modules.utils import _pair
import numpy as np

from open3d.ml.torch.ops import voxelize, ragged_to_dense

class SECONDVoxelization(nn.Module):

    def __init__(self,
                 voxel_size,
                 point_cloud_range,
                 max_num_points=32,
                 max_voxels=[16000, 40000]):
        """Voxelization layer for the SECOND model.

        Args:
            voxel_size: voxel edge lengths with format [x, y, z].
            point_cloud_range: The valid range of point coordinates as
                [x_min, y_min, z_min, x_max, y_max, z_max].
            max_num_points: The maximum number of points per voxel.
            max_voxels: The maximum number of voxels. May be a tuple with
                values for training and testing.
        """
        super().__init__()
        self.voxel_size = torch.Tensor(voxel_size)
        self.point_cloud_range = point_cloud_range
        self.points_range_min = torch.Tensor(point_cloud_range[:3])
        self.points_range_max = torch.Tensor(point_cloud_range[3:])
        self.max_num_points = max_num_points
        if isinstance(max_voxels, (tuple, list, dict)):
            self.max_voxels = max_voxels
        else:
            self.max_voxels = _pair(max_voxels)

        point_cloud_range_np = np.array(point_cloud_range, dtype=np.float32)
        point_cloud_distance = point_cloud_range_np[3:]-point_cloud_range_np[:3]
        logging.info(f'\n\npoint cloud distance {point_cloud_distance}')
        logging.info(f'actual grid size {point_cloud_distance / voxel_size}')

        grid_size = self.calculate_grid_size()

        self.grid_size = grid_size

    def calculate_grid_size(self,):
        point_cloud_range = np.array(self.point_cloud_range, dtype=np.float32)
        grid_size = (point_cloud_range[3:6] - point_cloud_range[0:3]) / np.array(self.voxel_size)
        grid_size = np.round(grid_size).astype(np.int64)

        logging.info(f'calculate_grid_size function: {grid_size}')

        return grid_size

    def forward(self, points_feats):
        """Forward function.

        Args:
            points_feats: Tensor with point coordinates and features. The shape
                is [N, 4+C] with N as the number of points and C as the number
                of feature channels.

        Returns:
            (out_voxels, out_coords, out_num_points).
            * out_voxels is a dense list of point coordinates and features for
              each voxel. The shape is [num_voxels, max_num_points, 3+C].
            * out_coords is tensor with the integer voxel coords and shape
              [num_voxels,3]. Note that the order of dims is [z,y,x].
            * out_num_points is a 1D tensor with the number of points for each
              voxel.
        """
        if self.training:
            max_voxels = self.max_voxels['train']
        else:
            max_voxels = self.max_voxels['test']

        points = points_feats[:, :3]

        logging.info(f'points shape: {points.shape}')

        num_voxels = ((self.points_range_max - self.points_range_min) /
                      self.voxel_size).type(torch.int64)

        ans = voxelize(points,
                       torch.LongTensor([0, points.shape[0]]).to(points.device),
                       self.voxel_size, self.points_range_min,
                       self.points_range_max, self.max_num_points, max_voxels)

        # prepend row with zeros which maps to index 0 which maps to void points.
        feats = torch.cat(
            [torch.zeros_like(points_feats[0:1, :]), points_feats])

        # create dense matrix of indices. index 0 maps to the zero vector.
        voxels_point_indices_dense = ragged_to_dense(
            ans.voxel_point_indices, ans.voxel_point_row_splits,
            self.max_num_points, torch.tensor(-1)) + 1

        out_voxels = feats[voxels_point_indices_dense]
        out_coords = ans.voxel_coords[:, [2, 1, 0]].contiguous()
        out_num_points = ans.voxel_point_row_splits[
                         1:] - ans.voxel_point_row_splits[:-1]

        # Filter out pillars generated out of bounds of the pseudoimage.
        in_bounds_y = out_coords[:, 1] < num_voxels[1]
        in_bounds_x = out_coords[:, 2] < num_voxels[0]
        in_bounds = torch.logical_and(in_bounds_x, in_bounds_y)

        out_coords = out_coords[in_bounds]
        out_voxels = out_voxels[in_bounds]
        out_num_points = out_num_points[in_bounds]

        return out_voxels, out_coords, out_num_points
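
For illustration, a minimal usage sketch of this layer with random points (my own snippet, not part of the model; it assumes the Open3D-ML voxelize/ragged_to_dense ops are available and passes max_voxels as a dict, matching the config change shown later in this thread):

import torch

voxel_layer = SECONDVoxelization(voxel_size=[0.05, 0.05, 0.1],
                                 point_cloud_range=[0, -40, -3, 70.4, 40, 1],
                                 max_num_points=5,
                                 max_voxels={'train': 16000, 'test': 40000})
voxel_layer.train()  # selects max_voxels['train']

# 1000 random points inside the valid range, plus one reflectance channel.
xyz = torch.rand(1000, 3) * torch.tensor([70.4, 80.0, 4.0]) + torch.tensor([0.0, -40.0, -3.0])
points = torch.cat([xyz, torch.rand(1000, 1)], dim=1)

out_voxels, out_coords, out_num_points = voxel_layer(points)
print(out_voxels.shape)      # [num_voxels, 5, 4]
print(out_coords.shape)      # [num_voxels, 3], in [z, y, x] order
print(out_num_points.shape)  # [num_voxels]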

MeanVoxelFeatureNet:

import logging

import torch
from torch import nn

class MeanVoxelFeatureNet(nn.Module):
    """
    Mean Voxel Feature Net for PartA2. Keep x, y, z, r
    """

    def __init__(self,
                 in_channels=4,
                 name='MeanVoxelFeatureNet'
                 ):
        super(MeanVoxelFeatureNet, self).__init__()

        self.in_channels = in_channels

    def get_output_feature_dim(self):
        return self.in_channels

    def forward(self, features, voxel_num_points):
        """
        Args:
            features (torch.Tensor): (num_voxels, max_points_per_voxel, C)
            voxel_num_points: optional (num_voxels)

        Returns:
            vfe_features: (num_voxels, C)
        """
        points_mean = features[:, :, :].sum(dim=1, keepdim=False)
        normalizer = torch.clamp_min(voxel_num_points.view(-1, 1), min=1.0).type_as(features)
        points_mean = points_mean / normalizer
        voxel_features = points_mean.contiguous()

        return voxel_features
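
A quick shape check with dummy data (my own illustration; in the real pipeline the unused padding slots of the features tensor are zero-filled by the voxelizer, which is what this dummy data imitates):

import torch

# Dummy check: 8 voxels, up to 5 points per voxel, 4 channels (x, y, z, reflectance).
vfe = MeanVoxelFeatureNet(in_channels=4)

features = torch.zeros(8, 5, 4)
voxel_num_points = torch.randint(1, 6, (8,))
for i, n in enumerate(voxel_num_points.tolist()):
    features[i, :n] = torch.rand(n, 4)    # only the first n slots hold real points

voxel_features = vfe(features, voxel_num_points)
print(voxel_features.shape)               # torch.Size([8, 4]) -- one mean feature per voxel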

The SparseConvUNet class is as follows:

import logging
from functools import partial

import torch
import torch.nn as nn

from ml3d.torch.utils.spconv_utils import replace_feature, spconv
from ml3d.torch.utils.torch_utils import get_voxel_centers

def post_act_block(in_channels, out_channels, kernel_size, indice_key=None, stride=1, padding=0,
                   conv_type='subm', norm_fn=None):
    if conv_type == 'subm':
        conv = spconv.SubMConv3d(in_channels, out_channels, kernel_size, bias=False, indice_key=indice_key)
    elif conv_type == 'spconv':
        conv = spconv.SparseConv3d(in_channels, out_channels, kernel_size, stride=stride, padding=padding,
                                   bias=False, indice_key=indice_key)
    elif conv_type == 'inverseconv':
        conv = spconv.SparseInverseConv3d(in_channels, out_channels, kernel_size, indice_key=indice_key, bias=False)
    else:
        raise NotImplementedError

    m = spconv.SparseSequential(
        conv,
        norm_fn(out_channels),
        nn.ReLU(),
    )

    return m

class SparseBasicBlock(spconv.SparseModule):
    expansion = 1

    def __init__(self, inplanes, planes, stride=1, downsample=None, indice_key=None, norm_fn=None):
        super(SparseBasicBlock, self).__init__()
        self.conv1 = spconv.SubMConv3d(
            inplanes, planes, kernel_size=3, stride=stride, padding=1, bias=False, indice_key=indice_key
        )
        self.bn1 = norm_fn(planes)
        self.relu = nn.ReLU()
        self.conv2 = spconv.SubMConv3d(
            planes, planes, kernel_size=3, stride=1, padding=1, bias=False, indice_key=indice_key
        )
        self.bn2 = norm_fn(planes)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x.features

        assert x.features.dim() == 2, 'x.features.dim()=%d' % x.features.dim()

        out = self.conv1(x)
        out = replace_feature(out, self.bn1(out.features))
        out = replace_feature(out, self.relu(out.features))

        out = self.conv2(out)
        out = replace_feature(out, self.bn2(out.features))

        if self.downsample is not None:
            identity = self.downsample(x)

        out = replace_feature(out, out.features + identity)
        out = replace_feature(out, self.relu(out.features))

        return out

class SparseConvUNet(nn.Module):
    """
    Sparse Convolution based UNet for point-wise feature learning.
    Reference Paper: https://arxiv.org/abs/1907.03670 (Shaoshuai Shi, et. al)
    From Points to Parts: 3D Object Detection from Point Cloud with Part-aware and Part-aggregation Network
    """

    def __init__(self, input_channels, grid_size, voxel_size, point_cloud_range, return_encoded_tensor=False):
        super(SparseConvUNet, self).__init__()
        self.return_encoded_tensor = return_encoded_tensor
        self.sparse_shape = grid_size[::-1] + [1, 0, 0]
        self.voxel_size = voxel_size
        self.point_cloud_range = point_cloud_range

        logging.info(f'input_channels: {input_channels}, grid_size: {grid_size}, voxel_size: {voxel_size}, point_cloud_range {point_cloud_range}')

        norm_fn = partial(nn.BatchNorm1d, eps=1e-3, momentum=0.01)

        self.conv_input = spconv.SparseSequential(
            spconv.SubMConv3d(input_channels, 16, 3, padding=1, bias=False, indice_key='subm1'),
            norm_fn(16),
            nn.ReLU(),
        )

        block = post_act_block

        self.conv1 = spconv.SparseSequential(
            block(16, 16, 3, norm_fn=norm_fn, padding=1, indice_key='subm1'),
        )

        self.conv2 = spconv.SparseSequential(
            # [1600, 1408, 41] <- [800, 704, 21]
            block(16, 32, 3, norm_fn=norm_fn, stride=2, padding=1, indice_key='spconv2', conv_type='spconv'),
            block(32, 32, 3, norm_fn=norm_fn, padding=1, indice_key='subm2'),
            block(32, 32, 3, norm_fn=norm_fn, padding=1, indice_key='subm2'),
        )

        self.conv3 = spconv.SparseSequential(
            # [800, 704, 21] <- [400, 352, 11]
            block(32, 64, 3, norm_fn=norm_fn, stride=2, padding=1, indice_key='spconv3', conv_type='spconv'),
            block(64, 64, 3, norm_fn=norm_fn, padding=1, indice_key='subm3'),
            block(64, 64, 3, norm_fn=norm_fn, padding=1, indice_key='subm3'),
        )

        self.conv4 = spconv.SparseSequential(
            # [400, 352, 11] <- [200, 176, 5]
            block(64, 64, 3, norm_fn=norm_fn, stride=2, padding=(0, 1, 1), indice_key='spconv4', conv_type='spconv'),
            block(64, 64, 3, norm_fn=norm_fn, padding=1, indice_key='subm4'),
            block(64, 64, 3, norm_fn=norm_fn, padding=1, indice_key='subm4'),
        )

        if return_encoded_tensor:
            logging.info(f'return_encoded_tensor: {return_encoded_tensor}')
            last_pad = 0 if voxel_size[-1] in [0.1, 0.2] else (1, 0, 0)

            self.conv_out = spconv.SparseSequential(
                # [200, 150, 5] -> [200, 150, 2]
                spconv.SparseConv3d(64, 128, (3, 1, 1), stride=(2, 1, 1), padding=last_pad,
                                    bias=False, indice_key='spconv_down2'),
                norm_fn(128),
                nn.ReLU(),
            )
        else:
            logging.info(f'return_encoded_tensor: False')
            self.conv_out = None

        # decoder
        # [400, 352, 11] <- [200, 176, 5]
        self.conv_up_t4 = SparseBasicBlock(64, 64, indice_key='subm4', norm_fn=norm_fn)
        self.conv_up_m4 = block(128, 64, 3, norm_fn=norm_fn, padding=1, indice_key='subm4')
        self.inv_conv4 = block(64, 64, 3, norm_fn=norm_fn, indice_key='spconv4', conv_type='inverseconv')

        # [800, 704, 21] <- [400, 352, 11]
        self.conv_up_t3 = SparseBasicBlock(64, 64, indice_key='subm3', norm_fn=norm_fn)
        self.conv_up_m3 = block(128, 64, 3, norm_fn=norm_fn, padding=1, indice_key='subm3')
        self.inv_conv3 = block(64, 32, 3, norm_fn=norm_fn, indice_key='spconv3', conv_type='inverseconv')

        # [1600, 1408, 41] <- [800, 704, 21]
        self.conv_up_t2 = SparseBasicBlock(32, 32, indice_key='subm2', norm_fn=norm_fn)
        self.conv_up_m2 = block(64, 32, 3, norm_fn=norm_fn, indice_key='subm2')
        self.inv_conv2 = block(32, 16, 3, norm_fn=norm_fn, indice_key='spconv2', conv_type='inverseconv')

        # [1600, 1408, 41] <- [1600, 1408, 41]
        self.conv_up_t1 = SparseBasicBlock(16, 16, indice_key='subm1', norm_fn=norm_fn)
        self.conv_up_m1 = block(32, 16, 3, norm_fn=norm_fn, indice_key='subm1')

        self.conv5 = spconv.SparseSequential(
            block(16, 16, 3, norm_fn=norm_fn, padding=1, indice_key='subm1')
        )
        self.num_point_features = 16

    def UR_block_forward(self, x_lateral, x_bottom, conv_t, conv_m, conv_inv):
        x_trans = conv_t(x_lateral)
        x = x_trans
        x = replace_feature(x, torch.cat((x_bottom.features, x_trans.features), dim=1))
        x_m = conv_m(x)
        x = self.channel_reduction(x, x_m.features.shape[1])
        x = replace_feature(x, x_m.features + x.features)
        x = conv_inv(x)
        return x

    @staticmethod
    def channel_reduction(x, out_channels):
        """
        Args:
            x: x.features (N, C1)
            out_channels: C2
        Returns:
        """
        features = x.features
        n, in_channels = features.shape
        assert (in_channels % out_channels == 0) and (in_channels >= out_channels)

        x = replace_feature(x, features.view(n, out_channels, -1).sum(dim=2))
        return x

    def forward(self, voxel_features, voxel_coords, batch_size):
        """
        Args:
            batch_size: int
            voxel_features: (num_voxels, C)
            voxel_coords: (num_voxels, 4), [batch_idx, z_idx, y_idx, x_idx]
        Returns:
            encoded_spconv_tensor: sparse tensor
            point_features: (N, C)
        """
        logging.info(f'voxel_features shape: {voxel_features.shape}, type: {type(voxel_features)} \n{voxel_features}')
        logging.info(f'voxel_coords shape: {voxel_coords.shape}, type: {type(voxel_coords)} \n{voxel_coords}')
        logging.info(f'self.sparse_shape: {self.sparse_shape}')

        voxel_coords_int = voxel_coords.int() 
        logging.info(f'voxel_coords_int shape: {voxel_coords_int.shape}, type: {type(voxel_coords_int)} \n{voxel_coords_int}')

        input_sp_tensor = spconv.SparseConvTensor(
            features=voxel_features,
            indices=voxel_coords_int,
            spatial_shape=self.sparse_shape,
            batch_size=int(batch_size)
        )

        logging.info(f'input_sp_tensor shape: {input_sp_tensor.spatial_shape}')
        logging.info(f'input_sp_tensor shape: {input_sp_tensor}')
        logging.info(f'input_sp_tensor features {input_sp_tensor.features.shape}: {input_sp_tensor.features}')

        x = self.conv_input(input_sp_tensor) # ERROR HERE

        x_conv1 = self.conv1(x)
        x_conv2 = self.conv2(x_conv1)
        x_conv3 = self.conv3(x_conv2)
        x_conv4 = self.conv4(x_conv3)

        if self.conv_out is not None:
            # for detection head
            # [200, 176, 5] -> [200, 176, 2]
            out = self.conv_out(x_conv4)
            encoded_spconv_tensor = out
            encoded_spconv_tensor_stride = 8

        # for segmentation head
        # [400, 352, 11] <- [200, 176, 5]
        x_up4 = self.UR_block_forward(x_conv4, x_conv4, self.conv_up_t4, self.conv_up_m4, self.inv_conv4)
        # [800, 704, 21] <- [400, 352, 11]
        x_up3 = self.UR_block_forward(x_conv3, x_up4, self.conv_up_t3, self.conv_up_m3, self.inv_conv3)
        # [1600, 1408, 41] <- [800, 704, 21]
        x_up2 = self.UR_block_forward(x_conv2, x_up3, self.conv_up_t2, self.conv_up_m2, self.inv_conv2)
        # [1600, 1408, 41] <- [1600, 1408, 41]
        x_up1 = self.UR_block_forward(x_conv1, x_up2, self.conv_up_t1, self.conv_up_m1, self.conv5)

        point_features = x_up1.features
        point_coords = get_voxel_centers(
            x_up1.indices[:, 1:], downsample_times=1, voxel_size=self.voxel_size,
            point_cloud_range=self.point_cloud_range
        )
        point_coords = torch.cat((x_up1.indices[:, 0:1].float(), point_coords), dim=1)

        # map to bev
        spatial_features = encoded_spconv_tensor.dense()
        N, C, D, H, W = spatial_features.shape
        spatial_features = spatial_features.view(N, C * D, H, W)

        return point_features, point_coords, spatial_features
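
As an aside, here is a small sanity-check sketch (my own addition, not part of the original code) that could be run right before the spconv.SparseConvTensor call above. It assumes coords are [batch_idx, z_idx, y_idx, x_idx] and spatial_shape is [z, y, x], as documented in forward(); out-of-range voxel indices are a common cause of this kind of CUDA illegal-memory-access error.

import torch

def check_sparse_inputs(voxel_features, voxel_coords, spatial_shape, batch_size):
    # Basic invariants expected by spconv.SparseConvTensor in this model.
    assert voxel_features.is_cuda and voxel_coords.is_cuda, 'expected CUDA tensors here'
    assert voxel_coords.dtype == torch.int32, f'coords dtype is {voxel_coords.dtype}'
    assert voxel_features.shape[0] == voxel_coords.shape[0], 'one coord row per voxel'
    assert voxel_coords.shape[1] == 4, 'coords must be [batch_idx, z, y, x]'
    assert int(voxel_coords[:, 0].max()) < batch_size, 'batch index out of range'
    for dim in range(3):
        lo = int(voxel_coords[:, dim + 1].min())
        hi = int(voxel_coords[:, dim + 1].max())
        assert 0 <= lo and hi < int(spatial_shape[dim]), (
            f'dim {dim}: coords span [{lo}, {hi}] but spatial_shape[{dim}] is {spatial_shape[dim]}')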

PartA2 class is as follows:

class PartA2(BaseModel):
    def __init__(self,
                 name="PartA2",
                 device="cuda",
                 point_cloud_range=[0, -40.0, -3, 70.0, 40.0, 1],
                 classes=['car'],
                 voxelize={},
                 voxel_encoder={},
                 spunetmiddle={},
                 **kwargs):
        super().__init__(name=name,
                         point_cloud_range=point_cloud_range,
                         device=device,
                         **kwargs)
        self.point_cloud_range = point_cloud_range
        self.classes = classes

        self.name2lbl = {n: i for i, n in enumerate(classes)}
        self.lbl2name = {i: n for i, n in enumerate(classes)}

        self.augmenter = ObjdetAugmentation(self.cfg.augment, seed=self.rng)

        self.post_process_cfg = ...  # post-processing config (elided in this snippet)

        self.voxel_layer = SECONDVoxelization(
            point_cloud_range=point_cloud_range, **voxelize)

        self.voxel_encoder = MeanVoxelFeatureNet(**voxel_encoder)

        spunetmiddle['grid_size'] = self.voxel_layer.calculate_grid_size()
        self.middle_encoder_3d = SparseConvUNet(
            point_cloud_range=np.array(point_cloud_range), **spunetmiddle
        )

        self.device = device

    def extract_feats(self, points):
        """Extract features from points."""
        voxels, num_points, coors = self.voxelize(points)

        voxel_features = self.voxel_encoder(voxels, num_points)

        batch_size = coors[-1, 0].item() + 1

        x = self.middle_encoder_3d(
            voxel_features, coors, batch_size
        )

        return x, batch_size

Error message

I'm getting the following error:

Expected behavior

No response

Open3D, Python and System information

- Operating system: Ubuntu 20.04
- Python version: Python 3.9.12
- Open3D version: 0.15.2 (`print(open3d.__version__)`)
- System type: x64 
- Is this remote workstation?: no
- How did you install Open3D?: pip with conda, pytorch installed via requirements-torch-cuda.txt
- Compiler version (if built from source): gcc 9.4.0

Additional information

No response

sanskar107 commented 2 years ago

@antoniosilva9116 The error seems to come from inside the third-party package spconv. You can try the following to trace the problem.

  1. Does it crash on CPU as well?
  2. Does it always happen on the first iteration, on some specific iteration, or is it random?
  3. You can try to save the inputs to the line x = self.middle_encoder_3d(...) and build a very small example that reproduces this issue (a rough sketch follows below). You may find that there is a problem with the data itself.
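
For example, something along these lines (just a rough sketch; the dump path is a placeholder and the encoder is rebuilt with the config values from earlier in this issue):

import numpy as np
import torch

# Step 1: inside PartA2.extract_feats, right before the failing call, dump the inputs once:
#   torch.save({'voxel_features': voxel_features.cpu(),
#               'coors': coors.cpu(),
#               'batch_size': batch_size}, '/tmp/middle_encoder_inputs.pt')

# Step 2: in a standalone script, rebuild only the middle encoder and replay the saved batch.
middle_encoder_3d = SparseConvUNet(input_channels=4,
                                   grid_size=np.array([1408, 1600, 40]),
                                   voxel_size=[0.05, 0.05, 0.1],
                                   point_cloud_range=np.array([0, -40, -3, 70.4, 40, 1]),
                                   return_encoded_tensor=True).cuda()

data = torch.load('/tmp/middle_encoder_inputs.pt')
out = middle_encoder_3d(data['voxel_features'].cuda(),
                        data['coors'].cuda(),
                        data['batch_size'])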
antoniosilva9116 commented 2 years ago

Hi @sanskar107. Thank you for your answer. I'm trying to compare the input data on both sides (OpenPCDet and yours), but I noticed that your code does not split the KITTI dataset according to the KITTI protocol (like the files attached). Am I wrong?

train.txt val.txt test.txt

So, answering your questions:

  1. It seems that most of the spconv code is optimised for GPU. I got an error when using the CPU (I passed --device cpu to run_pipeline.py):

    assert indices.is_cuda, "implicit gemm only support cuda"
    AssertionError: implicit gemm only supports cuda
  2. It happens on the first frame, frame_id 000000.

  3. I attached the data passed to the middle extractor. I made the following change in parta2_kitti.yml:

log_train_2022-06-23_18:57:31.txt

  voxelize:
    max_num_points: 5
    voxel_size: &vsize
      [ 0.05, 0.05, 0.1 ]
    max_voxels: {
      'train': 16000,
      'test': 40000
    }

Then, I used your voxelize function as implemented in PointPillars.py.

In conclusion, to understand whether the error is related to the data passed to the middle extractor, I implemented SECOND with MeanVoxelFeatureNet, the SECONDVoxelization.py above, and the voxelize function that you use in PointPillars. I implemented the Sparse Middle Extractor as defined in the traveller59 repo, which uses the same pattern that, in the case of SparseConvUNet, gives me the error reported in this issue. The code of the SECOND Sparse Middle Extractor is as follows:

    def __init__(self,
                 grid_size,
                 in_channels=4,
                 name='SparseMiddleExtractor'):
        (...)
        self.middle_conv = spconv.SparseSequential(
            # Block 1
            SubMConv3d(in_channels, 16, 3, indice_key="subm0"),
            nn.BatchNorm1d(16, eps=1e-3, momentum=0.01),
            nn.ReLU(),
            SubMConv3d(16, 16, 3, indice_key="subm0"),
            nn.BatchNorm1d(16, eps=1e-3, momentum=0.01),
            nn.ReLU(),
            SparseConv3d(16, 32, 3, 2,
                         padding=1),  # [1600, 1200, 41] -> [800, 600, 21]
            nn.BatchNorm1d(32, eps=1e-3, momentum=0.01),
            nn.ReLU(),
            (...)
        )

    def forward(self, voxel_features, coors, batch_size):
        coors = coors.int()

        ret = spconv.SparseConvTensor(
            features=voxel_features,
            indices=coors,
            spatial_shape=self.sparse_shape,
            batch_size=batch_size
        )

        ret = self.middle_conv(ret)

Sorry, but in this issue I misplaced the line where the error happens. It actually happens in the SparseConvUNet class, on this line:

x = self.conv_input(input_sp_tensor) # ERROR HERE