open-mmlab / mmdetection3d

OpenMMLab's next-generation platform for general 3D object detection.
https://mmdetection3d.readthedocs.io/en/latest/
Apache License 2.0

RuntimeError: expected scalar type Half but found Float #965

Closed zehuichen123 closed 2 years ago

zehuichen123 commented 3 years ago

Hi, I am trying to use fp16 with CenterPoint but end up with the same bug :( The environment is mmdet3d 0.16.0, CUDA 10.1, PyTorch 1.6, on a V100. I think the weights of the sparse conv are not converted to fp16, since the code runs into the first `if` clause at line 119 of `mmdet3d/ops/spconv/ops.py`:

def indice_conv(features,
                filters,
                indice_pairs,
                indice_pair_num,
                num_activate_out,
                inverse=False,
                subm=False):
    if filters.dtype == torch.float32:
        return sparse_conv_ext.indice_conv_fp32(features, filters,
                                                indice_pairs, indice_pair_num,
                                                num_activate_out, int(inverse),
                                                int(subm))  # <-- the error is thrown from here
    elif filters.dtype == torch.half:
        return sparse_conv_ext.indice_conv_half(features, filters,
                                                indice_pairs, indice_pair_num,
                                                num_activate_out, int(inverse),
                                                int(subm))
    else:
        raise NotImplementedError

P.S. I directly added `fp16 = dict(loss_scale=512.)` to the default CenterPoint config to enable float16 training.
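For reference, the failure mode can be reproduced without the CUDA extension. The sketch below is a hypothetical mock (not the real `sparse_conv_ext`): the dispatch branch is chosen by the *filters* dtype, so when autocast converts the input features to half but leaves the sparse-conv weights in float32, the fp32 kernel is selected and then rejects the mixed dtypes.

```python
class FakeTensor:
    """Hypothetical stand-in for a torch tensor; only carries a dtype."""
    def __init__(self, dtype):
        self.dtype = dtype

def indice_conv(features, filters):
    # Mirrors the dtype dispatch quoted above: the branch is picked from
    # `filters.dtype`, but the underlying kernel also checks `features`.
    if filters.dtype == "float32":
        if features.dtype != "float32":
            raise RuntimeError("expected scalar type Half but found Float")
        return "fp32 kernel"
    elif filters.dtype == "float16":
        return "half kernel"
    raise NotImplementedError

# Under fp16 training the features are cast to half, but the sparse-conv
# weights stay float32, so the fp32 branch is taken and fails.
try:
    indice_conv(FakeTensor("float16"), FakeTensor("float32"))
except RuntimeError as e:
    print(e)  # expected scalar type Half but found Float
```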

Tai-Wang commented 3 years ago

We have not yet supported mixed-precision training for CenterPoint, so for now you can refer to the implementations of PointPillars and SECOND and adjust the corresponding details of CenterPoint yourself.

zehuichen123 commented 2 years ago

@Tai-Wang Hi, I've rechecked the code in mmdet3d, and SECOND doesn't actually support fp16 either, although an example config does exist (https://github.com/open-mmlab/mmdetection3d/blob/master/configs/fp16/hv_second_secfpn_fp16_6x8_80e_kitti-3d-3class.py). I think this is a bug.

zehuichen123 commented 2 years ago

I found that the direct cause may be related to `self.weight` of `SparseConvolution` (`ops/spconv/conv.py`): it is not converted to fp16 during mixed-precision training. But currently I have no idea how to fix it :)
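One conceivable workaround (an untested sketch, not an official fix, and the names are hypothetical) would be to cast the weights to the input dtype inside the dispatch, so that the half kernel is chosen whenever the features arrive as fp16:

```python
class FakeTensor:
    """Hypothetical stand-in for a torch tensor; only carries a dtype."""
    def __init__(self, dtype):
        self.dtype = dtype
    def to(self, dtype):
        # Mimics torch's Tensor.to(dtype), which returns a converted copy.
        return FakeTensor(dtype)

def indice_conv_patched(features, filters):
    # Align the weight dtype with the (possibly autocast) input dtype
    # before dispatching, so fp16 inputs route to the half kernel.
    if filters.dtype != features.dtype:
        filters = filters.to(features.dtype)
    return "half kernel" if filters.dtype == "float16" else "fp32 kernel"

print(indice_conv_patched(FakeTensor("float16"), FakeTensor("float32")))
# half kernel
```

Whether casting per forward pass is acceptable for performance, or the weights should instead be converted once when fp16 mode is enabled, is an open design question.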

zehuichen123 commented 2 years ago

I will create a new issue to report this problem with detailed experimental settings.