tusen-ai / SST

Code for a series of work in LiDAR perception, including SST (CVPR 22), FSD (NeurIPS 22), FSD++ (TPAMI 23), FSDv2, and CTRL (ICCV 23, oral).
Apache License 2.0
801 stars, 102 forks

Installation #71

Closed. Dhagash4 closed this issue 2 years ago.

Dhagash4 commented 2 years ago

Hey! First of all, thanks for this amazing work and thanks a lot for open-sourcing the code. I have a question about installation. I have the latest MMDetection3D running with PyTorch 1.12.1, but if I try to install SST using the same library versions, it does not set up properly. Also, the link to the MMDetection3D GETTING STARTED guide is broken. Besides that, I have a custom dataloader for my dataset: if I want to use SST for more than one class, for example as with the nuScenes dataset, is it possible to have a config for that?

Thanks a lot!

Abyssaledge commented 2 years ago
  1. Could you be more specific about the problem you encountered during installation?
  2. All of our configs support multi-class detection. I do not fully understand your difficulty with the multi-class setting.
Dhagash4 commented 2 years ago

Thanks for the fast response. At first the issue was THC/THC.h: these are PyTorch headers that were removed in recent versions, so I had to downgrade PyTorch. After that, all the installation issues were resolved and I was able to install SST.
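A quick way to tell in advance whether an environment can build extensions that `#include <THC/THC.h>` is to check the PyTorch version. The sketch below assumes the THC headers were dropped starting with PyTorch 1.11 (consistent with 1.12.1 failing and 1.10.0 working in this thread); the helper names `parse_torch_version` and `ships_thc_headers` are hypothetical.

```python
# Minimal sketch, assuming the THC headers (THC/THC.h) were removed from
# PyTorch starting with release 1.11, so CUDA extensions that include them
# need PyTorch 1.10.x or older to compile.
THC_REMOVED_IN = (1, 11)


def parse_torch_version(version: str) -> tuple:
    """Turn a version string such as '1.10.0+cu113' into (1, 10)."""
    public = version.split("+")[0]  # drop a local build tag like '+cu113'
    major, minor = public.split(".")[:2]
    return int(major), int(minor)


def ships_thc_headers(version: str) -> bool:
    """Return True if this PyTorch version should still include THC/THC.h."""
    return parse_torch_version(version) < THC_REMOVED_IN


if __name__ == "__main__":
    for v in ["1.10.0+cu113", "1.12.1"]:
        print(v, ships_thc_headers(v))
```

In a real environment one would pass `torch.__version__` instead of the hard-coded strings.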

Now, since I am using my custom dataset, I have modified the config sst_base.py for multi-class detection (in my case there are 7 classes), and modified https://github.com/tusen-ai/SST/blob/main/configs/sst/sst_waymoD5_1x_3class_8heads.py according to my needs.

I am trying to run the code with the command python tools/train.py configs/sst/${MY_CONFIG} and get the following error:

KeyError: 'FocalLoss is already registered in models'

My environment:

PyTorch=1.10.0+cu113
MMDetection=2.25.2
MMSegmentation=0.28.0
MMCV=1.4.0
MMDet3D=0.15.0 (installed by running setup.py in the SST repo)

Thanks a lot!

Abyssaledge commented 2 years ago

I guess this error is caused by your using a different version of MMDet, MMSeg, or MMCV, in which FocalLoss has already been registered. I have now fixed the problem by forcing the registration: https://github.com/tusen-ai/SST/blob/main/mmdet3d/models/losses/focal_loss.py#L135
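To illustrate why a second registration under the same name fails and what "force registration" changes, here is a toy registry. It is a hypothetical stand-in, not mmcv's code: it only imitates the behaviour of the `force` flag on mmcv 1.x's `Registry.register_module`, and the class bodies are placeholders.

```python
# Toy registry sketch (hypothetical, NOT mmcv's implementation) showing why
# registering the same name twice raises KeyError, and what force=True does.
class Registry:
    def __init__(self, name):
        self.name = name
        self._modules = {}

    def register_module(self, name=None, force=False):
        def _register(cls):
            key = name or cls.__name__
            if not force and key in self._modules:
                # Same message shape as the error reported in this issue.
                raise KeyError(f"{key} is already registered in {self.name}")
            self._modules[key] = cls  # force=True silently overwrites
            return cls
        return _register

    def get(self, key):
        return self._modules[key]


LOSSES = Registry("models")


@LOSSES.register_module()
class FocalLoss:  # e.g. registered first by another installed package
    source = "other package"


try:
    @LOSSES.register_module()
    class FocalLoss:  # same name registered again -> KeyError
        source = "sst"
except KeyError as err:
    print("KeyError:", err)


@LOSSES.register_module(force=True)
class FocalLoss:  # force=True replaces the earlier entry
    source = "sst"


print(LOSSES.get("FocalLoss").source)
```

The trade-off of forcing is that whichever module registers last wins, so the name now resolves to the SST implementation regardless of what the other packages registered.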

Dhagash4 commented 2 years ago

Thanks for the response, it fixed my issue. I am working on purely 3D data (a custom dataset) that contains points (Nx7) and bboxes (Nx7, [theta, cx, cy, cz, l, w, h]). Model training works with the SECOND network config, so the dataloader and setup are working properly. Because I am using my custom dataset, I modified the config accordingly, set POINT_CLOUD_RANGE=[0, -40, -2.5, 80, 40, 2.5], and tried running the network, which resulted in the following error:

error log: log_file (attached)

If I change point_cloud_range, what else do I need to change?

Thanks a lot for your response!

Abyssaledge commented 2 years ago

This error is caused by a mismatch between the weight shape and the data shape. Specifically, judging from (959x13 and 10x64) in the error message, I guess your input point feature dimension differs from ours (Nx5). You can easily fix this by modifying the MLP channels in SIR or changing the input channels.
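One plausible reading of (959x13 and 10x64) is 959 points with 13 features each, fed into a linear layer built for 10 inputs. If the failing layer is a DynamicVFE-style encoder (an assumption; the reply above points at the MLP in SIR), the widths follow from mmdet3d's DynamicVFE convention of appending extra per-point features. The helper `vfe_first_layer_width` is hypothetical.

```python
# Sketch of the input-width arithmetic, assuming the encoder follows
# mmdet3d's DynamicVFE convention of appending extra per-point features:
#   +3 for with_cluster_center, +3 for with_voxel_center, +1 for with_distance.
def vfe_first_layer_width(raw_point_dims: int,
                          with_cluster_center: bool = True,
                          with_voxel_center: bool = True,
                          with_distance: bool = False) -> int:
    width = raw_point_dims
    if with_cluster_center:
        width += 3  # xyz offset of each point from the mean of its voxel's points
    if with_voxel_center:
        width += 3  # xyz offset of each point from its voxel's geometric center
    if with_distance:
        width += 1  # Euclidean norm of the point's xyz
    return width


# Data side: 7 raw features per point -> 13 columns (the "959x13" operand).
print(vfe_first_layer_width(7))
# Weight side: a layer built for 4 raw features expects 10 inputs ("10x64").
print(vfe_first_layer_width(4))
```

Under this reading, the fix is to make the configured input channels match the 7 features the dataloader actually produces.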

Dhagash4 commented 2 years ago

Thanks a lot for all the help. I will look into all the hyperparameters in detail, but one last question: my input is quite sparse, so the following warning keeps being displayed:

No voxel belongs to drop_level:2 in shift 0

It seems to have something to do with drop_info_train. Can you briefly tell me what it is for and whether it affects training at all?

Abyssaledge commented 2 years ago

No voxel belongs to drop_level:2 in shift 0 is just a hint and does not affect performance. It's okay to mute it or comment out the print line. It seems that you are running an older version of SST; I recommend using the configs in sst_refactor.
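Roughly, a drop_info table groups windows by how many voxels they contain, so that windows of similar size can be padded to a common length (max_tokens) and batched; voxels beyond max_tokens in a window are dropped. The simplified sketch below (not SST's actual implementation; `bucket_windows` is a hypothetical helper) uses the drop_info_training values from the config posted later in this thread and shows why sparse input can leave level 2 empty.

```python
# Simplified sketch (not SST's actual code) of how a drop_info table buckets
# windows by voxel count. A window with n voxels belongs to the level whose
# drop_range (lower, upper) satisfies lower <= n < upper, and keeps at most
# that level's max_tokens voxels.
drop_info_training = {
    0: {'max_tokens': 30, 'drop_range': (0, 30)},
    1: {'max_tokens': 60, 'drop_range': (30, 60)},
    2: {'max_tokens': 100, 'drop_range': (60, 100000)},
}


def bucket_windows(voxels_per_window, drop_info):
    """Return (level per window, kept voxels per window, empty levels)."""
    levels, kept = [], []
    for n in voxels_per_window:
        for lvl, info in drop_info.items():
            lower, upper = info['drop_range']
            if lower <= n < upper:
                levels.append(lvl)
                kept.append(min(n, info['max_tokens']))
                break
    empty = [lvl for lvl in drop_info if lvl not in levels]
    return levels, kept, empty


# A sparse scan: no window holds 60 or more voxels, so level 2 is empty,
# which is the situation the "No voxel belongs to drop_level:2" hint reports.
levels, kept, empty = bucket_windows([3, 12, 45], drop_info_training)
print(levels, kept, empty)
```

An empty level simply means no window needed that padding size in the current sample, which is consistent with the reply that the message is only a hint.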

Dhagash4 commented 2 years ago

@Abyssaledge I changed the dataset point cloud range and the config as follows:

```python
_base_ = [
    '../_base_/models/sst_base.py',
    '../_base_/datasets/lh5_dataset.py',
    '../_base_/schedules/cosine_2x.py',
    '../_base_/default_runtime.py',
]

voxel_size = (0.32, 0.32, 5)
window_shape=(12, 12, 1) # 12 * 0.32m
point_cloud_range = [0, -40, -2.5, 80, 40, 2.5]
drop_info_training ={
    0:{'max_tokens':30, 'drop_range':(0, 30)},
    1:{'max_tokens':60, 'drop_range':(30, 60)},
    2:{'max_tokens':100, 'drop_range':(60, 100000)},
}
drop_info_test ={
    0:{'max_tokens':30, 'drop_range':(0, 30)},
    1:{'max_tokens':60, 'drop_range':(30, 60)},
    2:{'max_tokens':100, 'drop_range':(60, 100)},
    3:{'max_tokens':144, 'drop_range':(100, 100000)},
}
drop_info = (drop_info_training, drop_info_test)
shifts_list=[(0, 0), (window_shape[0]//2, window_shape[1]//2) ]

model = dict(
    type='DynamicVoxelNet',

    voxel_layer=dict(
        voxel_size=voxel_size,
        max_num_points=-1,
        point_cloud_range=point_cloud_range,
        max_voxels=(-1, -1)
    ),

    voxel_encoder=dict(
        type='DynamicVFE',
        in_channels=7,
        feat_channels=[64, 128],
        with_distance=False,
        voxel_size=voxel_size,
        with_cluster_center=True,
        with_voxel_center=True,
        point_cloud_range=point_cloud_range,
        norm_cfg=dict(type='naiveSyncBN1d', eps=1e-3, momentum=0.01)
    ),
    bbox_head=dict(
        type='Anchor3DHead',
        num_classes=3,
        in_channels=384,
        feat_channels=384,
        use_direction_classifier=True,
        anchor_generator=dict(
            type='AlignedAnchor3DRangeGenerator',
            ranges=[[0, -40, 0.0, 80, 40, 1.5],
                    [0, -40, 0.0, 80, 40, 2.0],
                    [0, -40, 0.0, 80, 40, .15]],
            sizes=[
                [3.9, 1.6, 1.56],  # car
                [12.0, 2.85, 4.0],  # truck
                [0.8, 0.6, 1.73]  # pedestrian
            ],
            rotations=[0, 1.57],
            reshape_out=False),
        diff_rad_by_sin=True,
        dir_offset=0.7854,  # pi/4
        dir_limit_offset=0,
        bbox_coder=dict(type='DeltaXYZWLHRBBoxCoder', code_size=7),
        loss_cls=dict(
            type='FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=1.0),
        loss_bbox=dict(type='L1Loss', loss_weight=0.5),
        loss_dir=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.2)
    ),

    middle_encoder=dict(
        type='SSTInputLayerV2',
        window_shape=window_shape,
        sparse_shape=(468, 468, 1),
        shuffle_voxels=True,
        debug=True,
        drop_info=drop_info,
        pos_temperature=10000,
        normalize_pos=False,
    ),

    backbone=dict(
        type='SSTv2',
        d_model=[128,] * 6,
        nhead=[8, ] * 6,
        num_blocks=6,
        dim_feedforward=[256, ] * 6,
        output_shape=[468, 468],
        num_attached_conv=3,
        conv_kwargs=[
            dict(kernel_size=3, dilation=1, padding=1, stride=1),
            dict(kernel_size=3, dilation=1, padding=1, stride=1),
            dict(kernel_size=3, dilation=2, padding=2, stride=1),
        ],
        conv_in_channel=128,
        conv_out_channel=128,
        debug=True,
    ),

)

# runtime settings

fp16 = dict(loss_scale=32.0)
```

So I basically changed the bbox_head as well, but with that change my loss goes to NaN after 11 epochs. Do you know what the possible reason might be?

Abyssaledge commented 2 years ago
  1. It's hard to say what leads to the NaN, but I suggest disabling fp16 training.
  2. I found another issue in your config: you changed the point cloud range but forgot to modify sparse_shape in middle_encoder and output_shape in backbone.
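For item 2, the grid size follows from point_cloud_range and voxel_size: each axis extent divided by the voxel size. The sketch below assumes sparse_shape is ordered (x, y, z) and output_shape is [y, x]. Both orders give the same numbers here because the grid is square. `grid_shape` is a hypothetical helper. For comparison, a 149.76 m square extent at 0.32 m voxels gives the 468 that appears in the posted config.

```python
# Sketch: derive the voxel grid size from point_cloud_range and voxel_size,
# assuming sparse_shape is (x, y, z) and output_shape is [y, x].
def grid_shape(point_cloud_range, voxel_size):
    lower, upper = point_cloud_range[:3], point_cloud_range[3:]
    # round() absorbs floating-point error in the division
    return tuple(round((hi - lo) / vs)
                 for lo, hi, vs in zip(lower, upper, voxel_size))


voxel_size = (0.32, 0.32, 5)
point_cloud_range = [0, -40, -2.5, 80, 40, 2.5]

nx, ny, nz = grid_shape(point_cloud_range, voxel_size)
sparse_shape = (nx, ny, nz)  # middle_encoder.sparse_shape
output_shape = [ny, nx]      # backbone.output_shape
print(sparse_shape, output_shape)
```

For item 1, the posted config turns on mixed precision only through its last line, fp16 = dict(loss_scale=32.0). In mmdet-style training scripts that key is what enables the fp16 optimizer hook, so deleting or commenting out that line should fall back to fp32 training.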