huixiancheng / CENet

[ICME 2022] CENet: Toward Concise and Efficient LiDAR Semantic Segmentation for Autonomous Driving
MIT License

About resnet block #20

Closed lily128 closed 1 year ago

lily128 commented 1 year ago

Dear author,

Hello! Thanks for sharing the code!!

CENet is a very powerful and efficient network, and I would like to use it on my own dataset. However, my dataset was collected with 128-channel LiDARs (unlike the 64-channel LiDAR used by SemanticKITTI).

I also noticed that both ResNet_34 and BasicBlock require groups=1 and base_width=64, and otherwise fail with the message "BasicBlock only supports groups=1 and base_width=64".

So, I would like to ask, if I want to use CENet for 128-channel lidar data, can I just change base_width to 128 and keep groups as 1? Or, do you have any other suggestions?

Thanks a lot!

huixiancheng commented 1 year ago

In my opinion, there is no direct connection between the number of LiDAR channels and base_width. (A 64-channel LiDAR does not mean base_width=64.)

Maybe you should first check the parameters in the config section and the code in the laserscan.py section linked here to get the projection right. fov_up and fov_down should be changed to match your sensor. height is 128, and width depends on the horizontal resolution of the LiDAR.

https://github.com/huixiancheng/CENet/blob/d82d0f54e69dbcc485ca83044c7962b242dbe8bd/config/arch/senet-512.yml#L59-L66

https://github.com/huixiancheng/CENet/blob/d82d0f54e69dbcc485ca83044c7962b242dbe8bd/common/laserscan.py#L224-L229
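To make the role of these parameters concrete, here is a minimal sketch of the spherical (range-image) projection commonly used by range-view methods. It is not copied from the repository's laserscan.py. The function name `project_point` and the sensor values in the usage lines (fov_up=15, fov_down=-25, width=2048) are hypothetical placeholders, so take the real values from your LiDAR's datasheet.

```python
import math


def project_point(x, y, z, fov_up_deg, fov_down_deg, height, width):
    """Map one LiDAR point (non-zero range) to (row, col) in a range image."""
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    fov = abs(fov_up) + abs(fov_down)  # total vertical field of view

    depth = math.sqrt(x * x + y * y + z * z)  # assumed > 0
    yaw = -math.atan2(y, x)        # horizontal angle
    pitch = math.asin(z / depth)   # vertical angle

    # Normalize both angles to [0, 1], then scale to pixel coordinates.
    col = 0.5 * (yaw / math.pi + 1.0) * width
    row = (1.0 - (pitch + abs(fov_down)) / fov) * height

    # Clamp so points on or just outside the FOV border stay in the image.
    col = min(width - 1, max(0, math.floor(col)))
    row = min(height - 1, max(0, math.floor(row)))
    return row, col


# Hypothetical 128-beam sensor: the values below are placeholders.
row, col = project_point(10.0, 0.0, 0.0,
                         fov_up_deg=15.0, fov_down_deg=-25.0,
                         height=128, width=2048)
print(row, col)
```

If fov_up and fov_down do not match the sensor, points pile up in the clamped top or bottom rows or leave rows empty, which is why these two values need to change together with height.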

However, since the range image produced by projecting a 128-channel LiDAR has a higher resolution, it may be necessary to increase the number of channels in the backbone to get stronger feature extraction. Of course, that depends on whether you want a high-efficiency or a high-performance model.

lily128 commented 1 year ago

Thanks for the advice! I'll try increasing the number of channels in the backbone (and maybe try something else, like knowledge distillation (KD), to improve performance while keeping the original model's inference speed). If I make any progress, I'll post it in this issue. Thanks again for your kind reply!!

lily128 commented 1 year ago

I'd still like to ask, what exactly are groups=1 and base_width=64 used for? It would be great if you could clear up that confusion for me. Thanks a lot!

huixiancheng commented 1 year ago

See ResNet: groups and base_width are used in ResNet's Bottleneck block to control the channel width of each stage. Since we only use BasicBlock, they have no effect in this code.
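As an illustration of that point, the sketch below reproduces the width formula used by torchvision-style Bottleneck blocks and the guard that BasicBlock applies. The helper names `bottleneck_width` and `check_basic_block` are made up for this example and do not exist in CENet or torchvision.

```python
def bottleneck_width(planes, groups=1, base_width=64):
    """Channels of the inner 3x3 conv in a torchvision-style Bottleneck."""
    return int(planes * (base_width / 64.0)) * groups


def check_basic_block(groups=1, base_width=64):
    """BasicBlock has no such inner width, so other values are rejected."""
    if groups != 1 or base_width != 64:
        raise ValueError("BasicBlock only supports groups=1 and base_width=64")


print(bottleneck_width(64))                           # plain ResNet: 64
print(bottleneck_width(64, groups=32, base_width=4))  # ResNeXt 32x4d: 128
print(bottleneck_width(64, base_width=128))           # Wide ResNet (x2): 128
check_basic_block()                                   # defaults pass silently
```

Neither parameter depends on the input image size, so a 128-row range image does not require changing them. Widening a BasicBlock backbone means changing the per-stage channel counts instead.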