Open DanielMing123 opened 1 year ago
We have not tried such a small resolution. However, I think it will perform poorly, since the 2D backbone cannot extract representative features from such small images. If you have hardware limitations, you may remove the skip connections and even replace the volume features with BEV features processed by 2D convs. The performance will decrease but should still be better than using 320*180 resolution.
Hi, Dr Yi Wei,
Many thanks for your detailed explanation and suggestions. Suppose I use ResNet50 with an input image size of [800,450,3] (2x downsampling) and take the 16x-downsampled output as the final feature map, with 256 feature channels. Will the model work well under this setting? I would really appreciate any guidance from you.
Best regards, Zhenxing.
From: Yi Wei @.> Sent: August 16, 2023 14:42 To: weiyithu/SurroundOcc @.> Cc: DanielMing123 @.>; Author @.> Subject: Re: [weiyithu/SurroundOcc] Dose the model gonna work well with 5 times image down sample rate? (Issue #67)
Hi, I think a 2x downsampling rate may be OK.
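To make the resolution trade-off in this thread concrete, here is a rough sketch of how large the stride-16 feature map is at each input size mentioned. It assumes a ResNet-style backbone in which every stride-2 stage is padded so that its output is ceil(n / 2), and it infers the full resolution of 1600x900 from the statement that 5x downsampling gives 320*180. The function names are hypothetical and this is not taken from the SurroundOcc code.

```python
from typing import Tuple


def halve(n: int) -> int:
    """One padded stride-2 stage: output length is ceil(n / 2)."""
    return (n + 1) // 2


def feature_map_size(width: int, height: int, stride: int = 16) -> Tuple[int, int]:
    """Spatial size of the backbone output at a total stride that is a power of two."""
    stages = stride.bit_length() - 1
    if stride < 1 or 2 ** stages != stride:
        raise ValueError("stride must be a power of two")
    for _ in range(stages):
        width, height = halve(width), halve(height)
    return width, height


# Assumption: full resolution is 1600x900 (5x downsampling gives 320x180).
for factor in (1, 2, 5):
    w, h = 1600 // factor, 900 // factor
    fw, fh = feature_map_size(w, h)
    print(f"{factor}x downsample: input {w}x{h} -> stride-16 map {fw}x{fh} ({fw * fh} cells)")
```

Under these assumptions the 2x setting keeps a 50x29 map, while the 5x setting leaves only 20x12 cells per image, which is consistent with the concern that the backbone cannot extract representative features from such small inputs.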
Hi, author. Due to hardware limitations, can the model still perform well if I downsample the images by 5x, to 320*180 resolution? Also, is there a trick during training for predicting empty voxels? I found that even with dense 3D occupancy labels, empty voxels still dominate the class distribution, which makes the model prefer to predict every location as empty.
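The thread does not record an answer to the empty-voxel question. One common, general remedy for this kind of class imbalance is to weight the loss per class by inverse frequency, so that the dominant empty class contributes less per voxel. The sketch below is only an illustration of that idea, not something confirmed here as the SurroundOcc approach; the function name, the toy labels, and the choice of class 0 as "empty" are all assumptions.

```python
from collections import Counter
from typing import Dict, Iterable


def inverse_frequency_weights(labels: Iterable[int], num_classes: int) -> Dict[int, float]:
    """Weight each class by total / (num_classes * count), so rare classes weigh more."""
    counts = Counter(labels)
    total = sum(counts.values())
    # Classes absent from the sample get weight 0.0 to avoid dividing by zero.
    return {
        c: (total / (num_classes * counts[c]) if counts[c] else 0.0)
        for c in range(num_classes)
    }


# Toy voxel labels: class 0 stands for "empty" (hypothetical encoding),
# classes 1 and 2 stand for occupied semantic classes.
toy_labels = [0] * 90 + [1] * 8 + [2] * 2
weights = inverse_frequency_weights(toy_labels, num_classes=3)
print(weights)
```

Weights of this kind can be passed to a weighted cross-entropy loss, for example through the `weight` argument of `torch.nn.CrossEntropyLoss`. Whether this is enough for dense occupancy prediction would need to be checked experimentally.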