Hi @melfm ,
Thank you for sharing this; it is really great work.
Here is my question.
The feature extractors output 32 channels for both the BEV and the image, and after fusion the channel count is still 32. (Is that right?)
However, the FC layers have 512, 1024, and 1024 channels for classification, offset regression, and orientation regression. I read this as meaning that the number of categories is 16 and that the offset and orientation are each encoded with 32 channels, since 512 / 32 = 16 and 1024 / 32 = 32. (Is that right?)
In your thesis, however, those three numbers are set to 2, 10, and 2 for categories, offset, and orientation.
So this really confuses me.
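To make the mismatch concrete, here is the arithmetic behind my question as a small Python sketch. The variable names are hypothetical and not taken from the repository, and the rule "FC size = fused channels times number of outputs" is only my assumption about how the layers might be sized.

```python
# Values quoted in the question; all names here are hypothetical, not from the repo.
fused_channels = 32  # channel count after BEV/image fusion, as I understand it

# FC layer sizes I see for each task.
fc_sizes = {"classification": 512, "offset": 1024, "orientation": 1024}

# Output sizes given in the thesis for the same tasks.
thesis_outputs = {"classification": 2, "offset": 10, "orientation": 2}

# My assumption: FC size = fused channels * number of outputs per task.
implied_outputs = {task: size // fused_channels for task, size in fc_sizes.items()}

for task in fc_sizes:
    print(task, "implied:", implied_outputs[task], "thesis:", thesis_outputs[task])
```

Under that assumption the implied output counts do not match the thesis values for any of the three tasks, which is why I suspect my reading of the FC sizes is wrong.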