wudongming97 / TopoMLP

[ICLR2024] TopoMLP: A Simple yet Strong Pipeline for Driving Topology Reasoning
Apache License 2.0

Some issues with the code implementation #20

Closed cyty98 closed 2 months ago

cyty98 commented 2 months ago

I've been following this series of papers; they have been quite impressive! I have several questions about the implementation of the TopoMLP code:

  1. I noticed that the lane detection head does not use the encoder module of DETR and only uses the second-last image feature map extracted by the backbone (see attached screenshot fig1).

In contrast, the traffic element detection head uses the entire DETR architecture and all four feature levels. What is the reason for this difference? A minimal sketch of the difference I mean follows this list.

  2. In petr_transformer.py, line 89, there is a parameter named self.cross. From my observation, enabling this switch allows attention interaction between images from different cameras. However, it seems that this functionality is not used during the default training process. Why is that? (see attached screenshot fig2)

  3. In lane_head.py, line 300, there is a section of code that generates the distribution of depth-direction coordinates. The calculation involving self.position_range[3] - self.depth_start puzzles me: position_range[3] appears to be the maximum x-axis extent of the BEV space, so why is the difference between the BEV x-axis maximum and the frustum depth start used as the depth range along the image frustum direction? (see attached screenshot fig3)
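
To make point 1 concrete, here is a minimal sketch of the difference I am describing; the tensor shapes and names are illustrative placeholders, not the actual TopoMLP code:

```python
import torch

# Illustrative stand-ins for the four feature levels produced by the
# backbone + neck for one camera image (shapes are assumptions).
feats = [
    torch.randn(1, 256, 116, 200),  # level 0 (highest resolution)
    torch.randn(1, 256, 58, 100),   # level 1
    torch.randn(1, 256, 29, 50),    # level 2
    torch.randn(1, 256, 15, 25),    # level 3 (lowest resolution)
]

# Lane (centerline) head: a single scale, the second-last feature map.
lane_input = feats[-2]

# Traffic element head: all four scales, Deformable-DETR style.
te_inputs = feats

print(lane_input.shape, len(te_inputs))
```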

Looking forward to your response. Thank you very much.

wudongming97 commented 2 months ago
  1. Our centerline detection head follows PETR and uses a single-scale feature, while our traffic element detection head follows Deformable DETR and uses multi-scale features.
  2. This setting is not used, so you can ignore it.
  3. Here we follow the PETR setting and use the maximum x-axis range of the BEV space to represent the maximum depth range. This is an approximation.
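
For reference, here is a minimal sketch of the PETR-style linear-increasing discretization (LID) that the depth coordinates follow; the values below are illustrative and may differ from the actual TopoMLP config:

```python
import torch

# Illustrative values only; the actual numbers come from the config.
position_range = [-51.2, -51.2, -5.0, 51.2, 51.2, 3.0]  # [x_min, y_min, z_min, x_max, y_max, z_max]
depth_start = 1.0   # nearest depth of the camera frustum
depth_num = 64      # number of depth bins

# Linear-increasing discretization: bin widths grow with the bin index,
# so near depths are sampled more densely than far ones.
index = torch.arange(depth_num, dtype=torch.float32)
bin_size = (position_range[3] - depth_start) / (depth_num * (1 + depth_num))
coords_d = depth_start + bin_size * index * (index + 1)

# position_range[3] (the BEV x-axis maximum) serves as a proxy for the
# maximum frustum depth, which is the approximation mentioned above.
print(coords_d[:4])    # near bins, closely spaced
print(coords_d[-1])    # farthest bin, approaching position_range[3]
```

The farthest bin lands near position_range[3], which is why the BEV x-axis maximum can stand in for the maximum frustum depth.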
cyty98 commented 2 months ago

Thank you for your response.