opendilab / InterFuser

[CoRL 2022] InterFuser: Safety-Enhanced Autonomous Driving Using Interpretable Sensor Fusion Transformer
Apache License 2.0

transformer #56

Open a1wj1 opened 12 months ago

a1wj1 commented 12 months ago

Hello, may I ask what the input to the transformer decoder is? How does it differ from the input to the encoder?

a1wj1 commented 12 months ago

Also, are the collected bounding boxes filtered to exclude objects outside the camera's view?

deepcs233 commented 12 months ago

Hi!

  1. The input of the encoder includes the image features of the front, left, right, and focusing views and the LiDAR features. https://github.com/opendilab/InterFuser/blob/e4f0314482124bb06a475c3f6fb4bfe3a2701c4d/interfuser/timm/models/interfuser.py#L1024
  2. The input of the decoder includes the query embeddings (waypoints, traffic sign, object density map) and the output of the encoder. https://github.com/opendilab/InterFuser/blob/e4f0314482124bb06a475c3f6fb4bfe3a2701c4d/interfuser/timm/models/interfuser.py#L1025
  3. By the way, the pipeline figure in our paper may answer questions like the ones above.
  4. "Will the collected bounding boxes filter out the ones outside the camera?" No, we consider all the objects within a certain distance of the ego-car.
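
To illustrate point 4, here is a minimal sketch of distance-based filtering. It is not the repository's code: the function name `within_range`, the object dictionaries, and the 30 m threshold are all hypothetical, chosen only to show that an object behind the car is kept while a distant one is dropped.

```python
import math

def within_range(objects, ego_xy, max_dist):
    """Keep objects whose centre lies within max_dist metres of the ego car.

    Camera visibility is deliberately not checked, matching the reply above.
    """
    ex, ey = ego_xy
    return [o for o in objects
            if math.hypot(o["x"] - ex, o["y"] - ey) <= max_dist]

# Hypothetical objects in an ego-centred frame (x forward, y left), in metres.
objects = [
    {"id": "car_ahead", "x": 10.0, "y": 0.0},
    {"id": "car_behind", "x": -8.0, "y": 1.0},  # not visible to a front camera
    {"id": "far_truck", "x": 60.0, "y": 5.0},   # beyond the assumed range
]

kept = within_range(objects, ego_xy=(0.0, 0.0), max_dist=30.0)  # 30 m is an assumption
kept_ids = [o["id"] for o in kept]
```

The actual distance threshold used when collecting data is not stated in this thread.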

a1wj1 commented 11 months ago

OK. In addition, the input of the encoder is information about the current frame, while the input of the decoder is information about future frames, right?

deepcs233 commented 11 months ago

the input of the encoder is information about the current frame

Yes

the input of the decoder is information about future frames

No, it mainly includes information about the current frame. In addition, the waypoints include some future prediction.
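
The data flow described in this thread can be sketched in plain Python. This is a toy illustration, not InterFuser's implementation: it uses single-head dot-product attention with no learned projections, positional embeddings, or query self-attention, and every feature value and name below is made up. It only shows that the decoder reads current-frame encoder tokens through fixed query embeddings, so no future-frame input is involved.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]

def cross_attention(queries, memory):
    """Each query takes a softmax-weighted average of all memory tokens."""
    d = len(memory[0])
    out = []
    for q in queries:
        scores = [sum(qi * mi for qi, mi in zip(q, m)) / math.sqrt(d)
                  for m in memory]
        weights = softmax(scores)
        out.append([sum(w * m[k] for w, m in zip(weights, memory))
                    for k in range(d)])
    return out

# Encoder side: one toy 2-d token per sensor, all from the current frame.
front, left, right, focus, lidar = (
    [[1.0, 0.0]], [[0.0, 1.0]], [[0.5, 0.5]], [[1.0, 1.0]], [[0.0, 0.0]]
)
encoder_input = front + left + right + focus + lidar
memory = encoder_input  # stand-in for the encoder output (real encoders transform it)

# Decoder side: query embeddings (learned in a real model, fixed across frames).
query_embed = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hs = cross_attention(query_embed, memory)  # one output vector per query
```

Any future prediction, such as waypoints, would come from a head applied to the corresponding rows of `hs`, not from future sensor data.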

rockstarsir commented 11 months ago

Hi,

https://github.com/opendilab/InterFuser/blob/e4f0314482124bb06a475c3f6fb4bfe3a2701c4d/interfuser/timm/models/interfuser.py#L1037C46-L1037C47

Is there any significance to taking indices 401 to 411 from hs (the decoder output)? Do only these 10 features need to be taken, or can I take the features at the start as well?
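
One possible reading of that slice, offered as an assumption rather than a confirmed answer: if the decoder queries are ordered as 400 object-density-map queries, then 1 traffic query, then 10 waypoint queries, `hs[401:411]` selects exactly the waypoint outputs. The constants below are assumed and should be checked against the linked source.

```python
NUM_DENSITY = 400    # assumed: e.g. a 20 x 20 object density map
NUM_TRAFFIC = 1      # assumed: a single traffic-information query
NUM_WAYPOINTS = 10   # assumed: ten waypoint queries

total = NUM_DENSITY + NUM_TRAFFIC + NUM_WAYPOINTS

# Stand-in for one sample's decoder output: one labelled entry per query.
hs = [f"q{i}" for i in range(total)]

start = NUM_DENSITY + NUM_TRAFFIC            # 401
density_out = hs[:NUM_DENSITY]               # positions 0..399
traffic_out = hs[NUM_DENSITY]                # position 400
waypoint_out = hs[start:start + NUM_WAYPOINTS]  # positions 401..410
```

Under this assumption the first positions are outputs of different queries trained for a different target, so they would not be interchangeable with the waypoint slice.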