About how specifically LMDrive works

Thanks for the great work!

I am new to LLM-based ADS and have some questions about how LMDrive works. As stated in the paper, LMDrive is built on Q-Former, which works as shown below:

[Image: illustration of how Q-Former works]

As I understand it, the key idea of LMDrive is to apply the Q-Former approach to LLM-based driving. The main work in the paper is then training the Q-Former, merging multi-sensor data, and so on. Is this understanding correct?

Additionally, what specifically is the output of LMDrive? It seems to output future waypoints rather than direct control signals.
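To make the question concrete, this is roughly what I imagine happens downstream if the output is waypoints: a separate controller turns them into steering, throttle, and brake. The sketch below is only my guess and is not taken from the LMDrive code base. The function name, the gains, the ego-frame convention (x forward, y to the right, positive steer to the right), and the waypoint spacing `dt` are all hypothetical.

```python
import math


def clamp(value, low, high):
    """Limit value to the closed interval [low, high]."""
    return max(low, min(high, value))


def waypoints_to_control(waypoints, speed, dt=0.5, k_steer=1.0,
                         k_speed=0.5, max_throttle=0.75, brake_margin=1.0):
    """Turn predicted waypoints into (steer, throttle, brake).

    Hypothetical helper, not part of LMDrive. Assumptions:
      - waypoints are (x, y) pairs in metres in the ego frame,
        x forward and y to the right;
      - consecutive waypoints are dt seconds apart;
      - speed is the current ego speed in m/s;
      - simple proportional control stands in for a full PID controller.
    """
    if len(waypoints) < 2:
        raise ValueError("need at least two waypoints")

    (x0, y0), (x1, y1) = waypoints[0], waypoints[1]

    # Lateral control: steer towards the midpoint of the first two waypoints.
    aim_x = (x0 + x1) / 2.0
    aim_y = (y0 + y1) / 2.0
    heading_error = math.atan2(aim_y, aim_x)  # radians, positive = right
    steer = clamp(k_steer * heading_error / (math.pi / 2.0), -1.0, 1.0)

    # Longitudinal control: the waypoint spacing implies a target speed.
    target_speed = math.hypot(x1 - x0, y1 - y0) / dt
    speed_error = target_speed - speed

    if speed_error < -brake_margin:
        # Much faster than the waypoints imply: brake and release throttle.
        return steer, 0.0, 1.0

    throttle = clamp(k_speed * speed_error, 0.0, max_throttle)
    return steer, throttle, 0.0


if __name__ == "__main__":
    # Straight-ahead waypoints while standing still.
    print(waypoints_to_control([(1.0, 0.0), (2.0, 0.0)], speed=0.0))
```

Is the real pipeline similar to this, i.e. the model predicts waypoints and a controller outside the LLM converts them into control signals?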

Many thanks for your attention!
