wayveai / Driving-with-LLMs

PyTorch implementation for the paper "Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving"
Apache License 2.0


https://github.com/user-attachments/assets/82a1993e-5948-4f5a-ad9b-849a21fe9a14

This is the PyTorch implementation for inference and training of the LLM-Driver described in:

Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving

Long Chen, Oleg Sinavski, Jan Hünermann, Alice Karnsund, Andrew James Willmott, Danny Birch, Daniel Maund, Jamie Shotton

ICRA 2024
[preprint] [arxiv]

LLM-Driver
The LLM-Driver utilises object-level vector input from our driving simulator to predict explainable actions using pretrained language models, providing a robust and interpretable solution for autonomous driving.
The LLM-Driver running in open-loop prediction using the vector inputs (top-left BEV view), showing the results of action prediction (steering angles and acceleration/brake pedals), action justification (captions on the rendered video), and Driving Question Answering (table at the bottom).

News

Getting Started

Prerequisites

⚙ Setup

  1. Set up a virtual environment (tested with Python 3.8-3.11)

    python3 -m venv env
    source env/bin/activate
  2. Install required dependencies

    pip install -r requirements.txt.lock

Note: requirements.txt.lock is generated with pip-compile from the original requirements.txt for reproducibility.

  3. Set up WandB API key

    Set up your WandB API key for training and evaluation logging.

    export WANDB_API_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
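Before launching a long run, it can help to confirm that the key is actually exported in the current shell. The following is a minimal sketch, not part of this repository: the helper name `wandb_key_status` is hypothetical, and the only thing it assumes from the setup above is the `WANDB_API_KEY` environment variable.

```python
import os


def wandb_key_status(env=None):
    """Classify the WANDB_API_KEY setting as 'missing', 'placeholder' or 'ok'.

    Hypothetical helper for illustration; `env` defaults to the real
    process environment and can be replaced by a plain dict.
    """
    env = os.environ if env is None else env
    key = env.get("WANDB_API_KEY", "").strip()
    if not key:
        return "missing"
    # The setup instructions show a dummy value made only of "x" characters.
    if set(key) == {"x"}:
        return "placeholder"
    return "ok"


if __name__ == "__main__":
    print(f"WANDB_API_KEY: {wandb_key_status()}")
```

The check only looks at whether a value is present and not the dummy string; it does not contact WandB or validate the key.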

💿 Dataset

  1. Evaluate for Perception and Action Prediction

    Run the following command:

    python train.py \
        --mode eval \
        --resume_from_checkpoint models/weights/stage2_with_pretrained/ \
        --data_path data/vqa_train_10k.pkl \
        --val_data_path data/vqa_test_1k.pkl \
        --eval_items caption,action \
        --vqa
  2. Evaluate for DrivingQA

    Run the following command:

    python train.py \
        --mode eval \
        --resume_from_checkpoint models/weights/stage2_with_pretrained/ \
        --data_path data/vqa_train_10k.pkl \
        --val_data_path data/vqa_test_1k.pkl \
        --eval_items vqa \
        --vqa
  3. View Results

    The results can be viewed on the WandB project "llm-driver".

  4. Grade DrivingQA Results with GPT API

    To grade the results with GPT API, run the following command:

    python scripts/grade_vqa.py \
        -i data/vqa_test_1k.pkl \
        -o results/10k_ft.pkl \
        -r results/10k_ft.json \
        --openai_api xxxxxxxx

    To grade your own results, replace results/10k_ft.json with the val_results.table.json file downloaded from WandB.
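To look at the downloaded table before sending it to the grader, the JSON can be turned into one dictionary per row. This is a rough sketch under an assumption: it expects the file to follow the usual WandB table export layout with top-level `columns` and `data` keys. The helper names `rows_from_table` and `load_rows` are hypothetical, and the column names are whatever the evaluation run logged.

```python
import json


def rows_from_table(table):
    """Convert a table dict with 'columns' and 'data' keys into row dicts.

    Assumes each entry of table["data"] is a list aligned with
    table["columns"]; this layout is an assumption, not something
    documented in this repository.
    """
    columns = table["columns"]
    return [dict(zip(columns, row)) for row in table["data"]]


def load_rows(path):
    """Read a table JSON file from disk and return its rows as dicts."""
    with open(path, encoding="utf-8") as f:
        return rows_from_table(json.load(f))
```

For example, `load_rows("results/10k_ft.json")` would return a list whose length is the number of evaluated samples, which is a quick way to confirm that the right file was downloaded.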

🏊 Training

  1. Run LLM-Driver Training

    Execute the following command to start training:

    python train.py \
        --mode train \
        --eval_steps 50 \
        --val_set_size 32 \
        --num_epochs 5 \
        --resume_from_checkpoint models/weights/stage1_pretrained_model/ \
        --data_path data/vqa_train_10k.pkl \
        --val_data_path data/vqa_test_1k.pkl \
        --vqa
  2. Follow the previous section for evaluating LLM-Driver

  3. [optional] Train and evaluate Perceiver-BC

    Execute the following command to start training and evaluation:

    python train_bc.py \
        --num_epochs 25 \
        --data_path data/vqa_train_10k.pkl \
        --val_data_path data/vqa_test_1k.pkl
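Since evaluating a trained model means repeating the two evaluation commands with a different checkpoint directory, the commands can be generated from one place. The sketch below only assembles and prints them; it uses the flags and data paths shown in this README and nothing else. The helper names `build_eval_command` and `eval_plan` are hypothetical, and the checkpoint directory is a parameter because the location of your own trained weights is not specified here.

```python
import shlex

# Data paths as used in the commands in this README.
DATA_PATH = "data/vqa_train_10k.pkl"
VAL_DATA_PATH = "data/vqa_test_1k.pkl"


def build_eval_command(checkpoint_dir, eval_items):
    """Return the argv list for one `train.py --mode eval` run."""
    return [
        "python", "train.py",
        "--mode", "eval",
        "--resume_from_checkpoint", checkpoint_dir,
        "--data_path", DATA_PATH,
        "--val_data_path", VAL_DATA_PATH,
        "--eval_items", ",".join(eval_items),
        "--vqa",
    ]


def eval_plan(checkpoint_dir):
    """Return both evaluation runs: caption/action first, then DrivingQA."""
    return [
        build_eval_command(checkpoint_dir, ["caption", "action"]),
        build_eval_command(checkpoint_dir, ["vqa"]),
    ]


if __name__ == "__main__":
    # Example with the released stage-2 weights; swap in your own directory.
    for cmd in eval_plan("models/weights/stage2_with_pretrained/"):
        print(shlex.join(cmd))
```

Printing the commands rather than executing them keeps the sketch side-effect free; each list could be passed to `subprocess.run(cmd, check=True)` once the checkpoint path has been confirmed.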

📝 Citation

If you find our work useful in your research, please consider citing:

@inproceedings{chen2024drivingwithllms,
  title={Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving},
  author={Long Chen and Oleg Sinavski and Jan Hünermann and Alice Karnsund and Andrew James Willmott and Danny Birch and Daniel Maund and Jamie Shotton},
  booktitle={2024 IEEE International Conference on Robotics and Automation (ICRA)},
  year={2024}
}
@article{marcu2023lingoqa,
  title={LingoQA: Video Question Answering for Autonomous Driving}, 
  author={Ana-Maria Marcu and Long Chen and Jan Hünermann and Alice Karnsund and Benoit Hanotte and Prajwal Chidananda and Saurabh Nair and Vijay Badrinarayanan and Alex Kendall and Jamie Shotton and Oleg Sinavski},
  journal={arXiv preprint arXiv:2312.14115},
  year={2023},
}

🙌 Acknowledgements

This project has drawn inspiration from the Alpaca LoRA repository. We would like to express our appreciation for their contributions to the open-source community.