wayveai/Driving-with-LLMs

https://github.com/user-attachments/assets/82a1993e-5948-4f5a-ad9b-849a21fe9a14

This is the PyTorch implementation for inference and training of the LLM-Driver described in:

Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving

Long Chen, Oleg Sinavski, Jan Hünermann, Alice Karnsund, Andrew James Willmott, Danny Birch, Daniel Maund, Jamie Shotton

ICRA 2024
[preprint] [arxiv]

The LLM-Driver utilises object-level vector input from our driving simulator to predict explanable actions using pretrained Language Models, providing a robust and interpretable solution for autonomous driving.
The LLM-Driver running in open-loop prediction using the vector inputs (top-left BEV view), with the results of action prediction (steering angles and acceleration/brake pedals), action justification (captions on the rendered video), Driving Question Answering (table at the bottom).

News

[2024/01/29] Thrilled to share that our paper has been accepted by ICRA 2024!
[2023/12/21] Please checkout our follow-up work LingoQA: [code] [arxiv]
[2023/10/03] The paper is now avaliable on [arxiv]
[2023/07/06] The paper and code have been made available under the paper_code branch for anonymous submission.

Getting Started

Prerequisites

Python 3.x
pip
Minimum of 20GB VRAM for running evaluations
Minimum of 40GB VRAM for training (default setting)

⚙ Setup

Set up a virtual environment (tested with Python 3.8-3.11)
```
python3 -m venv env
source env/bin/activate
```
Install required dependencies
```
pip install -r requirements.txt.lock
```

Note: requirements.txt.lock is generated with pip-compile from original requirements.txt for reproducibility.

Set up WandB API key

Set up your WandB API key for training and evaluation logging.
```
export WANDB_API_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```

💿 Dataset

Training/testing data: The datasets have already been checked into the codebase. To unarchive them, use the following commands:
```
tar -xzvf data/vqa_train_10k.tar.gz -C data/
tar -xzvf data/vqa_test_1k.tar.gz -C data/
```
Re-collect DrivingQA data: While the training and evaluation datasets already include pre-collected DrivingQA data, we also offer a script that illustrates how to collect DrivingQA data using the OpenAI ChatGPT API. If you wish to re-collect the DrivingQA data, simplely run the following command with your OpenAI API key:
```
python scripts/collect_vqa.py -i data/vqa_test_1k.pkl -o output_folder/ --openai_api xxxxxxxx
```
🏄 Evaluation

Evaluate for Perception and Action Prediction

Run the following command:

python train.py \
    --mode eval \
    --resume_from_checkpoint models/weights/stage2_with_pretrained/ \
    --data_path data/vqa_train_10k.pkl \
    --val_data_path data/vqa_test_1k.pkl \
    --eval_items caption,action \
    --vqa

Evaluate for DrivingQA

Run the following command:

python train.py \
    --mode eval \
    --resume_from_checkpoint models/weights/stage2_with_pretrained/ \
    --data_path data/vqa_train_10k.pkl \
    --val_data_path data/vqa_test_1k.pkl \
    --eval_items vqa \
    --vqa

View Results

The results can be viewed on the WandB project "llm-driver".
Grade DrivingQA Results with GPT API

To grade the results with GPT API, run the following command:
```
python scripts/grade_vqa.py \
    -i data/vqa_test_1k.pkl \
    -o results/10k_ft.pkl \
    -r results/10k_ft.json \
    --openai_api xxxxxxxx
```
Replace the results/10k_ft.json with the val_results.table.json downloaded from WandB to grade your results.

🏊 Training

Run LLM-Driver Training

Execute the following command to start training:

python train.py \
    --mode train \
    --eval_steps 50 \
    --val_set_size 32 \
    --num_epochs 5 \
    --resume_from_checkpoint models/weights/stage1_pretrained_model/ \
    --data_path data/vqa_train_10k.pkl \
    --val_data_path data/vqa_test_1k.pkl \
    --vqa

Follow the previous section for evaluating LLM-Driver

[optional] Train and evaluate Perceiver-BC

Execute the following command to start training and evaluation:

python train_bc.py \
    --num_epochs 25 \
    --data_path data/vqa_train_10k.pkl \
    --val_data_path data/vqa_test_1k.pkl

📝 Citation

If you find our work useful in your research, please consider citing:

@inproceedings{chen2024drivingwithllms,
  title={Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving},
  author={Long Chen and Oleg Sinavski and Jan Hünermann and Alice Karnsund and Andrew James Willmott and Danny Birch and Daniel Maund and Jamie Shotton},
  booktitle={2024 IEEE International Conference on Robotics and Automation (ICRA)},
  year={2024}
}

@article{marcu2023lingoqa,
  title={LingoQA: Video Question Answering for Autonomous Driving}, 
  author={Ana-Maria Marcu and Long Chen and Jan Hünermann and Alice Karnsund and Benoit Hanotte and Prajwal Chidananda and Saurabh Nair and Vijay Badrinarayanan and Alex Kendall and Jamie Shotton and Oleg Sinavski},
  journal={arXiv preprint arXiv:2312.14115},
  year={2023},
}

🙌 Acknowledgements

This project has drawn inspiration from the Alpaca LoRA repository. We would like to express our appreciation for their contributions to the open-source community.