Alkaiddd opened this issue 6 months ago
The inference process is currently quite slow. Are there any methods available to accelerate it? For the action task, it takes about 9 seconds per sample.

Hi Alkaiddd, thank you for your feedback! This model is not designed for real-time applications, and running inference with a 7B model does pose challenges, especially on less powerful GPUs. We have achieved inference times of 1-2 seconds per sample with batch inference on NVIDIA A100 GPUs. There is still room for improvement; if inference speed matters for your use case, quantization is worth trying.