Open AlleyOop23 opened 8 months ago
Hi! Thank you for your attention to our project! I have attached our previous latency analysis. The experiments were conducted on an RTX 3090 platform. With two A800 GPUs and quantization techniques, I think the average latency can be reduced to under 0.1 seconds, with small variance. The large language model does not add much time cost, since each frame contributes only 4 tokens to the LLM; the main latency comes from the vision encoder. You can apply existing optimization tricks for the LLM and the ResNet encoder (such as FlashAttention) to further reduce the time cost.
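A minimal profiling sketch of this latency breakdown is below, assuming a ResNet-50 stand-in for the vision encoder, a Llama-2-7B stand-in for the LLM, 4-bit quantization via bitsandbytes, and the 4-tokens-per-frame figure from the reply above; the model names, image size, and iteration count are illustrative assumptions, not the repository's actual components.

```python
# Hypothetical per-component latency profiling sketch. The encoder, LLM
# backbone, and 4-tokens-per-frame assumption are stand-ins for discussion,
# not the project's real pipeline.
import time
import torch
import torchvision.models as tvm
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

device = "cuda"

# Placeholder vision encoder (the project uses its own ResNet-based encoder).
vision_encoder = tvm.resnet50(weights=None).to(device).half().eval()

# LLM loaded in 4-bit with FlashAttention-2 enabled (requires the
# bitsandbytes and flash-attn packages to be installed).
llm_name = "meta-llama/Llama-2-7b-hf"  # hypothetical backbone choice
llm = AutoModelForCausalLM.from_pretrained(
    llm_name,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    attn_implementation="flash_attention_2",
    torch_dtype=torch.float16,
    device_map=device,
)

frame = torch.randn(1, 3, 224, 224, device=device, dtype=torch.float16)
# One frame is assumed to contribute 4 tokens to the LLM, as stated above.
frame_tokens = torch.randint(0, llm.config.vocab_size, (1, 4), device=device)

def timed(fn, iters=50):
    """Average wall-clock latency of fn over several iterations."""
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        with torch.no_grad():
            fn()
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

t_vision = timed(lambda: vision_encoder(frame))
t_llm = timed(lambda: llm(input_ids=frame_tokens))
print(f"vision encoder: {t_vision * 1000:.1f} ms/frame, "
      f"LLM forward:   {t_llm * 1000:.1f} ms/frame")
```

A script along these lines can be rerun on the 2×A800 setup to check whether the sub-0.1-second target actually holds once quantization and FlashAttention are enabled.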
Thank you for your prompt reply! It seems there is a promising opportunity to deploy this work on real vehicles. If I wanted to implement this work to achieve autonomous driving on a simple road segment in a real environment, could you offer some insights? I'm interested in understanding the process of translating the theory into practice, as well as the key steps and technical challenges to consider.
Thank you very much for your time and assistance!
Hi, thanks for your great work! I would like to ask, based on your experience and research, whether you believe it is feasible to deploy this work on real vehicles, particularly given that our current computational resources are two A800 GPUs (80 GB each). In that scenario, how long do you think it might take us to achieve this goal? Are there any key technical challenges or issues that need to be addressed?