Alkaiddd opened this issue 6 months ago
The inference process is currently quite slow. Are there any methods available to accelerate it? For the action task, it takes about 9 seconds per sample.

Hi Alkaiddd, thank you for your feedback! This model is not designed for real-time applications, and running inference with a 7B model does pose challenges, especially on less powerful GPUs. We have achieved inference times of 1-2 seconds per sample with batch inference on NVIDIA A100 GPUs. There is still room for improvement; if inference speed matters for your use case, quantization is worth trying.