zzf2grx opened 4 weeks ago
I think the inference time may be dominated by the preprocessing, so it might not be related to the model itself. See #9238 for more details.
But in lmdeploy, AWQ-quantized models are about 2x faster than FP models. Is there any way to improve the speed of AWQ or other quantized models?
This is a problem for Qwen2-VL in particular, because its image preprocessing is very slow. It should not be an issue for other AWQ models.
So is there any advice on how to improve the speed of image preprocessing?
You can try passing smaller images to the model.
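One generic way to do that (a sketch, not a vLLM-specific API) is to cap each image's pixel count before it reaches the processor, preserving aspect ratio. The `max_pixels` budget and the `capped_size` helper below are illustrative, not part of any library:

```python
import math

def capped_size(width: int, height: int, max_pixels: int = 768 * 768) -> tuple[int, int]:
    """Return (width, height) scaled down so width * height <= max_pixels,
    preserving aspect ratio. Images already under the budget are unchanged."""
    if width * height <= max_pixels:
        return width, height
    scale = math.sqrt(max_pixels / (width * height))
    return max(1, int(width * scale)), max(1, int(height * scale))

# Example: a 4032x3024 photo is shrunk to fit the budget before preprocessing.
w, h = capped_size(4032, 3024)
assert w * h <= 768 * 768
```

With PIL this would be applied as `img.resize(capped_size(*img.size))` before handing the image to the model. The Qwen2-VL processor also exposes its own pixel-budget knobs (e.g. `min_pixels`/`max_pixels`), though the exact parameters depend on your transformers/vLLM version.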
Proposal to improve performance
Hi~ I find that the inference time of Qwen2-VL-7B AWQ is not much improved compared to Qwen2-VL-7B. Do you have any suggestions for improving performance? Thank you!