Hi, I tested MII and vLLM on an A100 with the Yi-6B model, and it seems that vLLM (5.12 s/query) is faster than MII (6.08 s/query). Is there any config I need to set?
Here are my settings:
input len = 1536
output len = 512
batch size = 1
test set size = 100
The warmup stage is excluded from the time cost statistics.
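The per-query time is measured roughly like this (a minimal sketch; `run_query` is a hypothetical stand-in for a single MII or vLLM generation call, and the warmup passes are not timed):

```python
import time

def benchmark(run_query, prompts, warmup=3):
    """Return the mean seconds per query, excluding warmup passes."""
    # Warmup: these calls are not counted in the timing statistics
    for prompt in prompts[:warmup]:
        run_query(prompt)

    start = time.perf_counter()
    for prompt in prompts:  # batch size = 1: one prompt per call
        run_query(prompt)
    return (time.perf_counter() - start) / len(prompts)
```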
The model loader is as follows.
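(A minimal sketch of the loading code, assuming the standard `mii.pipeline` and `vllm.LLM` entry points and the `01-ai/Yi-6B` Hugging Face checkpoint id; the exact loader may differ.)

```python
import mii
from vllm import LLM, SamplingParams

MODEL = "01-ai/Yi-6B"  # assumed Hugging Face checkpoint id for Yi-6B

# DeepSpeed-MII: non-persistent pipeline on a single GPU
mii_pipe = mii.pipeline(MODEL)

# vLLM: offline engine loading the same checkpoint
vllm_engine = LLM(model=MODEL)
sampling = SamplingParams(max_tokens=512)  # matches output len = 512
```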