microsoft / DeepSpeed-MII

MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
Apache License 2.0

Performance with vllm #467

Open littletomatodonkey opened 5 months ago

littletomatodonkey commented 5 months ago

Hi, I tested MII and vLLM on an A100 device with the Yi-6B model, and it seems that vLLM (5.12 s/query) is faster than MII (6.08 s/query). Is there any config I need to set?

Here is my setup; the model is loaded as follows.

    import mii

    # Load the model with the non-persistent pipeline API
    model_path = "/mnt/bn/multimodel/models/official/Yi-6B-Chat/"
    pipe = mii.pipeline(model_path, torch_dist_port=12345)

    # Generate 512 new tokens per query (prompt definition omitted here)
    resp = pipe([prompt], min_new_tokens=512, max_new_tokens=512)
awan-10 commented 2 months ago

@littletomatodonkey - mii.pipeline is just a quick-start API, so its performance may not be optimal.

For better performance, please try the mii.serve API to create a persistent deployment.
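For reference, a minimal sketch of the persistent-deployment path suggested above, reusing the model path from the original report; the prompt string is a placeholder, and the generation parameters are illustrative:

    import mii

    model_path = "/mnt/bn/multimodel/models/official/Yi-6B-Chat/"

    # Start a persistent deployment: this launches a background server
    # process and returns a client connected to it.
    client = mii.serve(model_path)

    # Queries go through the client instead of a pipeline object,
    # so the model stays loaded between requests.
    resp = client.generate(["Hello, my name is"], max_new_tokens=512)
    print(resp)

    # Shut the server down when finished.
    client.terminate_server()

Because the deployment persists across queries and can serve multiple clients, it avoids the per-run setup overhead of the pipeline and is the intended path for benchmarking throughput and latency.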