Only one replica runs even though multiple replicas are configured

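import mii

# One replica per GPU on an 8-GPU machine
replica_num = 8

# Start a persistent deployment with 8 replicas, each on a single GPU (tensor_parallel=1)
client = mii.serve("./Qwen1.5-0.5B-Chat", deployment_name='qwen', tensor_parallel=1, replica_num=replica_num)
while True:
    # 40 prompts per call. In English: "The distance between the Sun and the Earth is:"
    # and "The distance between the Moon and the Earth:"
    response = client.generate(["太阳与地球的距离是：", "月亮与地球的距离："]*20, max_new_tokens=128)
    print(response)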

watch -n 1 nvidia-smi

I run this code on a machine with 8 GPUs. At any given time only one replica is busy and the other replicas sit idle. Is there a way to keep all replicas working?
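One possible explanation, not confirmed in this thread: each `client.generate` call is a single blocking request, and a single request is handled by one replica, so a loop that issues one call at a time never has more than one replica working. Under that assumption, a minimal sketch of a workaround is to split the prompts into chunks and send the chunks concurrently from several threads. The helper names `chunked`, `fan_out`, and `mii_generate_factory` are hypothetical and are not part of MII; only `mii.client` and `client.generate` are taken from the library.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Split items into consecutive lists of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def fan_out(prompts, make_generate, num_workers, chunk_size):
    """Send prompt chunks concurrently and return the results in chunk order.

    make_generate is a hypothetical factory: each call returns a callable
    that takes a list of prompts and returns a response for that list.
    """
    chunks = chunked(prompts, chunk_size)

    def work(chunk):
        # Build a fresh generate callable per chunk so that no client
        # object is shared between threads.
        generate = make_generate()
        return generate(chunk)

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # pool.map preserves the order of the input chunks.
        return list(pool.map(work, chunks))


def mii_generate_factory(deployment_name="qwen", max_new_tokens=128):
    """Return a factory of generate callables bound to a running MII deployment."""
    def make_generate():
        import mii  # imported lazily so the helpers above work without MII installed
        # Assumption: the deployment was already started with mii.serve(...)
        client = mii.client(deployment_name)
        return lambda chunk: client.generate(chunk, max_new_tokens=max_new_tokens)
    return make_generate
```

With the deployment from the snippet above already running, a call such as `fan_out(prompts, mii_generate_factory("qwen"), num_workers=8, chunk_size=5)` keeps up to eight requests in flight at once. Whether this spreads the load across all eight GPUs depends on how the deployment distributes requests, so it is worth checking again with `nvidia-smi`.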

microsoft/DeepSpeed-MII, issue #465