microsoft / DeepSpeed-MII

MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
Apache License 2.0

Error: "Only able to place X replicas, but Y replicas were requested" #381

Open spring1915 opened 7 months ago

spring1915 commented 7 months ago

I ran

client = mii.serve("mistralai/Mistral-7B-Instruct-v0.2")
response = client.generate(inputs, max_new_tokens=128, tensor_parallel=2, replica_num=2)

This was on a single AWS ml.g5.12xlarge instance with 4 GPUs. I got this error: "Only able to place 1 replicas, but 2 replicas were requested." A similar error ("Only able to place 1 replicas, but 4 replicas were requested.") also occurred when I used client.generate(inputs, max_new_tokens=128, replica_num=4).

I ran it through AWS DJL with the DeepSpeed engine, using this serving.properties file:

engine=DeepSpeed
option.entrypoint=model.py

model.py is a custom file containing the code above plus the other simple scripts that the DJL server needs.

mrwyattii commented 7 months ago

Hi @spring1915, the tensor_parallel and replica_num values should be passed to mii.serve, not to generate. In #386 I've updated MII so that the generate method raises an error when it receives extra kwargs that we do not support. Can you please update your code to the following and try again?

client = mii.serve("mistralai/Mistral-7B-Instruct-v0.2", tensor_parallel=2, replica_num=2)
response = client.generate(inputs, max_new_tokens=128)
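
For intuition on why these two values matter at serve time: each replica is a copy of the model sharded across tensor_parallel GPUs, so a deployment needs tensor_parallel × replica_num GPUs in total. The sketch below is a simplified illustration of that placement arithmetic, not MII's actual implementation. The names count_placeable_replicas, check_placement, and slots_per_host are hypothetical, and it assumes a replica cannot span hosts.

```python
def count_placeable_replicas(slots_per_host, tensor_parallel, replica_num):
    """Count how many replicas fit on the given hosts, capped at replica_num.

    slots_per_host is a hypothetical mapping of host name -> number of GPUs.
    Assumption: each replica needs tensor_parallel GPUs on a single host.
    """
    if tensor_parallel < 1 or replica_num < 1:
        raise ValueError("tensor_parallel and replica_num must be >= 1")
    placed = 0
    for slots in slots_per_host.values():
        free = slots
        # Keep carving out groups of tensor_parallel GPUs on this host.
        while free >= tensor_parallel and placed < replica_num:
            placed += 1
            free -= tensor_parallel
    return placed


def check_placement(slots_per_host, tensor_parallel, replica_num):
    """Raise an error shaped like the one in this issue if the request does not fit."""
    placed = count_placeable_replicas(slots_per_host, tensor_parallel, replica_num)
    if placed < replica_num:
        raise ValueError(
            f"Only able to place {placed} replicas, "
            f"but {replica_num} replicas were requested."
        )
    return placed


# One host with 4 GPUs: two replicas of 2 GPUs each fit exactly.
print(check_placement({"localhost": 4}, tensor_parallel=2, replica_num=2))
```

Under these assumptions, a host with 4 GPUs can hold two replicas at tensor_parallel=2 or four replicas at tensor_parallel=1, and a request is rejected whenever the visible GPU count falls short of tensor_parallel × replica_num.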

gangooteli commented 6 months ago

I'm getting the error below:

python3 -m api_server
[2024-02-19 20:51:50,842] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
args.replica_num  1
[2024-02-19 20:51:51,516] [INFO] [server.py:38:__init__] Hostfile /job/hostfile not found, creating hostfile.
[2024-02-19 20:51:51,516] [INFO] [server.py:38:__init__] Hostfile /job/hostfile not found, creating hostfile.
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/api_server.py", line 192, in <module>
    mii.serve(args.model,
  File "/usr/local/lib/python3.10/dist-packages/mii/api.py", line 124, in serve
    import_score_file(mii_config.deployment_name, DeploymentType.LOCAL).init()
  File "/tmp/mii_cache/deepspeed-mii/score.py", line 33, in init
    mii.backend.MIIServer(mii_config)
  File "/usr/local/lib/python3.10/dist-packages/mii/backend/server.py", line 44, in __init__
    mii_config.generate_replica_configs()
  File "/usr/local/lib/python3.10/dist-packages/mii/config.py", line 302, in generate_replica_configs
    replica_pool = _allocate_devices(self.hostfile,
  File "/usr/local/lib/python3.10/dist-packages/mii/config.py", line 350, in _allocate_devices
    raise ValueError(
ValueError: Only able to place 0 replicas, but 1 replicas were requested.

Is DeepSpeed-MII not suitable for a single-GPU environment (one A40)?

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.08              Driver Version: 545.23.08    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A40                     On  | 00000000:00:07.0 Off |                    0 |
|  0%   53C    P8              23W / 300W |      4MiB / 46068MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

Please help with this.