spring1915 opened this issue 7 months ago
Hi @spring1915, the `tensor_parallel` and `replica_num` values should be passed to `mii.serve`. I've updated MII in #386 to error out when extra kwargs that we do not support are passed to the `generate` method. Can you please update your code to the following and try again?
```python
client = mii.serve("mistralai/Mistral-7B-Instruct-v0.2", tensor_parallel=2, replica_num=2)
response = client.generate(inputs, max_new_tokens=128)
```
I'm getting the error below:
```
python3 -m api_server
[2024-02-19 20:51:50,842] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
args.replica_num 1
[2024-02-19 20:51:51,516] [INFO] [server.py:38:__init__] Hostfile /job/hostfile not found, creating hostfile.
[2024-02-19 20:51:51,516] [INFO] [server.py:38:__init__] Hostfile /job/hostfile not found, creating hostfile.
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/api_server.py", line 192, in <module>
    mii.serve(args.model,
  File "/usr/local/lib/python3.10/dist-packages/mii/api.py", line 124, in serve
    import_score_file(mii_config.deployment_name, DeploymentType.LOCAL).init()
  File "/tmp/mii_cache/deepspeed-mii/score.py", line 33, in init
    mii.backend.MIIServer(mii_config)
  File "/usr/local/lib/python3.10/dist-packages/mii/backend/server.py", line 44, in __init__
    mii_config.generate_replica_configs()
  File "/usr/local/lib/python3.10/dist-packages/mii/config.py", line 302, in generate_replica_configs
    replica_pool = _allocate_devices(self.hostfile,
  File "/usr/local/lib/python3.10/dist-packages/mii/config.py", line 350, in _allocate_devices
    raise ValueError(
ValueError: Only able to place 0 replicas, but 1 replicas were requested.
```
Is DeepSpeed-MII not suitable for a single-GPU environment (A40)?
```
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.08              Driver Version: 545.23.08    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A40                     On  | 00000000:00:07.0 Off |                    0 |
|  0%   53C    P8              23W / 300W |      4MiB / 46068MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
```
Any help on this would be appreciated.
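For reference, MII can only place a replica when enough GPUs are available for it: each replica occupies `tensor_parallel` GPUs, so `replica_num * tensor_parallel` must not exceed the GPU count the hostfile exposes. A minimal sketch of that arithmetic (the function name is illustrative, not MII's actual internals):

```python
# Illustrative sketch (not MII's code): each replica needs `tensor_parallel`
# GPUs, so the number of replicas that fit is the floor division of the
# visible GPU count by the tensor-parallel degree.
def max_replicas(gpu_count: int, tensor_parallel: int) -> int:
    return gpu_count // tensor_parallel

# On a single A40, requesting tensor_parallel=2 means zero replicas fit,
# which would produce "Only able to place 0 replicas, but 1 replicas
# were requested."
print(max_replicas(1, 2))  # 0

# With 4 GPUs and tensor_parallel=2, at most 2 replicas can be placed.
print(max_replicas(4, 2))  # 2
```

So on a single GPU, both `tensor_parallel` and `replica_num` would need to be 1 for placement to succeed, assuming the auto-generated hostfile detects that GPU.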
I ran the code above on AWS ml.g5.12xlarge with 4 GPUs on one instance and got this error: `Only able to place 1 replicas, but 2 replicas were requested.` A similar error (`Only able to place 1 replicas, but 4 replicas were requested.`) also occurred when I used `client.generate(inputs, max_new_tokens=128, replica_num=4)`. I used AWS DJL DeepSpeed to run it, with this serving.properties file:
`model.py` is a customized file containing the code above and other simple scripts needed when using the DJL server.