microsoft / DeepSpeed-MII

MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
Apache License 2.0

Error when running with tensor_parallel greater than 4 #179

Closed: ychy00001 closed this issue 1 year ago

ychy00001 commented 1 year ago

I am trying DeepSpeed-MII in local deployment mode with the Hugging Face model bigscience/bloomz-7b1-mt, downloaded locally. With tensor_parallel=4 it runs successfully, but with tensor_parallel set to 5, 6, 7 or 8 it does not work. My configuration:

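```python
import mii

# Deployment name prefix; the launcher log reports the deployment
# name 'bloomz-7b1-mt_deployment', so name is "bloomz-7b1-mt".
name = "bloomz-7b1-mt"

# ds_config is defined elsewhere in the original example script (not shown here).

mii_configs = {
    "dtype": "fp16",
    "tensor_parallel": 8,        # works with 4; fails with 5, 6, 7 or 8
    "load_with_sys_mem": True,
    "port_number": 50950,
    "skip_model_check": True,
    "deploy_rank": [0, 1, 2, 3, 4, 5, 6, 7],  # GPU ids used by the deployment
}
model_path = "/data/model/bloomz-7b1-mt"
mii.deploy(task='text-generation',
           model=model_path,
           deployment_name=name + "_deployment",
           ds_config=ds_config,
           mii_config=mii_configs)
```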
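Before digging into the launcher, it can help to sanity-check the config. The sketch below uses a hypothetical helper, `check_tp_config` (not part of MII or DeepSpeed), and assumes two constraints that tensor-parallel inference commonly imposes: `deploy_rank` should list one distinct GPU per tensor-parallel rank, and the model's attention heads should split evenly across ranks. It also assumes bloomz-7b1-mt has 32 attention heads (the BLOOM-7b1 architecture); verify `n_head` in the model's `config.json`. Under these assumptions, 5, 6 and 7 would not split evenly, while 8 would, so a failure at 8 would need some other explanation.

```python
from typing import Dict, List


def check_tp_config(mii_config: Dict, num_heads: int) -> List[str]:
    """Return a list of problems found in a MII-style config dict.

    Hypothetical helper: these checks are assumptions about tensor-parallel
    inference, not validation rules taken from MII or DeepSpeed.
    """
    problems = []
    tp = mii_config["tensor_parallel"]
    # Assumed default when deploy_rank is absent: GPUs 0..tp-1.
    ranks = mii_config.get("deploy_rank", list(range(tp)))

    # Each tensor-parallel rank is expected to run on its own GPU.
    if len(ranks) != tp:
        problems.append(f"deploy_rank has {len(ranks)} entries, tensor_parallel is {tp}")
    if len(set(ranks)) != len(ranks):
        problems.append("deploy_rank contains duplicate GPU ids")

    # Attention heads are assumed to be split evenly across ranks.
    if num_heads % tp != 0:
        problems.append(f"{num_heads} attention heads do not split evenly across {tp} ranks")
    return problems


# Assumed head count for bloomz-7b1-mt; check n_head in its config.json.
BLOOM_7B1_HEADS = 32

for tp in range(4, 9):
    cfg = {"tensor_parallel": tp, "deploy_rank": list(range(tp))}
    print(tp, check_tp_config(cfg, BLOOM_7B1_HEADS) or "ok")
```

With 32 heads, this reports 4 and 8 as ok and flags 5, 6 and 7 as uneven splits.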
```
[2023-04-28 11:44:09,797] [ERROR] [launch.py:434:sigkill_handler] ['/data/project/DeepSpeed-MII/venv/bin/python', '-m',
 'mii.launch.multi_gpu_server', '--deployment-name', 'bloomz-7b1-mt_deployment', '--task-name', 'text-generation', 
'--model', '/data/model/bloomz-7b1-mt', '--model-path', '/tmp/mii_models', '--port', '50950', '--ds-optimize', '--provider', 
'hugging-face', '--config', 'eyJ0ZW5zb.........'] exits with return code = -7
[2023-04-28 11:44:09,892] [INFO] [server.py:82:_wait_until_server_is_live] waiting for server to start...
Traceback (most recent call last):
  File "/data/project/DeepSpeed-MII/examples/local/text-generation-bloom-example.py", line 48, in <module>
    mii.deploy(task='text-generation',
  File "/data/project/DeepSpeed-MII/mii/deployment.py", line 142, in deploy
    return _deploy_local(deployment_name, model_path=model_path)
  File "/data/project/DeepSpeed-MII/mii/deployment.py", line 148, in _deploy_local
    mii.utils.import_score_file(deployment_name).init()
  File "/tmp/mii_cache/bloomz-7b1-mt_deployment/score.py", line 27, in init
    mii.MIIServer(deployment_name,
  File "/data/project/DeepSpeed-MII/mii/server.py", line 67, in __init__
    self._wait_until_server_is_live(processes, deployment)
  File "/data/project/DeepSpeed-MII/mii/server.py", line 79, in _wait_until_server_is_live
    raise RuntimeError(
RuntimeError: server crashed for some reason, unable to proceed
```
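The launcher reports `return code = -7`. By Python's `subprocess` convention, a negative return code `-N` means the child process was terminated by signal `N`; on Linux, signal 7 is `SIGBUS`. The sketch below decodes such a code with the standard `signal` module (`describe_return_code` is an illustrative helper, not part of DeepSpeed). It is only a diagnostic aid and does not by itself identify the root cause of the crash.

```python
import signal


def describe_return_code(code: int) -> str:
    """Explain a subprocess-style return code (negative means killed by a signal)."""
    if code >= 0:
        return f"exited with status {code}"
    try:
        # Map the signal number to its platform-specific name.
        name = signal.Signals(-code).name
    except ValueError:
        name = f"unknown signal {-code}"
    return f"terminated by {name}"


# Return code reported by the DeepSpeed launcher in the log above.
print(describe_return_code(-7))
```

On Linux this prints `terminated by SIGBUS`; signal numbering can differ on other platforms.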
ychy00001 commented 1 year ago

This seems to be a problem in the DeepSpeed framework itself, so I'll close this issue. See: https://github.com/microsoft/DeepSpeed/issues/2897