microsoft / DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
https://www.deepspeed.ai/
Apache License 2.0

[BUG] Inference example outdated #1667

Closed drunkcoding closed 2 years ago

drunkcoding commented 2 years ago

Following the official inference tutorial with DeepSpeed 0.5.8, I ran this script:

# Filename: gpt-neo-2.7b-generation.py
import os
import deepspeed
import torch
from transformers import pipeline

local_rank = int(os.getenv('LOCAL_RANK', '0'))
world_size = int(os.getenv('WORLD_SIZE', '1'))
generator = pipeline('text-generation', model='EleutherAI/gpt-neo-2.7B', device=local_rank)

generator.model = deepspeed.init_inference(generator.model,
                                           mp_size=world_size,
                                           dtype=torch.float,
                                           replace_method='auto')

string = generator("DeepSpeed is", do_sample=True, min_length=50)
if not torch.distributed.is_initialized() or torch.distributed.get_rank() == 0:
    print(string)

The following command reports an error:

deepspeed --num_gpus 2 gpt-neo-2.7b-generation.py

The error goes away when replace_with_kernel_inject=True is added to the deepspeed.init_inference call. Without that flag, the run fails with:

Traceback (most recent call last):
  File "tests/ds_generator.py", line 62, in <module>
    generator.model = deepspeed.init_inference(generator.model,
  File "/home/jupyter-xue/.conda/envs/torch/lib/python3.8/site-packages/deepspeed/__init__.py", line 274, in init_inference
    engine = InferenceEngine(model,
  File "/home/jupyter-xue/.conda/envs/torch/lib/python3.8/site-packages/deepspeed/inference/engine.py", line 86, in __init__
    self._apply_injection_policy(
  File "/home/jupyter-xue/.conda/envs/torch/lib/python3.8/site-packages/deepspeed/inference/engine.py", line 161, in _apply_injection_policy
    replace_transformer_layer(client_module,
  File "/home/jupyter-xue/.conda/envs/torch/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 464, in replace_transformer_layer
    return replace_module(model=model,
  File "/home/jupyter-xue/.conda/envs/torch/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 561, in replace_module
    replaced_module, _ = _replace_module(model, policy)
  File "/home/jupyter-xue/.conda/envs/torch/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 583, in _replace_module
    _, layer_id = _replace_module(child, policies, layer_id=layer_id)
  File "/home/jupyter-xue/.conda/envs/torch/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 583, in _replace_module
    _, layer_id = _replace_module(child, policies, layer_id=layer_id)
  File "/home/jupyter-xue/.conda/envs/torch/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 578, in _replace_module
    policies[child.__class__][0](child,
  File "/home/jupyter-xue/.conda/envs/torch/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 460, in replace_fn
    new_module = replace_wo_policy(child, _policy)
  File "/home/jupyter-xue/.conda/envs/torch/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 443, in replace_wo_policy
    return _replace_module(module)
  File "/home/jupyter-xue/.conda/envs/torch/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 440, in _replace_module
    _replace_module(child, name)
  File "/home/jupyter-xue/.conda/envs/torch/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 440, in _replace_module
    _replace_module(child, name)
  File "/home/jupyter-xue/.conda/envs/torch/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 435, in _replace_module
    linear_policies[child.__class__](child,
  File "/home/jupyter-xue/.conda/envs/torch/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 344, in _replace
    if name in all_reduce_linears:
TypeError: argument of type 'ABCMeta' is not iterable
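
The final frame fails on a membership test, `name in all_reduce_linears`. As a minimal sketch of how that message can arise (an assumption about the mechanism, not a confirmed diagnosis of DeepSpeed's code), the same `TypeError` appears whenever the right-hand side of `in` is a class built on `ABC` rather than a collection of names. `ExamplePolicy`, `is_all_reduce_linear`, and the layer names below are hypothetical and exist only for illustration.

```python
from abc import ABC


class ExamplePolicy(ABC):
    """Hypothetical stand-in for a policy class; not a DeepSpeed type."""


def is_all_reduce_linear(name, all_reduce_linears):
    # Same kind of membership test as the failing line in the traceback.
    return name in all_reduce_linears


# A collection of layer names (hypothetical names) works as expected.
ok = is_all_reduce_linear("out_proj", ("out_proj", "c_proj"))

# A class object is not a container, so the same test raises TypeError.
# The metaclass of an ABC subclass is ABCMeta, which is the type named
# in the error message.
try:
    is_all_reduce_linear("out_proj", ExamplePolicy)
    message = None
except TypeError as exc:
    message = str(exc)

print(ok, message)
```

If this is what happens here, it would suggest that a policy class reached a code path that expected a tuple or list of layer names.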

Could we have more detailed information on which combinations of inference engine arguments are supported?

RezaYazdaniAminabadi commented 2 years ago

Hi @drunkcoding

Thanks for reporting this issue. Yes, you are right: this model needs that flag in order to run. I will add more description and fix this issue in a PR. Best, Reza
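
The workaround confirmed in this thread can be sketched as follows. The keyword names `mp_size`, `replace_method`, and `replace_with_kernel_inject` are the ones used in the report for DeepSpeed 0.5.8; the helper `build_init_inference_kwargs` is hypothetical and only gathers them in one place so the combination can be checked without a GPU. Other DeepSpeed versions may accept different arguments.

```python
import os


def build_init_inference_kwargs(world_size, use_kernel_inject=True):
    """Hypothetical helper: collect keyword arguments for deepspeed.init_inference.

    Only the argument names that appear in the issue are used here.
    """
    return {
        "mp_size": world_size,
        "replace_method": "auto",
        # The flag the reporter added to make the error go away.
        "replace_with_kernel_inject": use_kernel_inject,
    }


world_size = int(os.getenv("WORLD_SIZE", "1"))
kwargs = build_init_inference_kwargs(world_size)
print(kwargs)

# With deepspeed and torch installed, the call from the original script
# would then become (not executed here):
#   generator.model = deepspeed.init_inference(generator.model,
#                                              dtype=torch.float,
#                                              **kwargs)
```

Launching stays the same as in the report: `deepspeed --num_gpus 2 gpt-neo-2.7b-generation.py`.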