microsoft / DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
https://www.deepspeed.ai/
Apache License 2.0

Blocking issue when using deepspeed inference(maybe mutex or nccl issue) #1265

Closed switiz closed 1 year ago

switiz commented 3 years ago

Description

Dear DeepSpeed team,

I have an issue when using model parallelism (the inference engine): sometimes GPU utilization gets stuck at 100% and the code hangs. I wrote a test script to exercise the DeepSpeed inference engine; here is my test code.

TestCode

import os
import deepspeed
import torch
from transformers import pipeline

def init():
    local_rank = int(os.getenv('LOCAL_RANK', '0'))
    world_size = int(os.getenv('WORLD_SIZE', '1'))
    generator = pipeline(
        'text-generation', model='EleutherAI/gpt-neo-2.7B', device=local_rank)
    generator.model = deepspeed.init_inference(generator.model,
                                               mp_size=world_size,
                                               dtype=torch.float,
                                               replace_method='auto')
    return generator

def predict(text, max_len):
    top_k = 50
    temperature = 1.0
    top_p = 1.0
    return_seq = 1
    # generator is the module-level pipeline created in __main__
    string = generator(text, do_sample=True, min_length=50, max_length=max_len,
                       top_k=top_k, temperature=temperature, top_p=top_p,
                       num_return_sequences=return_seq, pad_token_id=3)
    if torch.distributed.get_rank() == 0:
        print(string)

if __name__ == '__main__':
    generator = init()
    text = 'a'
    seq = 2023
    for i in range(2, seq):
        print(f'##### max_len: {i}')
        predict(text, i)
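Since the process just blocks with no error, each `predict` call can be wrapped in a timeout watchdog to pinpoint exactly which `max_len` hangs. This is only a sketch I am assuming for debugging (the helper `run_with_timeout` and the 30 s limit are illustrative, not part of the original script); note that a thread cannot interrupt a stuck NCCL collective, so this detects the hang but does not recover from it:

```python
import concurrent.futures

def run_with_timeout(fn, *args, timeout=30.0, **kwargs):
    """Run fn(*args, **kwargs); return (True, result), or (False, None) on timeout.

    A timed-out worker thread keeps running in the background -- this only
    *reports* a hang (e.g. a blocked NCCL collective), it cannot stop it.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn, *args, **kwargs)
    try:
        result = future.result(timeout=timeout)
        pool.shutdown(wait=True)
        return True, result
    except concurrent.futures.TimeoutError:
        pool.shutdown(wait=False)  # leave the hung thread behind, just report
        return False, None
```

In the loop above one would then call `ok, _ = run_with_timeout(predict, text, i)` and log `i` when `ok` is `False`.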

DS_Report

--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
async_io ............... [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/opt/conda/lib/python3.8/site-packages/torch']
torch version .................... 1.9.0a0+c3d40fd
torch cuda version ............... 11.3
nvcc version ..................... 11.3
deepspeed install path ........... ['/opt/conda/lib/python3.8/site-packages/deepspeed']
deepspeed info ................... 0.4.3, unknown, unknown
deepspeed wheel compiled w. ...... torch 1.9, cuda 11.3

ENV

The issue occurs when the input length reaches about 90 tokens (the exact trigger point seems to be random).

Thank you.

hyunwoongko commented 3 years ago

Did you specify a shared memory size to make it work in docker?
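(For reference, the shared memory actually visible inside the container can be checked programmatically as well as with `df -h /dev/shm`; a minimal sketch, assuming the usual Linux `/dev/shm` tmpfs mount:)

```python
import shutil

def shm_size_gib(path="/dev/shm"):
    """Return (total, free) size of the filesystem backing `path`, in GiB."""
    usage = shutil.disk_usage(path)
    return usage.total / 2**30, usage.free / 2**30
```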

hyunwoongko commented 3 years ago

Please share the error message as well.

switiz commented 3 years ago

Dear @hyunwoongko

  1. It works even with 64 MB, but the Docker default shared memory size is too small, so I increased it with a safe margin. I changed the shared memory size to 16 GB, but the issue still reproduces.

shm 16G 0 16G 0% /dev/shm

  2. Do I need to turn on any special logging for this issue? No error log is printed; it just hangs. Only the general logs are attached below. If I need to enable a special log, please let me know and I will test it.

  3. I changed the Docker image from 21.06 to 21.07 (which includes the latest NCCL version); the issue still reproduces.
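One way to get more signal before it hangs is to enable NCCL's own debug logging. These environment variables are read by NCCL itself (they are not DeepSpeed-specific) and must be set before the process group is initialized; the subsystem filter shown is just a suggested starting point:

```python
import os

# NCCL reads these when the communicator is created, so set them before
# deepspeed.init_inference() (or export them in the shell / `docker run -e`).
os.environ.setdefault("NCCL_DEBUG", "INFO")              # per-rank NCCL log lines
os.environ.setdefault("NCCL_DEBUG_SUBSYS", "INIT,COLL")  # focus on init + collectives
```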


GPU Utilization

[0] NVIDIA A100-SXM4-40GB | 42°C, 100 % | 15490 / 40536 MB
[1] NVIDIA A100-SXM4-40GB | 39°C, 100 % | 15490 / 40536 MB

RUN LOG

[2021-08-03 01:47:32,928] [WARNING] [runner.py:122:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2021-08-03 01:47:36,323] [INFO] [runner.py:360:main] cmd = /opt/conda/bin/python3.8 -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMV19 --master_addr=127.0.0.1 --master_port=29500 gpt_neo_deepspeed.py
[2021-08-03 01:47:37,184] [INFO] [launch.py:73:main] 0 NCCL_VERSION 2.9.9
[2021-08-03 01:47:37,184] [INFO] [launch.py:80:main] WORLD INFO DICT: {'localhost': [0, 1]}
[2021-08-03 01:47:37,184] [INFO] [launch.py:86:main] nnodes=1, num_local_procs=2, node_rank=0
[2021-08-03 01:47:37,184] [INFO] [launch.py:101:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1]})
[2021-08-03 01:47:37,184] [INFO] [launch.py:102:main] dist_world_size=2
[2021-08-03 01:47:37,184] [INFO] [launch.py:104:main] Setting CUDA_VISIBLE_DEVICES=0,1
[2021-08-03 01:52:32,666] [INFO] [logging.py:68:log_dist] [Rank -1] DeepSpeed info: version=0.4.2, git-hash=unknown, git-branch=unknown
[2021-08-03 01:52:32,666] [INFO] [distributed.py:46:init_distributed] Initializing torch distributed with backend: nccl
[2021-08-03 01:52:35,183] [INFO] [logging.py:68:log_dist] [Rank -1] DeepSpeed info: version=0.4.2, git-hash=unknown, git-branch=unknown
[2021-08-03 01:52:35,184] [INFO] [distributed.py:46:init_distributed] Initializing torch distributed with backend: nccl
Using /root/.cache/torch_extensions as PyTorch extensions root...
Using /root/.cache/torch_extensions as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /root/.cache/torch_extensions/transformer_inference/build.ninja...
Building extension module transformer_inference...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module transformer_inference...
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.21926426887512207 seconds
Time to load transformer_inference op: 0.21379852294921875 seconds
DeepSpeed Transformer Inference config is {'layer_id': 0, 'hidden_size': 2560, 'intermediate_size': 10240, 'heads': 20, 'num_hidden_layers': -1, 'fp16': False, 'pre_layer_norm': True, 'local_rank': -1, 'stochastic_mode': False, 'epsilon': 1e-05, 'mp_size': 2, 'q_int8': False, 'encoder_decoder': False, 'scale_attention': True, 'specialized_mode': False, 'triangular_masking': True, 'local_attention': False, 'window_size': 256}
(The same config line is printed twice, once per rank, for each of layer_id 0 through 31; the only field that varies is 'local_attention', which alternates False/True between even and odd layers. Repeats elided.)

max_len: 2
max_len: 2

[{'generated_text': 'a-'}]

max_len: 3

max_len: 3
max_len: 4

[{'generated_text': "a's testimony"}]

max_len: 4
max_len: 5

[{'generated_text': 'a. Field of'}]

max_len: 5
max_len: 6

[{'generated_text': 'a, or other\n'}]

max_len: 6
max_len: 7

[{'generated_text': 'a_n)$ and'}]

max_len: 7
max_len: 8

[{'generated_text': 'a, y técnic'}]

max_len: 8
max_len: 9

[{'generated_text': 'a}^{\pm}{a'}]

max_len: 9
max_len: 10

[{'generated_text': 'a\nl\nc\nu\nl'}]

max_len: 10
max_len: 11

[{'generated_text': 'a\nu\np\nu\np\n'}]

max_len: 11
max_len: 12

[{'generated_text': 'a{1},1)$ and $b_{'}]

max_len: 12
max_len: 13

[{'generated_text': 'a. I saw a girl in a bar, she was'}]

max_len: 13
max_len: 14

[{'generated_text': 'a>\n '}]

max_len: 14
max_len: 15

[{'generated_text': 'a and a couple of others, are coming out with some sort of'}]
max_len: 15
max_len: 16

[{'generated_text': 'a\n,\n \n-\n3\n*\na\n '}]

max_len: 16
max_len: 17

[{'generated_text': 'a, the other members of the group are:\n\nA group formed by'}]

max_len: 17

[{'generated_text': 'a and the other with two more, and one with one more. The average number'}]

max_len: 18
max_len: 18
max_len: 19

[{'generated_text': 'a). While this result is based on the assumption that the electron and positron in the'}]

max_len: 19
max_len: 20

[{'generated_text': 'a,b{ref-type="fig"}).\n\nMice'}]

max_len: 20
max_len: 21

[{'generated_text': 'a-z0-9]{0,1}\.\d{1,2}'}]

max_len: 21
max_len: 22

[{'generated_text': 'a\nt\ni\nv\ne\n \no\nf\n \nb\n('}]

max_len: 22
max_len: 23

[{'generated_text': 'a1: "Ia1" "a2: "A2",\n a'}]

max_len: 23
max_len: 24

[{'generated_text': 'a la situacija vladajuće su pozornici. Opozvao je'}]

max_len: 24
max_len: 25

[{'generated_text': 'a,b)=(v_1-v_2){\partial_z}\theta,\quad{\partial'}]

max_len: 25
max_len: 26

[{'generated_text': 'a\n \nm\nu\nl\nt\ni\np\nl\ne\n \no\nf'}]

max_len: 26
max_len: 27

[{'generated_text': "a\nt\n \ni\ns\n \nt\nh\ne\n \nd\n'\nt\n"}]

max_len: 27
max_len: 28

[{'generated_text': 'a, 0x01, 0x0d, 0x43, 0xf1, 0x0e, 0x7'}]

max_len: 28

[{'generated_text': 'a ao longo dos anos, o alargamento da UE conta com uma forte política de'}]

max_len: 29
max_len: 29
max_len: 30

[{'generated_text': 'a(2/17))(-5/14))*(-12/7)/((aa**6/a)/('}]

max_len: 30
max_len: 31

[{'generated_text': 'a}{\alpha,\gamma}/p^{a}_{\alpha,\gamma}$ are also denoted by ${'}]

max_len: 31
max_len: 32

[{'generated_text': 'a7c5c5e_1\n- :distance: 325\n :file: de5e6615cf645fa9ed6'}]

max_len: 32

[{'generated_text': 'a, who has the same type of relationship to him?"\n\n"Oh, he knows of such affairs of friendship; has written to him about them;'}]

max_len: 33
max_len: 33

[{'generated_text': 'a tome 5, vol. ix. p.\xa01013–1016 (cfr. 2.3.8)\n\n[25] Cf'}]

max_len: 34

max_len: 34
max_len: 35

[{'generated_text': 'a a mãi mãi, \ntá na minha cabeça \ntão muito legal! \nE tão legal'}]

max_len: 35

[{'generated_text': 'a\'s side." "I believe this is your car, sir?" "I\'m terribly sorry, Miss Karras." "He didn\'t mean to take it." "'}]

max_len: 36
max_len: 36
max_len: 37

[{'generated_text': 'a and c are also known as s , and k is also known as s.\n\nA common name for t is _'}]

max_len: 37
max_len: 38

[{'generated_text': 'a, Figure 2{ref-type="fig"}. Figure 1.The time course of the experimental protocol. Experimental sessions (n = 5) were performed'}]

max_len: 38

[{'generated_text': 'a) (West 2004); Aufricht, 230 F.3d at 1265-66; United States v. Taveras, 156 F.3d 1234,'}]

max_len: 39
max_len: 39
max_len: 40

[{'generated_text': 'a\nl\nu\ne\n?\n \n \n(\na\n)\n \n2\n1\n/\n1\n2\n8\n \n \n('}]

max_len: 40
max_len: 41

[{'generated_text': 'a2/5]{} (8,-4.5); (0,0); (12,-2.5); (0,2); (2,1.5); (8'}]

max_len: 41
max_len: 42

[{'generated_text': 'a-d]{.smallcaps}-Glycine, respectively. c~1~ in parentheses indicates the mass-balance constant related to chiral symmetry breaking and is a measure'}]

max_len: 42
max_len: 43

[{'generated_text': 'a\nt\n \ni\ns\n \nt\nh\ne\n \nr\ne\nm\na\ni\nn\nd\ne\nr\n \nw\n'}]

max_len: 43
max_len: 44

[{'generated_text': 'a\nl\nl\ne\ns\nt\n \nv\na\nl\nu\ne\n?\n \n \n(\na\n)\n \n-\n4\n7'}]

max_len: 44
max_len: 45

[{'generated_text': 'a)(4)(B)(ii)-(iii); and (e)(3)(B)-(C) of this section applies if the total amount of principal due under any of the terms is due to an employer whose'}]

max_len: 45
max_len: 46

[{'generated_text': 'a}\n=====================================\n\nThe primary data used in this analysis are from the 2007-08 Canadian Tobacco Use Study(CTUS07) administered to Canadian youth aged 12-17 years. The CTUS2007 study'}]

max_len: 46
max_len: 47

[{'generated_text': 'a; @shiraishi_2009; @fukuda_2009; @fukuda_2011; @fukuda_2010; @goto_2009; @goto_2010]. The $^{6'}]

max_len: 47
max_len: 48

[{'generated_text': 'a-zA-Z]{3}\d|[^A-Za-zA-Z\d])|1[8-\d$]{}",\n "REGEX_'}]

max_len: 48
max_len: 49

[{'generated_text': 'a}(t,t-t_1)U^F\n(t-t_1)~.\n\label{eq:Lanadu2}$$ and Eq.\xa0(\[eq:Lan'}]

max_len: 49

[{'generated_text': 'a, 0xa4, 0x62, 0xc1,\n\t0x4a, 0x8b, 0x5a, 0x9a, 0x7a, 0xc4, 0xf8,'}]

max_len: 50
max_len: 50
max_len: 51

[{'generated_text': 'a,t}^{a,t}({\mathcal{U}})+\mathcal{K}.$$ Then $$J(\mathbf{\tilde{u}})\leq \int{{\mathbb{R}^d'}]

max_len: 51
max_len: 52

[{'generated_text': 'a\n)\n \n1\n \n \n(\nb\n)\n \nv\n \n \n(\nc\n)\n \n-\n5\n\n\nc\n\n\nL\ne\nt\n \nx'}]

max_len: 52
max_len: 53

[{'generated_text': 'a-5p), which can be directly correlated with the expression levels (R^2^\u2009=\u20090.8023 with p-value\u2009\<\u20090.0001, see Fig.\xa0[2a]('}]

max_len: 53
max_len: 54

[{'generated_text': 'a\times b)}\le\frac{c_b}{C_b}+\frac{1}{(2{\varepsilon})^{1/4}},\end{aligned}$$ where $C_a$ and $C'}]

max_len: 54
max_len: 55

[{'generated_text': 'a7d2e4b2e5\n/Users/me/Projects/x1/x2/x3/x4/x5/x6/x7/x8/x9/x10/x11/x12'}]

max_len: 55
max_len: 56

[{'generated_text': 'a3, 0x0\n#define ixPHY_SPEC_CAP_STATUS1_DELAY_TH_A4 '}]

max_len: 56
max_len: 57

[{'generated_text': 'a.\n-1\nLet q be 3/2 + 12 + -7. Suppose -ql + 28 = 5. Suppose 3u - 23 + l = 0. Solve 2z - 2g - 26 = -6*g,'}]

max_len: 57
max_len: 58

[{'generated_text': 'a{ref-type="fig"}. The mean age decreased significantly in both sexes from 24.75--23.75\u2009±\u20096.56 to 24.43--21.00\u2009±\u20095.38 (t�'}]

max_len: 58
max_len: 59

[{'generated_text': "a.\n\n3. I have never been in your dream! Now I get to play the part of a bad guy who gives you advice that you've never told anyone before! Here's a tip – just tell your friends who you are. Then when they find out they will know"}]

max_len: 59
max_len: 60

[{'generated_text': 'a>\n

hyunwoongko commented 3 years ago

I can't find any errors in the log. Is the program just deadlocked and stuck?

switiz commented 3 years ago

Yes, it is just deadlocked and does not respond.
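One generic way to confirm where each rank is stuck (a standard CPython debugging aid, not anything DeepSpeed-specific) is to arm `faulthandler` at the top of the test script, then trigger a stack dump from another shell while the run is hung:

```python
import faulthandler
import signal
import sys

# Dump every thread's Python stack when the process receives SIGUSR1;
# while the run is hung, `kill -USR1 <pid>` on each rank shows the exact
# line (e.g. an NCCL collective) where that rank is blocked.
faulthandler.register(signal.SIGUSR1, file=sys.stderr, all_threads=True)

# Alternatively, dump automatically if the process is still alive after
# 10 minutes, repeating every 10 minutes until cancelled.
faulthandler.dump_traceback_later(600, repeat=True, file=sys.stderr)
```

If both ranks show a stack ending inside a communication call while one rank has moved on, that points at mismatched collectives rather than a compute bug.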

switiz commented 3 years ago

Dear Deepspeed

Do you have any idea about this issue? Is there a point where I could add logging? Does it work normally on your side?

thank you

RezaYazdaniAminabadi commented 3 years ago

Hi @switiz

Sorry for the late reply. I will investigate this and let you know how to solve it. Thanks, Reza

RezaYazdaniAminabadi commented 3 years ago

Okay, now I can repro this on seq-len 140, a bit farther than what you see! I will look deeper into this and hopefully have a fix soon.

Thanks, Reza

switiz commented 3 years ago

Good news! I will wait for the fix, and once it is in I will re-run the test.

Thanks.

RezaYazdaniAminabadi commented 3 years ago

Hi @switiz

Can you please try to see if this branch solves the issue? By the way, I have changed your script a bit:

import os
import deepspeed
import torch
import transformers
from transformers import pipeline, AutoTokenizer

def init():
    local_rank = int(os.getenv('LOCAL_RANK', '0'))
    world_size = int(os.getenv('WORLD_SIZE', '1'))
    generator = pipeline(
        'text-generation', model='EleutherAI/gpt-neo-2.7B', device=local_rank)
    generator.model = deepspeed.init_inference(generator.model,
                                            mp_size=world_size,
                                            dtype=torch.float,
                                            replace_method='auto')
    return generator

def predict(text, max_len):
    torch.distributed.barrier()
    with torch.no_grad():
        string = generator(text, do_sample=True, 
                            min_length=max_len, 
                            max_length=max_len, 
                            top_k=50, 
                            temperature=1.0, 
                            top_p=1.0, 
                            num_return_sequences=1,
                            pad_token_id=3)
    return string

if __name__ == '__main__':
    generator = init()
    text = 'a'
    seq = 2023
    for i in range(145, seq):
        string = predict(text, i)

        torch.distributed.barrier()
        print(f'[{torch.distributed.get_rank()}] ##### max_len {i} : {string}')
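One way to picture why matched synchronization points matter across ranks is the following toy, CPU-only sketch. Threads stand in for the two GPU ranks; this is only an analogy for the generic failure mode (one rank skipping a collective that the others are waiting on), not DeepSpeed code:

```python
import threading

def worker(rank, barrier, log):
    # Every "rank" must reach the barrier the same number of times.
    # If one rank skipped a wait() (e.g. took a different code path),
    # the other would block forever: the generic failure mode of
    # mismatched collectives across ranks.
    for step in range(3):
        barrier.wait()          # all ranks enter the step together
        log.append((rank, step))
        barrier.wait()          # all ranks leave the step together

barrier = threading.Barrier(2)
log = []
threads = [threading.Thread(target=worker, args=(r, barrier, log))
           for r in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# log now holds 2 ranks x 3 steps = 6 entries
```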

Thanks, Reza

switiz commented 3 years ago

Hi @RezaYazdaniAminabadi

Of course. I will let you know the results after testing.

Thanks

switiz commented 3 years ago

Hi @RezaYazdaniAminabadi

I tried to reproduce it 10 times ([increase tokens 145 to 2022] * 10) with your fixed DeepSpeed repo (0.4.6+5038b07, commit 5038b07, branch reyazda/mp_inference).

The issue no longer reproduces.

There is a slight difference in inference speed between the code with barrier() added and without it, but the difference is at most xxx ms for long-sequence generation, so it seems to be a minor point.
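For reference, that per-call overhead can be quantified with a small, framework-agnostic helper (`time_calls` is a hypothetical name; substitute the real `predict` for the no-op when measuring on GPUs):

```python
import time

def time_calls(fn, n=100):
    """Return the average wall-clock time per call of fn, in milliseconds."""
    start = time.perf_counter()
    for _ in range(n):
        fn()
    return (time.perf_counter() - start) / n * 1000.0

# Time the same callable with and without the extra barrier() calls;
# the delta between the two averages is the synchronization overhead
# per generation step. A no-op gives the measurement floor.
noop_ms = time_calls(lambda: None)
```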

thanks

loadams commented 1 year ago

Closing this issue as @RezaYazdaniAminabadi's branch was merged and seems to have solved the issue.