I encountered the same problem and have tried many approaches, but found no effective solution.
@xiangyuliu I ran into a similar issue, but found a workaround that solved it for me: set `replace_with_kernel_inject=False`. Inference is slower, but the outputs come out valid. Maybe this helps.
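For reference, a minimal sketch of where the flag goes, assuming the same `pipeline`/`init_inference` setup as in the report below (`generator`, `world_size`, and the `bfloat16` dtype are taken from that report):

```python
# Sketch only: same setup as the bug report below, but with kernel
# injection disabled. Without kernel injection or an injection_policy,
# DeepSpeed should fall back to AutoTP-style tensor parallelism,
# which is slower but gave me valid multi-GPU output.
generator.model = deepspeed.init_inference(
    generator.model,
    mp_size=world_size,
    dtype=torch.bfloat16,
    replace_with_kernel_inject=False)
```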
ETA: Note that I used the latest stable DeepSpeed (0.9.5). I believe a fix allowing LLaMA to run with AutoTP was merged sometime after 0.9.2, and I don't know whether that fix made it into 0.9.3.
I reported the same issue in https://github.com/microsoft/DeepSpeed/issues/3932, but `replace_with_kernel_inject=False` is much slower.
Describe the bug
DeepSpeed (0.9.3) inference works fine on a single GPU (Tesla A30, 24 GB) but produces invalid output on multiple GPUs (launched with --num_gpus 2). Test model: OpenBuddy 7B (LLaMA-based).
To Reproduce
test.py:

```python
import os

import torch
import deepspeed
from transformers import pipeline
from transformers.models.llama.modeling_llama import LlamaDecoderLayer

local_rank = int(os.getenv('LOCAL_RANK', '0'))
world_size = int(os.getenv('WORLD_SIZE', '1'))

generator = pipeline('text-generation',
                     model='/data/openbuddy-7b-v1.4',
                     torch_dtype=torch.bfloat16,
                     device=local_rank)

# injection_policy maps the decoder layer to the submodules whose outputs
# need an all-reduce when the layer is split across tensor-parallel ranks.
generator.model = deepspeed.init_inference(
    generator.model,
    mp_size=world_size,
    injection_policy={LlamaDecoderLayer: ('self_attn.o_proj', 'mlp.down_proj')})

string = generator("DeepSpeed is", max_new_tokens=20)
if not torch.distributed.is_initialized() or torch.distributed.get_rank() == 0:
    print(string)
```

Running `deepspeed --num_gpus 2 test.py` gave invalid output. Running `deepspeed --num_gpus 1 test.py` worked as expected:

```
[{'generated_text': 'DeepSpeed is the key to helping you achieve your goal as it is a great way to get your vision right.'}]
```
ds_report output:

```
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
JIT compiled ops requires ninja
ninja .................. [OKAY]
op name ................ installed .. compatible
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-devel package with yum
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.0
[WARNING] using untested triton version (2.0.0), only 1.0.0 is known to be compatible
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]

DeepSpeed general environment info:
torch install path ............... ['/usr/local/lib64/python3.9/site-packages/torch']
torch version .................... 2.0.1+cu118
deepspeed install path ........... ['/usr/local/lib/python3.9/site-packages/deepspeed']
deepspeed info ................... 0.9.3, unknown, unknown
torch cuda version ............... 11.8
torch hip version ................ None
nvcc version ..................... 11.8
deepspeed wheel compiled w. ...... torch 2.0, cuda 11.8
```