Deepspeed Inference is not a completed product AFAIK, and it's not yet integrated into Transformers because of that.
As you can see from the trace, the transformers library is not being used, so please re-file this issue with Deepspeed and tag @RezaYazdaniAminabadi.
Any reason why you are not using Deepspeed ZeRO Inference? https://huggingface.co/transformers/master/main_classes/deepspeed.html#deepspeed-zero-inference
Deepspeed Inference and Deepspeed ZeRO Inference are two completely different things. The former uses tensor parallelism; the latter uses ZeRO sharding.
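To illustrate the difference, here is a rough sketch of the ZeRO Inference path from that doc (untested here; the config is a minimal placeholder you would adapt to your setup):

```python
import deepspeed
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.deepspeed import HfDeepSpeedConfig

# Minimal ZeRO-3 inference config; the batch size and precision are assumptions.
ds_config = {
    "fp16": {"enabled": False},
    "zero_optimization": {"stage": 3},
    "train_micro_batch_size_per_gpu": 1,
}

# HfDeepSpeedConfig must be created *before* from_pretrained, so the weights
# are loaded directly in ZeRO-3 sharded fashion; keep the reference alive.
dschf = HfDeepSpeedConfig(ds_config)

model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-1.3B")

# deepspeed.initialize returns (engine, optimizer, dataloader, lr_scheduler)
ds_engine = deepspeed.initialize(model=model, config_params=ds_config)[0]
ds_engine.module.eval()

inputs = tokenizer("DeepSpeed is", return_tensors="pt").to(ds_engine.module.device)
with torch.no_grad():
    outputs = ds_engine.module.generate(**inputs, do_sample=True, min_length=50)
print(tokenizer.decode(outputs[0]))
```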
@stas00 Thanks for your comment. I filed the case with transformers because the issue does not reproduce with low_cpu_mem_usage disabled. Do you think it needs to be handled by Deepspeed even so?
I'm eventually trying to load a bigger GPT-Neo-like model that doesn't fit on one GPU. That's why Deepspeed Inference is used. I appreciate your advice, though.
@stas00 I have no idea which diff fixed the issue, but it was fixed after I updated the Deepspeed Inference code to the latest version. I really appreciate your advice, since I had only been focusing on transformers. Thanks!!!
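For anyone hitting the same failure, pulling in the latest DeepSpeed is along these lines (I don't recall the exact command I used, so treat this as approximate):

```shell
# upgrade to the latest released DeepSpeed
pip install --upgrade deepspeed

# or install the current development code to pick up unreleased fixes
pip install git+https://github.com/microsoft/DeepSpeed.git
```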
It's an actively developed new product, so someone must have reported this issue recently and it got fixed.
I'm glad this is now working for you, @Jiyeon1230
> I'm eventually trying to load a bigger GPT-Neo-like model that doesn't fit on one GPU. That's why Deepspeed Inference is used.
And I repeat: Deepspeed ZeRO is an already well-tested scalability solution that you can use today to run models larger than a single GPU, and it's fully integrated into Transformers. It has additional features like CPU Offload, which scales further and which I don't think Deepspeed Inference supports at the moment. See the doc link in my last comment.
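For example, parameter offload to CPU is a small config addition on top of ZeRO-3, roughly like this (a sketch; the values are assumptions to tune per setup):

```python
# ZeRO-3 config with parameters parked in CPU RAM and streamed to the GPU
# on demand; batch size and pinning here are placeholder choices.
ds_config = {
    "zero_optimization": {
        "stage": 3,
        "offload_param": {
            "device": "cpu",
            "pin_memory": True,
        },
    },
    "train_micro_batch_size_per_gpu": 1,
}
```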
But, of course, it's up to you what you use.
@stas00 Oh, sure! I may have misunderstood Deepspeed ZeRO. I'll definitely look into it. Thanks for your advice.
## Environment info

- `transformers` version: 4.12.5

## Who can help

@stas00

## Information

Model I am using (Bert, XLNet ...): EleutherAI/gpt-neo-1.3B

The problem arises when using:
* [x] the official example scripts: (give details below)
* [ ] my own modified scripts: (give details below)

The tasks I am working on is:
* [ ] an official GLUE/SQUaD task: (give the name)
* [x] my own task or dataset: (give details below)

## To reproduce

Steps to reproduce the behavior:

1. Write the following code, adapted from https://www.deepspeed.ai/tutorials/inference-tutorial/#end-to-end-gpt-neo-27b-inference

```python
import os

import deepspeed
import torch
from transformers import pipeline, AutoModelForCausalLM, AutoTokenizer

local_rank = int(os.getenv('LOCAL_RANK', '0'))
world_size = int(os.getenv('WORLD_SIZE', '1'))

# Load with low_cpu_mem_usage=True to reduce peak CPU RAM
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B", low_cpu_mem_usage=True)
#model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B", low_cpu_mem_usage=True)
tokenizer_i = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-1.3B")
#tokenizer_i = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")

generator = pipeline('text-generation', model=model, device=local_rank, tokenizer=tokenizer_i)
generator.model = deepspeed.init_inference(generator.model,
                                           mp_size=world_size,
                                           dtype=torch.float,
                                           replace_method='auto')
string = generator("DeepSpeed is", do_sample=True, min_length=50)
if not torch.distributed.is_initialized() or torch.distributed.get_rank() == 0:
    print(string)
```

2. Execute the code using Deepspeed with the following command

```shell
deepspeed --num_gpus 1 test.py
```

3. Execution fails

```log
Traceback (most recent call last):
  File "test.py", line 25, in
```

## Expected behavior
Run deepspeed inference successfully without any failure
## Comment
Hi all,
I'm trying to run GPT-Neo inference using Deepspeed. Because of my system environment, I need to reduce peak RAM usage, so I added the argument low_cpu_mem_usage=True to from_pretrained. But it fails as described above. I'm filing this case with HF because the script runs successfully when low_cpu_mem_usage is removed or the model is changed to gpt-j-6B. Could you advise on this problem? If the low_cpu_mem_usage feature doesn't support GPT-Neo, I'd appreciate it if you could say so.
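For context, my understanding of what the flag changes (a rough sketch; the exact internals may differ):

```python
from transformers import AutoModelForCausalLM

# Default path: the model is first allocated with freshly initialized weights,
# then the checkpoint is loaded on top, so peak CPU RAM is roughly 2x the
# model size.
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B")

# low_cpu_mem_usage=True skips materializing the random init and assigns the
# checkpoint tensors directly, keeping peak CPU RAM closer to 1x.
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-neo-1.3B", low_cpu_mem_usage=True
)
```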
Thanks,