Closed irshadbhat closed 1 year ago
What do you mean by the output being "really random"?
Oh sorry, I meant that the output of the model was random, not at all what was expected.
I used the below code for inference:
import sys
import time
import torch
import argparse
import numpy as np
import pandas as pd
import deepspeed
import evaluate
import datasets
from datasets import load_dataset
from transformers import AutoTokenizer
from transformers import DataCollatorForSeq2Seq
from transformers import AutoModelForSeq2SeqLM, Seq2SeqTrainingArguments, Seq2SeqTrainer

def parse_arge():
    """Parse the arguments."""
    parser = argparse.ArgumentParser()
    # add model id and dataset path argument
    parser.add_argument("--model_id", type=str, default="/mnt/flan-t5-xxl", help="Model id to use for training.")
    parser.add_argument("--per_device_train_batch_size", type=int, default=8, help="Batch size to use for training.")
    parser.add_argument("--per_device_eval_batch_size", type=int, default=8, help="Batch size to use for testing.")
    parser.add_argument("--generation_max_length", type=int, default=140, help="Maximum length to use for generation")
    parser.add_argument("--generation_num_beams", type=int, default=1, help="Number of beams to use for generation.")
    parser.add_argument("--deepspeed", type=str, default=None, help="Path to deepspeed config file.")
    args = parser.parse_known_args()
    return args

def inference_function(args):
    tokenizer = AutoTokenizer.from_pretrained(args.model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(args.model_id)
    while True:
        text = input('Input text:\n')
        t1 = time.time()
        batch = tokenizer.prepare_seq2seq_batch(src_texts=[text], max_length=256, truncation=True, return_tensors="pt")
        output = model.generate(batch["input_ids"], max_length=128, min_length=2, early_stopping=True, num_beams=1)#, temperature=0.8, top_p=0.75)#, top_k=10, num_beams=5)
        print([tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=True) for g in output])
        print(time.time()-t1)

def main():
    args, _ = parse_arge()
    inference_function(args)

if __name__ == "__main__":
    main()
What's your model id and output?
--model_id /mnt/flan_modelling/flan-t5-xxl-ner-ft/checkpoint-26/
output: ['oa*lapcte ole gbo llonm l.it']
I looked into deepspeed/inference from deepspeed.ai and found the end-to-end inference code for GPT-Neo.
I updated the code to work for flan-t5-xxl as below:
import os
import deepspeed
import torch
from transformers import pipeline

local_rank = int(os.getenv('LOCAL_RANK', '0'))
world_size = int(os.getenv('WORLD_SIZE', '4'))
generator = pipeline('text2text-generation', model='/mnt/flan_modelling/flan-t5-xxl-ner-ft/checkpoint-26/',
                     device=local_rank)
generator.model = deepspeed.init_inference(generator.model,
                                           mp_size=world_size,
                                           dtype=torch.bfloat16,
                                           replace_with_kernel_inject=True)
string = generator("DeepSpeed is", do_sample=True, min_length=2)
if not torch.distributed.is_initialized() or torch.distributed.get_rank() == 0:
    print(string)
I used the ds_flan_t5_z3_offload_bf16.json config file for training, so I guess I have to use dtype=torch.bfloat16.
But now I am getting a CUDA OOM error.
Please suggest any changes I need to make so that I can use the trained model for inference.
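For context on the OOM, here is a rough back-of-the-envelope estimate of the weight memory involved. This is only a sketch under assumptions not confirmed in this thread: the parameter count of about 11 billion for flan-t5-xxl is approximate, and the `weight_gib` helper is hypothetical, written just for this illustration. It also assumes that `pipeline(..., device=local_rank)` places a full, unsharded copy of the model on each GPU in the default dtype (typically float32) before `deepspeed.init_inference` runs, which is my understanding and may differ by version.

```python
# Rough weight-memory estimate only; ignores activations, generation
# caches and framework overhead, so real usage will be higher.
PARAMS = 11e9  # assumption: approximate parameter count of flan-t5-xxl

BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}


def weight_gib(num_params: float, dtype: str, shards: int = 1) -> float:
    """Return the GiB needed per GPU to hold the weights split evenly over `shards` GPUs."""
    return num_params * BYTES_PER_PARAM[dtype] / shards / 2**30


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        for shards in (1, 4):
            gib = weight_gib(PARAMS, dtype, shards)
            print(f"{dtype:>8} on {shards} GPU(s): {gib:.1f} GiB each")
```

If these assumptions hold, an unsharded float32 copy needs roughly 41 GiB per GPU for the weights alone, whereas bfloat16 weights split over 4 GPUs would need about 5 GiB each, so the load step before sharding may be where the memory runs out.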
I have raised a different issue #12 with better detail. Please feel free to delete this issue.
Hi there,
I trained a flan-t5-xxl model following the steps from your blog. The training went well without any issues.
I ran inference using a script written for a regular Hugging Face seq2seq model, and just called the script with deepspeed as:
The output of the model is very random. I believe I am doing something wrong; maybe I need to pass the config file for inference as well.
Could you please provide a sample inference script? My apologies if this is quite trivial; I am fairly new to DeepSpeed.
Looking forward to your response.
Best, Irshad