When using Falcon-40B with 'bloom-accelerate-inference.py', I first get this error:
"ValueError: The following model_kwargs are not used by the model: ['token_type_ids'] (note: typos in the generate arguments will also show up in this list)"
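As far as I can tell, one way around that error is to drop the offending key from the encoding before it reaches model.generate(). A minimal sketch with made-up dummy values (not the real tokenizer output):

```python
# Dummy stand-in for a tokenizer encoding; the real one is a BatchEncoding,
# but it supports the same dict-style access.
input_tokens = {
    "input_ids": [[101, 2023, 2003]],
    "token_type_ids": [[0, 0, 0]],
    "attention_mask": [[1, 1, 1]],
}

# Remove 'token_type_ids' so model.generate() never sees it.
input_tokens.pop("token_type_ids", None)  # safe even if the key is absent

print(sorted(input_tokens))  # ['attention_mask', 'input_ids']
```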
After some changes I got it running, but now generate() always returns the input unchanged:
in=DeepSpeed is a machine learning framework
out=DeepSpeed is a machine learning framework
Any idea why it is doing that?
Here is my changed generate function:
def generate():
    input_tokens = tokenizer.batch_encode_plus(
        batch_text_or_text_pairs=inputs,
        return_tensors="pt",
        padding=False,
        return_token_type_ids=False,
    )
    for t in input_tokens:
        if torch.is_tensor(input_tokens[t]):
            input_tokens[t] = input_tokens[t].to("cuda:0")

    outputs = model.generate(**input_tokens, **generate_kwargs)

    input_tokens_lengths = [x.shape[0] for x in input_tokens.input_ids]
    output_tokens_lengths = [x.shape[0] for x in outputs]
    total_new_tokens = [o - i for i, o in zip(input_tokens_lengths, output_tokens_lengths)]

    outputs = tokenizer.batch_decode(outputs, skip_special_tokens=True)
    print(outputs)
    return zip(inputs, outputs, total_new_tokens)
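One thing worth checking is whether model.generate() is producing any new tokens at all. The same length arithmetic the function already uses can show this; the numbers below are made up purely for illustration:

```python
# Toy illustration of the new-token accounting used in generate().
# If an entry of total_new_tokens is 0, model.generate() only echoed the
# prompt for that sequence -- e.g. an immediate EOS, or generation settings
# that leave no room for new tokens.
input_tokens_lengths = [7, 7]    # hypothetical prompt lengths
output_tokens_lengths = [7, 30]  # hypothetical output lengths from generate()
total_new_tokens = [o - i for i, o in zip(input_tokens_lengths, output_tokens_lengths)]
print(total_new_tokens)  # [0, 23] -- the first sequence got no new tokens
```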
For reference, the original tokenizer call in the script was:
input_tokens = tokenizer.batch_encode_plus(inputs, return_tensors="pt", padding=True)
which I changed to:
input_tokens = tokenizer.batch_encode_plus(batch_text_or_text_pairs=inputs, return_tensors="pt", padding=False, return_token_type_ids=False)