huggingface / transformers-bloom-inference

Fast Inference Solutions for BLOOM

It does not work correctly with Falcon-40B #100

Open AGrosserHH opened 1 year ago

AGrosserHH commented 1 year ago

When using Falcon-40B with bloom-accelerate-inference.py, I first get the error:

```
ValueError: The following model_kwargs are not used by the model: ['token_type_ids'] (note: typos in the generate arguments will also show up in this list)
```
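A minimal sketch of an alternative workaround, assuming the error indeed comes from the Falcon tokenizer emitting `token_type_ids`, which the model's `generate()` does not accept (`BatchEncoding` behaves like a dict, so the offending key can simply be dropped):

```python
# Keep the original encode call and drop the kwarg that generate() rejects
input_tokens = tokenizer.batch_encode_plus(inputs, return_tensors="pt", padding=True)
input_tokens.pop("token_type_ids", None)
```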

After some changes I got it running, so that the call in the generate() function is now:

```python
input_tokens = tokenizer.batch_encode_plus(
    batch_text_or_text_pairs=inputs,
    return_tensors="pt",
    padding=False,
    return_token_type_ids=False,
)
```

where previously it was:

```python
input_tokens = tokenizer.batch_encode_plus(inputs, return_tensors="pt", padding=True)
```
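One hedged guess as to why padding=True had to go: the Falcon tokenizer ships without a pad token, so padding a batch raises an error. A sketch that keeps padding for batched inputs, assuming it is acceptable to reuse the EOS token as the pad token for generation:

```python
# Falcon's tokenizer defines no pad token; reuse EOS so padding=True works
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

input_tokens = tokenizer.batch_encode_plus(
    inputs, return_tensors="pt", padding=True, return_token_type_ids=False
)
```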

But now it always generates the same text:

```
in=DeepSpeed is a machine learning framework out=DeepSpeed is a machine learning framework
```

Any idea why it does that?
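Since the decoded output is identical to the prompt, it looks like zero new tokens are being generated. A quick diagnostic sketch (max_new_tokens=100 is a hypothetical value; in the script, generate_kwargs is built from the CLI arguments):

```python
# Check that generation is actually asked for new tokens, and count
# how many token positions come back beyond the prompt
generate_kwargs = dict(max_new_tokens=100, do_sample=False)
outputs = model.generate(**input_tokens, **generate_kwargs)
print("prompt tokens:", input_tokens.input_ids.shape[1],
      "output tokens:", outputs.shape[1])
```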

Here is my changed generate() function:

```python
def generate(inputs):
    # Tokenize without padding; return_token_type_ids=False avoids the
    # ValueError from Falcon's generate()
    input_tokens = tokenizer.batch_encode_plus(
        batch_text_or_text_pairs=inputs,
        return_tensors="pt",
        padding=False,
        return_token_type_ids=False,
    )
    # Move all input tensors to the first GPU
    for t in input_tokens:
        if torch.is_tensor(input_tokens[t]):
            input_tokens[t] = input_tokens[t].to("cuda:0")

    outputs = model.generate(**input_tokens, **generate_kwargs)

    # Count the tokens generated beyond each prompt
    input_tokens_lengths = [x.shape[0] for x in input_tokens.input_ids]
    output_tokens_lengths = [x.shape[0] for x in outputs]
    total_new_tokens = [o - i for i, o in zip(input_tokens_lengths, output_tokens_lengths)]

    outputs = tokenizer.batch_decode(outputs, skip_special_tokens=True)
    print(outputs)

    return zip(inputs, outputs, total_new_tokens)
```
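For reference, a hypothetical driver that mirrors the in=/out= line above (model, tokenizer, and generate_kwargs are assumed to be set up earlier in bloom-accelerate-inference.py):

```python
inputs = ["DeepSpeed is a machine learning framework"]
for prompt, completion, num_new_tokens in generate(inputs):
    print(f"in={prompt} out={completion} ({num_new_tokens} new tokens)")
```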