Feature Description

Two arguments specify the number of generated tokens: `max_length` (includes the prompt length) and `max_new_tokens` (does not include the prompt length), as specified in `GenerationDefaults` in `src/deepsparse/v2/text_generation/process_inputs.py`. However, the pipeline logic always used the `max_length` argument and ignored `max_new_tokens`. This diff makes `max_new_tokens` take precedence over `max_length` whenever it is not `None`.
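For intuition, the intended precedence can be sketched as follows. This is a hypothetical helper, not the actual pipeline code; the function name and defaults are illustrative only:

```python
from typing import Optional

# Hypothetical sketch of the intended precedence; not the actual pipeline code.
def resolve_generation_budget(
    prompt_length: int, max_length: int = 1024, max_new_tokens: Optional[int] = None
) -> int:
    """Return how many tokens to generate beyond the prompt."""
    if max_new_tokens is not None:
        # max_new_tokens excludes the prompt, so it is used directly
        return max_new_tokens
    # max_length includes the prompt, so the prompt length is subtracted
    return max(max_length - prompt_length, 0)
```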
The new behavior is demonstrated by the following snippet:
```python
from deepsparse.v2.text_generation.pipeline import TextGenerationPipeline

model_path = "hf:mgoin/TinyStories-1M-deepsparse"
max_new_tokens = 64

# force_max_tokens makes the pipeline generate exactly max_new_tokens tokens
pipeline = TextGenerationPipeline(
    model_path, generation_config=dict(max_new_tokens=max_new_tokens), force_max_tokens=True
)
out = pipeline(prompt=["Get more cheese than doritos, cheetos, or fritos"])
print(f"Number of prompt tokens: {len(pipeline.tokenizer.tokenize(out.prompts[0]))}")
print(f"Number of generated tokens: {len(pipeline.tokenizer.tokenize(out.generations[0].text))} versus {max_new_tokens}")
```
Before:

```
Number of prompt tokens: 17
Number of generated tokens: 19 versus 64
```

Now:

```
Number of prompt tokens: 17
Number of generated tokens: 64 versus 64
```
Testing
All tests in `tests.deepsparse.v2.unit` and `tests.deepsparse.transformers` pass.
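For reference, the suites can be invoked with pytest, assuming the dotted module names above map to the repository's usual directory layout:

```
pytest tests/deepsparse/v2/unit tests/deepsparse/transformers
```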