neuralmagic / deepsparse

Sparsity-aware deep learning inference runtime for CPUs
https://neuralmagic.com/deepsparse/

[Fix] Appropriate whitespace missing in streaming output for Llama2, Mistral models #1431

Closed · dbogunowicz closed this 9 months ago

dbogunowicz commented 9 months ago

Fix for: https://app.asana.com/0/1205229323407165/1205993428418769/f

Testing

from deepsparse import Pipeline

# model_path = "zoo:opt-1.3b-opt_pretrain-pruned50_quantW8A8"
model_path = "zoo:llama2-7b-gsm8k_llama2_pretrain-pruned80_quantized"
pipeline = Pipeline.create(task="text-generation", model_path=model_path, sequence_length=64)
generations = pipeline(prompt="Hi, my name is Slim", streaming=True)
print("".join(g.generations[0].text for g in generations))

Before:

.everyoneisamemberofthesamegroup,andthereare10membersinthegroup.
Ifyouareamemberofthegroup,andthereare10membersinthegroup,thenthegroupisdividedinto

Now:

. everyone is a member of the same group, and there are 10 members in the group.
If you are a member of the group, and there are 10 members in the group, then the group is divided into
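For context, a minimal self-contained sketch of why the whitespace goes missing and how a prefix-diff style fix restores it. The `pieces` list and `decode` helper below are hypothetical stand-ins for SentencePiece-style detokenization (as used by Llama2 and Mistral tokenizers), not the actual DeepSparse implementation: decoding each new token in isolation strips the leading word-boundary marker, so spaces between words are lost; decoding the growing sequence and emitting only the new suffix keeps them.

```python
# Hypothetical sketch: "▁" marks a word boundary in SentencePiece pieces.
pieces = ["▁everyone", "▁is", "▁a", "▁member"]

def decode(piece_seq):
    # Stand-in detokenizer: join pieces, turn "▁" into spaces,
    # and strip the leading space, as typical decode() calls do.
    return "".join(piece_seq).replace("▁", " ").lstrip(" ")

# Buggy pattern: decode each new token on its own -> the leading
# "▁" is stripped every time, so all spaces disappear.
buggy = "".join(decode([p]) for p in pieces)
print(buggy)  # everyoneisamember

# Fixed pattern: decode the full sequence so far and emit only the
# suffix that was not yet streamed; interior spaces survive.
prev, fixed, seq = "", "", []
for p in pieces:
    seq.append(p)
    full = decode(seq)
    fixed += full[len(prev):]
    prev = full
print(fixed)  # everyone is a member
```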

Making sure that streaming with multiple prompts at once also works:

from deepsparse import Pipeline
prompts=["Hi, my name is Slim Shady", "Napoleon was"]
model_path = "zoo:llama2-7b-gsm8k_llama2_pretrain-pruned80_quantized"
pipeline = Pipeline.create(task="text-generation", model_path=model_path, sequence_length=64)

generations_first_prompt_only = list(pipeline(prompt=prompts[0], streaming=True))
generations_second_prompt_only = list(pipeline(prompt=prompts[1], streaming=True))
text_generated_first_prompt_only = "".join([g.generations[0].text for g in generations_first_prompt_only])
text_generated_second_prompt_only = "".join([g.generations[0].text for g in generations_second_prompt_only])
print(f"Text one: {text_generated_first_prompt_only}")
print(f"Text two: {text_generated_second_prompt_only}")

bag_of_words_first_prompt_only = [g.generations[0].text for g in generations_first_prompt_only]
bag_of_words_second_prompt_only = [g.generations[0].text for g in generations_second_prompt_only]

generations = pipeline(prompt=prompts, streaming=True)
bag_of_words = []
for r in generations:
    for gen in r.generations:
        text = gen.text
        bag_of_words.append(text)

assert sorted(bag_of_words_first_prompt_only+bag_of_words_second_prompt_only) == sorted(bag_of_words)