neuralmagic / deepsparse

Sparsity-aware deep learning inference runtime for CPUs
https://neuralmagic.com/deepsparse/

[TextGeneration] Fix llama tokenizer #1635

Closed: dsikka closed this pull request 3 months ago

dsikka commented 3 months ago

Tested code:


import deepsparse

# Sparse, quantized Llama-2 7B chat model hosted on the Hugging Face Hub
MODEL_ID = "hf:nm-testing/llama2-7B-sparse70-retrained-ultrachat200k-pruned70-smoothquant-ds"
# MODEL_ID = "zoo:mistral-7b-ultrachat200k_mistral_pretrain-pruned40_quantized"

pipe = deepsparse.Pipeline.create(
    task="text-generation",
    model_path=MODEL_ID,
    sequence_length=512,
    prompt_sequence_length=16,
)

message = "Once upon a time"

# Build a single-turn conversation and render it with the model's chat template
conversation = [{"role": "user", "content": message}]
formatted_conversation = pipe.tokenizer.apply_chat_template(
    conversation, tokenize=False, add_generation_prompt=True
)

generation_config = {
    "max_new_tokens": 100,
}

# Stream tokens back as they are generated
inference = pipe(
    sequences=formatted_conversation,
    generation_config=generation_config,
    streaming=True,
)

for token in inference:
    print(token.generations[0].text, end="")

Output:


There was a time when the world was a different place. A time when people were more accepting of each other and didn't judge based on race, religion, or gender. A time when kindness and compassion were the norm, and hate and prejudice were unheard of.

But then something changed. The world became more divided, and people started to see each other through a
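
For context on what `apply_chat_template` produces before the prompt reaches the pipeline: the actual template is defined in the model's tokenizer config, so the sketch below is only an illustration using the standard Llama-2 `[INST]` markers. The function name and formatting here are hypothetical, not part of the DeepSparse API.

```python
# Illustrative only: a minimal approximation of a Llama-2-style chat
# template, mimicking what tokenizer.apply_chat_template(..., tokenize=False)
# returns for a list of {"role", "content"} messages. The real template
# is read from the model's tokenizer config and may differ.

def apply_llama2_chat_template(conversation):
    """Render a conversation as a Llama-2-style prompt string."""
    parts = []
    for turn in conversation:
        if turn["role"] == "user":
            # User turns are wrapped in instruction markers
            parts.append(f"<s>[INST] {turn['content']} [/INST]")
        elif turn["role"] == "assistant":
            # Assistant turns are appended and closed with an EOS token
            parts.append(f" {turn['content']} </s>")
    return "".join(parts)

print(apply_llama2_chat_template([{"role": "user", "content": "Once upon a time"}]))
# <s>[INST] Once upon a time [/INST]
```

Because `add_generation_prompt=True` is used in the snippet above, the rendered string ends right where the assistant's reply should begin, which is why the model continues the story directly.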