neuralmagic / deepsparse

Sparsity-aware deep learning inference runtime for CPUs
https://neuralmagic.com/deepsparse/

[Text Generation][KVCacheStorage] `ChatPipeline` implementation #1266

Closed dbogunowicz closed 1 year ago

dbogunowicz commented 1 year ago

Implements `ChatPipeline`, which builds on top of `TextGenerationPipeline` and uses `session_ids` together with `StorageKVCache` to persist the kv_cache between calls, so that each new prompt in a session can draw on the cached state of the earlier turns.
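
To illustrate the idea of keying cached state by session, here is a minimal pure-Python sketch. It is not the code in this PR: `SessionCacheStore` and `chat_turn` are hypothetical names, and the "cache" is just a list of words standing in for real kv_cache tensors. It only shows that calls sharing a session id see state from earlier turns, while a new session id starts empty.

```python
from typing import Dict, List


class SessionCacheStore:
    """Hypothetical stand-in for a session-keyed kv cache store."""

    def __init__(self) -> None:
        # Maps session id -> cached entries (words here, tensors in a real runtime)
        self._caches: Dict[str, List[str]] = {}

    def get(self, session_id: str) -> List[str]:
        # An unseen session id starts with an empty cache
        return self._caches.setdefault(session_id, [])

    def extend(self, session_id: str, entries: List[str]) -> None:
        # Append the entries produced by the current turn to that session's cache
        self.get(session_id).extend(entries)


def chat_turn(store: SessionCacheStore, session_id: str, prompt: str) -> int:
    """Toy turn: returns how many cached entries were available before this prompt."""
    n_past = len(store.get(session_id))
    store.extend(session_id, prompt.split())
    return n_past


store = SessionCacheStore()
print(chat_turn(store, "a", "Hi my name is Damian"))  # 0: nothing cached yet
print(chat_turn(store, "a", "Did I mention"))         # 5: first prompt is cached
print(chat_turn(store, "b", "hello"))                 # 0: separate session
```

In this sketch the store never evicts anything; a real store would presumably need some bound on cache size per session, which is left out here.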

Example use:

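from deepsparse import Pipeline

# create the chat pipeline from a local model deployment directory
pipeline = Pipeline.create(
    task="chat",
    model_path="/home/ubuntu/damian/sparseml/deployment_opt",
)
# reusing the same session id on every call lets the pipeline recall earlier turns
session_id = "session_id"
while True:
    # get input from user
    input_text = input("User: ")
    # generate up to 32 new tokens, continuing from this session's kv_cache
    response = pipeline(sequences=[input_text], session_ids=session_id, max_tokens=32)
    print("Bot: ", response.generations[0].text)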

Example session:

User: Hi my name is Damian and I am from Italy.
Bot:   I am a professional photographer and I have been working in the industry for over 10 years. I have been working in the industry for the last 5 years and I
User: am from   
Bot:   Italy. I have been working in the industry for the last 5 years and I am from  Italy. I have been working in the industry for the last 5
User: Did I mention that my name is
Bot:   Damian?

I am a professional photographer and I have been working in the industry for the last 5 years and 

Testing:

bfineran commented 1 year ago

error looks unrelated right now - merging