Implement streaming responses

simonw commented 2 months ago

Split from:

1

simonw commented 2 months ago

The Reka Python library doesn't support this... but from sniffing traffic on https://chat.reka.ai/ I reverse-engineered a method that works:

export REKA=$(cat "$(llm keys path)" | jq '."reka"' -r)

Then:

curl 'https://api.reka.ai/chat' -X POST -H 'Content-Type: application/json' -H "X-Api-Key: $REKA" --data-raw '{
    "conversation_history": [
        {
            "type": "human",
            "text": "Say hello in spanish in three different ways"
        }
    ],
    "use_search_engine": false,
    "use_code_interpreter": false,
    "model_name": "reka-core",
    "stream": true,
    "random_seed": 1713403859366
}'

Which outputs an events stream like this:

event: message
data: {"type":"model","text":" Here","metadata":{"input_tokens":15,"generated_tokens":1}}

event: message
data: {"type":"model","text":" Here are","metadata":{"input_tokens":15,"generated_tokens":2}}

...

event: message
data: {"type":"model","text":" Here are three different ways to say \"hello\" in Spanish:\n\n1. Hola: This is the most common and simple way to say hello in Spanish. It's used in both formal and informal situations.\n2. Buenos días: This means \"good day\" and is used to greet someone in the morning or early afternoon, until around 2 or 3 pm.\n3. Buenas tardes: This means \"good afternoon\" and is used to greet someone from around 2 or 3 pm until sunset. After sunset, you would use \"buenas noches\" (good evening/night).\n\n","finish_reason":"stop","metadata":{"input_tokens":15,"generated_tokens":127}}

Weird how it outputs the full content on every single line.

That last line finished like this:

"finish_reason":"stop","metadata":{"input_tokens":15,"generated_tokens":127}

simonw commented 2 months ago

I can probably use https://pypi.org/project/httpx-sse/ to handle this.

simonw commented 2 months ago

Almost got this working...

import llm
import httpx
from httpx_sse import connect_sse
import json

MODELS = ("reka-core", "reka-edge", "reka-flash")

@llm.hookimpl
def register_models(register):
    for model_id in MODELS:
        register(
            Reka(model_id),
        )

class Reka(llm.Model):
    needs_key = "reka"
    can_stream = True

    def __init__(self, model_id):
        self.model_id = model_id

    def execute(self, prompt, stream, response, conversation):
        with httpx.Client() as client:
            with connect_sse(
                client,
                "POST",
                "https://api.reka.ai/chat",
                headers={
                    "x-api-key": llm.get_key("", "reka", "LLM_REKA_KEY"),
                },
                json={
                    "conversation_history": [
                        {
                            "type": "human",
                            "text": prompt.prompt,
                        }
                    ],
                    "use_search_engine": False,
                    "use_code_interpreter": False,
                    "model_name": self.model_id,
                    "stream": True,
                    # "random_seed": 1713403859366
                },
            ) as event_source:
                last_text = ""
                accumulated = []  # Every parsed event, kept for debugging
                for sse in event_source.iter_sse():
                    info = sse.json()
                    accumulated.append(info)
                    # Each event carries the full text so far, not just the new token
                    text = info["text"]
                    if text != last_text:
                        # Figure out what's new and yield that
                        new_text = text[len(last_text) :]
                        yield new_text
                        last_text = text
                # Debug output: dump the raw events after the response
                print()
                print(json.dumps(accumulated, indent=2))

simonw commented 2 months ago

Hit a show-stopper bug:

llm -m reka-flash 'Where is France?'

France is located in Western Europe. It is bordered by Belgium and Luxembourg to the north, Germany and Switzerland to the east, Italy and Monaco to the southeast, Spain and Andorra to the southwest, and the English Channel and the Atlantic Ocean to the north and west. The country covers an area of approximately 643,801 square kilometers (248,573 square miles) and is governed as a unitary semi-presidential republic. Its capital is Paris, and it is one of the most visited countries in the world for its culture, history, landmarks, and cuisine.

<sep

What's with that weird <sep thing at the end?

Turns out the SSE messages sent ended with these:

"text": " France is located in Western Europe. It is bordered by Belgium and Luxembourg to the north, Germany and Switzerland to the east, Italy and Monaco to the southeast, Spain and Andorra to the southwest, and the English Channel and the Atlantic Ocean to the north and west. The country covers an area of approximately 643,801 square kilometers (248,573 square miles) and is governed as a unitary semi-presidential republic. Its capital is Paris, and it is one of the most visited countries in the world for its culture, history, landmarks, and cuisine.\n\n <"

"text": " France is located in Western Europe. It is bordered by Belgium and Luxembourg to the north, Germany and Switzerland to the east, Italy and Monaco to the southeast, Spain and Andorra to the southwest, and the English Channel and the Atlantic Ocean to the north and west. The country covers an area of approximately 643,801 square kilometers (248,573 square miles) and is governed as a unitary semi-presidential republic. Its capital is Paris, and it is one of the most visited countries in the world for its culture, history, landmarks, and cuisine.\n\n <sep"

"text": " France is located in Western Europe. It is bordered by Belgium and Luxembourg to the north, Germany and Switzerland to the east, Italy and Monaco to the southeast, Spain and Andorra to the southwest, and the English Channel and the Atlantic Ocean to the north and west. The country covers an area of approximately 643,801 square kilometers (248,573 square miles) and is governed as a unitary semi-presidential republic. Its capital is Paris, and it is one of the most visited countries in the world for its culture, history, landmarks, and cuisine.\n\n"

Note how the <sep> token arrives incomplete, first as < and then as <sep, and is then dropped entirely from the subsequent message.
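To make the failure concrete, here's a rough sketch that replays cumulative texts like the ones above and flags any message that is not an extension of the previous one. The helper name `find_retractions` is made up for illustration and the snapshot strings are shortened stand-ins for the real messages:

```python
def find_retractions(snapshots):
    """Return (index, retracted_text) for every snapshot that takes text back."""
    retractions = []
    last = ""
    for i, text in enumerate(snapshots):
        if not text.startswith(last):
            # Count how many leading characters the two snapshots share
            common = 0
            for a, b in zip(last, text):
                if a != b:
                    break
                common += 1
            # Everything in `last` beyond the shared prefix was un-sent
            retractions.append((i, last[common:]))
        last = text
    return retractions


# Shortened stand-ins for the three final messages
snapshots = [
    " ...and cuisine.\n\n <",
    " ...and cuisine.\n\n <sep",
    " ...and cuisine.\n\n",
]
print(find_retractions(snapshots))
```

This only detects the problem after the fact. By the time the third message arrives, `text[len(last_text):]` has already yielded the partial token.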

simonw commented 2 months ago

Since I don't have a mechanism by which LLM can un-send tokens it has already sent, I'm not sure of a way to work around this.

simonw commented 2 months ago

One option here would be to delay yielding tokens for a few steps. That way I could try to spot when something like this happens and avoid emitting a token that I later regret.
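A minimal sketch of that delay idea, not something the plugin does today. The function name `delayed_deltas` and the default `delay=2` are assumptions. It only emits text that is shared by the last `delay + 1` cumulative messages, so a partial token that vanishes within `delay` steps never gets sent:

```python
from collections import deque
from os.path import commonprefix  # character-wise common prefix of strings


def delayed_deltas(snapshots, delay=2):
    """Yield new text from cumulative snapshots, lagging `delay` events behind."""
    window = deque(maxlen=delay + 1)  # oldest snapshot falls off automatically
    emitted = ""
    last = ""
    for text in snapshots:
        last = text
        window.append(text)
        if len(window) <= delay:
            continue  # not enough history yet to trust anything
        # Text present in every snapshot in the window has survived `delay` steps
        stable = commonprefix(list(window))
        if stable.startswith(emitted) and len(stable) > len(emitted):
            yield stable[len(emitted):]
            emitted = stable
    # Stream finished: flush whatever the final snapshot still adds
    if last.startswith(emitted) and len(last) > len(emitted):
        yield last[len(emitted):]


snapshots = [
    "Fr",
    "France",
    "France is",
    "France is <",
    "France is <sep",
    "France is",
]
print("".join(delayed_deltas(snapshots)))
```

The cost is latency: output trails the stream by `delay` events, and a retraction that takes longer than that to show up would still slip through. The right value for `delay` is a guess until more of these cases have been observed.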

simonw / llm-reka

Implement streaming responses #2
