Open simonw opened 2 months ago
The Reka Python library doesn't support this... but from sniffing traffic on https://chat.reka.ai/ I reverse-engineered a method that works:
export REKA=$(cat "$(llm keys path)" | jq '."reka"' -r)
Then:
curl 'https://api.reka.ai/chat' -X POST -H 'Content-Type: application/json' -H "X-Api-Key: $REKA" --data-raw '{
"conversation_history": [
{
"type": "human",
"text": "Say hello in spanish in three different ways"
}
],
"use_search_engine": false,
"use_code_interpreter": false,
"model_name": "reka-core",
"stream": true,
"random_seed": 1713403859366
}'
Which outputs an event stream like this:
event: message
data: {"type":"model","text":" Here","metadata":{"input_tokens":15,"generated_tokens":1}}
event: message
data: {"type":"model","text":" Here are","metadata":{"input_tokens":15,"generated_tokens":2}}
...
event: message
data: {"type":"model","text":" Here are three different ways to say \"hello\" in Spanish:\n\n1. Hola: This is the most common and simple way to say hello in Spanish. It's used in both formal and informal situations.\n2. Buenos días: This means \"good day\" and is used to greet someone in the morning or early afternoon, until around 2 or 3 pm.\n3. Buenas tardes: This means \"good afternoon\" and is used to greet someone from around 2 or 3 pm until sunset. After sunset, you would use \"buenas noches\" (good evening/night).\n\n","finish_reason":"stop","metadata":{"input_tokens":15,"generated_tokens":127}}
Weird how it outputs the full accumulated content on every single message, rather than just the new tokens.
That last line finished like this:
"finish_reason":"stop","metadata":{"input_tokens":15,"generated_tokens":127}
I can probably use https://pypi.org/project/httpx-sse/ to handle this.
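Since each message repeats the full text so far, the streaming deltas have to be derived by diffing each snapshot against the previous one - a minimal sketch of that technique (standalone, not the plugin code):

```python
def deltas(snapshots):
    # Each snapshot is the full cumulative text; yield only what's new
    last = ""
    for text in snapshots:
        if text != last:
            yield text[len(last):]
            last = text

# Mirrors the "Here" / "Here are" messages in the stream above:
list(deltas([" Here", " Here are", " Here are three"]))
# -> [" Here", " are", " three"]
```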
Almost got this working...
import json

import httpx
import llm
from httpx_sse import connect_sse

MODELS = ("reka-core", "reka-edge", "reka-flash")


@llm.hookimpl
def register_models(register):
    for model_id in MODELS:
        register(Reka(model_id))


class Reka(llm.Model):
    needs_key = "reka"
    can_stream = True

    def __init__(self, model_id):
        self.model_id = model_id

    def execute(self, prompt, stream, response, conversation):
        with httpx.Client() as client:
            with connect_sse(
                client,
                "POST",
                "https://api.reka.ai/chat",
                headers={
                    "x-api-key": llm.get_key("", "reka", "LLM_REKA_KEY"),
                },
                json={
                    "conversation_history": [
                        {
                            "type": "human",
                            "text": prompt.prompt,
                        }
                    ],
                    "use_search_engine": False,
                    "use_code_interpreter": False,
                    "model_name": self.model_id,
                    "stream": True,
                    # "random_seed": 1713403859366
                },
            ) as event_source:
                last_text = ""
                accumulated = []
                for sse in event_source.iter_sse():
                    info = sse.json()
                    accumulated.append(info)
                    text = info["text"]
                    if text != last_text:
                        # Figure out what's new and yield that
                        yield text[len(last_text):]
                        last_text = text
                print()
                print(json.dumps(accumulated, indent=2))
Hit a show-stopper bug:
llm -m reka-flash 'Where is France?'
France is located in Western Europe. It is bordered by Belgium and Luxembourg to the north, Germany and Switzerland to the east, Italy and Monaco to the southeast, Spain and Andorra to the southwest, and the English Channel and the Atlantic Ocean to the north and west. The country covers an area of approximately 643,801 square kilometers (248,573 square miles) and is governed as a unitary semi-presidential republic. Its capital is Paris, and it is one of the most visited countries in the world for its culture, history, landmarks, and cuisine.
<sep
What's with that weird <sep thing at the end?
Turns out the last few SSE messages ended with these:
"text": " France is located in Western Europe. It is bordered by Belgium and Luxembourg to the north, Germany and Switzerland to the east, Italy and Monaco to the southeast, Spain and Andorra to the southwest, and the English Channel and the Atlantic Ocean to the north and west. The country covers an area of approximately 643,801 square kilometers (248,573 square miles) and is governed as a unitary semi-presidential republic. Its capital is Paris, and it is one of the most visited countries in the world for its culture, history, landmarks, and cuisine.\n\n <"
"text": " France is located in Western Europe. It is bordered by Belgium and Luxembourg to the north, Germany and Switzerland to the east, Italy and Monaco to the southeast, Spain and Andorra to the southwest, and the English Channel and the Atlantic Ocean to the north and west. The country covers an area of approximately 643,801 square kilometers (248,573 square miles) and is governed as a unitary semi-presidential republic. Its capital is Paris, and it is one of the most visited countries in the world for its culture, history, landmarks, and cuisine.\n\n <sep"
"text": " France is located in Western Europe. It is bordered by Belgium and Luxembourg to the north, Germany and Switzerland to the east, Italy and Monaco to the southeast, Spain and Andorra to the southwest, and the English Channel and the Atlantic Ocean to the north and west. The country covers an area of approximately 643,801 square kilometers (248,573 square miles) and is governed as a unitary semi-presidential republic. Its capital is Paris, and it is one of the most visited countries in the world for its culture, history, landmarks, and cuisine.\n\n"
Note how that <sep> token is returned incomplete, then dropped from the subsequent message.
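One possible mitigation (a sketch, not wired into the plugin - the STOP value is assumed from the <sep> seen above): strip any trailing partial stop token from each cumulative snapshot before diffing, so a half-streamed <sep never gets emitted in the first place.

```python
# Sketch: trim any trailing prefix of the "<sep>" stop token from a
# cumulative snapshot before computing the delta against the last one.
STOP = "<sep>"  # assumed from the truncated output above

def trim_partial_stop(text):
    # Check the longest prefix of STOP first, down to a lone "<"
    for n in range(len(STOP), 0, -1):
        if text.endswith(STOP[:n]):
            return text[:-n]
    return text

trim_partial_stop("...cuisine.\n\n <sep")  # trailing "<sep" removed
trim_partial_stop("...cuisine.\n\n <")     # trailing "<" removed
```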
Since I don't have a mechanism by which LLM can un-send tokens it has already sent, I'm not sure of a way to work around this.
One option here would be to delay yielding tokens for a few steps, so I could spot when something like this happens and avoid emitting a token I later regret.
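That delayed-yield idea could look something like this (a sketch only, with a guessed hold-back margin - not tested against the real API): hold back the last few characters of the cumulative text, so if the server later shortens it (as with the retracted "<sep"), the retracted part was never emitted.

```python
HOLD_BACK = 8  # guessed margin; must exceed the longest partial stop token

def safe_deltas(snapshots):
    # snapshots are cumulative "text" values from successive SSE messages
    emitted = 0  # characters already yielded
    final = ""
    for text in snapshots:
        final = text
        # Never emit the last HOLD_BACK chars until the stream ends
        safe_upto = max(emitted, len(text) - HOLD_BACK)
        if safe_upto > emitted:
            yield text[emitted:safe_upto]
            emitted = safe_upto
    # Flush the held-back tail of the final snapshot, which by then
    # has had the bogus "<sep" dropped by the server
    if len(final) > emitted:
        yield final[emitted:]

# The "<sep" appears and then vanishes, as in the bug above:
"".join(safe_deltas(["Hello", "Hello wor <", "Hello wor <sep", "Hello wor"]))
# -> "Hello wor"
```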
Split from:
1