posit-dev / py-shiny

Shiny for Python
https://shiny.posit.co/py/
MIT License
Streaming chat messages can result in large data transfer #1633

Open wch opened 3 months ago

wch commented 3 months ago

Streaming a chat message can transfer far more data than the message itself contains. For example, I found that streaming in a response that was 5,800 bytes transferred 3.5 MB from the Python process to the browser.

(Note that this was counting only the length of the content, and did not include the additional length of the JSON custom message wrapper that is sent to the browser. The wrapper is 187 bytes for each message, and with 1200 individual messages, this results in another 0.2MB.)
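
The figures above are consistent with every streamed message carrying the full accumulated content rather than only the new chunk. Here is a rough back-of-the-envelope check, not a measurement. It assumes (my assumption, not something measured) that the 1200 messages grow in equal steps up to the final 5800 bytes; `cumulative_bytes` is a hypothetical helper written just for this estimate.

```python
def cumulative_bytes(final_length: int, n_messages: int) -> int:
    """Total content bytes sent if message k carries the first k/n of the response.

    Assumes equal-sized chunks, which real token streams only approximate.
    """
    return sum(
        round(final_length * k / n_messages) for k in range(1, n_messages + 1)
    )


FINAL_LENGTH = 5800   # bytes in the finished response (from the report above)
N_MESSAGES = 1200     # number of streamed messages (from the report above)
WRAPPER_BYTES = 187   # JSON custom message wrapper per message (from the report above)

content_total = cumulative_bytes(FINAL_LENGTH, N_MESSAGES)
wrapper_total = WRAPPER_BYTES * N_MESSAGES

print(f"content: {content_total / 1e6:.2f} MB")   # roughly 3.5 MB
print(f"wrappers: {wrapper_total / 1e6:.2f} MB")  # roughly 0.2 MB
```

Under that assumption the total grows with the product of the message count and the final length (about n * L / 2), which is why a response of a few kilobytes turns into megabytes on the wire.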

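This app illustrates the problem. For each chunk, it prints out a line with the message count, the length of the content for that message, and the running total of content length: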

import ollama

from shiny.express import ui

chat = ui.Chat(id="chat")
chat.ui()

total_count = 0
total_length = 0

@chat.on_user_submit
async def _():
    # Reset the module-level counters for each new response
    global total_count, total_length
    total_count = 0
    total_length = 0

    messages = chat.messages(format="ollama")
    # Assumes you've run `ollama run llama3.1` to start the server
    response = ollama.chat(
        model="llama3.1",
        messages=messages,
        stream=True,
    )
    await chat.append_message_stream(response)

@chat.transform_assistant_response()
async def transform_response(content: str, chunk: str, done: bool) -> str:
    global total_count, total_length
    total_count += 1
    total_length += len(content)
    print(f"{total_count} : {len(content)} : {total_length}")
    return content

To reduce the amount of data sent, we could do the following:

cpsievert commented 3 months ago

Note to self: Python comes with a difflib module, and ndiff() might be what we need for this https://docs.python.org/3/library/difflib.html#difflib.ndiff
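
A minimal sketch of what a difflib-based delta could look like, in case it helps the discussion. `difflib.ndiff()` and `difflib.restore()` work on sequences of lines, and an ndiff delta also repeats the unchanged lines, so on its own it would not shrink the payload. The second half therefore uses `difflib.SequenceMatcher` from the same module to keep only the changed spans. The helper names (`ndiff_roundtrip`, `make_patch`, `apply_patch`) are hypothetical and not part of Shiny.

```python
import difflib


def ndiff_roundtrip(old: str, new: str) -> str:
    """Build an ndiff delta between two texts and rebuild the new text from it."""
    delta = list(
        difflib.ndiff(old.splitlines(keepends=True), new.splitlines(keepends=True))
    )
    # restore(delta, 2) yields the lines of the second sequence
    return "".join(difflib.restore(delta, 2))


def make_patch(old: str, new: str) -> list[tuple[int, int, str]]:
    """Return (start, end, replacement) edits that turn `old` into `new`."""
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    return [
        (i1, i2, new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # unchanged spans do not need to be sent
    ]


def apply_patch(old: str, patch: list[tuple[int, int, str]]) -> str:
    """Apply edits from right to left so earlier offsets stay valid."""
    result = old
    for start, end, replacement in reversed(patch):
        result = result[:start] + replacement + result[end:]
    return result


previous = "Hello wor"
current = "Hello world"
patch = make_patch(previous, current)
print(patch)
print(apply_patch(previous, patch) == current)
```

When a transform only appends text, the patch reduces to a single insert at the end, so sending just the new chunk would be equivalent and cheaper to compute. Diffing on every chunk also has its own CPU cost, which would need measuring.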