Transferring chat messages can move a large amount of data while a message is streaming. For example, when streaming in a response that was 5800 bytes long, I found that 3.5 MB was transferred from the Python process to the browser.

(Note that this counts only the length of the `content`, and does not include the JSON custom-message wrapper that is sent to the browser with each message. The wrapper is 187 bytes per message; with 1200 individual messages, that adds another 0.2 MB.)
This app illustrates the problem. For each chunk, it prints out a line with:

- the total number of chunks received so far
- the size of the `content`
- the cumulative size of all `content` blocks that have been sent
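The blow-up is quadratic: because each streamed message carries the entire content accumulated so far, the cumulative bytes sent grow with the square of the number of chunks. A minimal sketch of the accounting above (the function name and chunk sizes here are illustrative, not taken from the app):

```python
def log_stream(chunks):
    """Accumulate streamed chunks the way the chat UI does, with the
    whole content re-sent on every chunk, and print per-chunk stats."""
    content = ""
    cumulative_sent = 0
    for i, chunk in enumerate(chunks, start=1):
        content += chunk
        cumulative_sent += len(content)  # the full content goes over the wire each time
        print(f"chunks: {i}  content: {len(content)} B  cumulative: {cumulative_sent} B")
    return cumulative_sent

# 1160 chunks of ~5 bytes add up to a 5800-byte response, but the
# cumulative transfer is 5 * (1160 * 1161) / 2 = 3,366,900 bytes --
# the same few-megabyte range as the 3.5 MB measured above.
total = log_stream(["x" * 5] * 1160)
```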
To reduce the amount of data sent, we could do the following:

- Use a text diffing algorithm so that we don't send the entire content over and over as it grows.
- Throttle the responses. I think a delay of roughly 0.05–0.1 s still results in a responsive-feeling app, and if throttling sends two words at a time instead of one, that would cut traffic in half.
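Both ideas can be combined: buffer incoming chunks and flush only the new text, no more often than a minimum interval. The sketch below assumes a `stream_deltas` helper (a name invented here, not the app's API) where sending the returned delta strings replaces re-sending the full content:

```python
import time

def stream_deltas(chunks, min_interval=0.05):
    """Collect streamed chunks and return the list of delta strings that
    would actually be sent: only new text, at most one send per interval."""
    sends = []
    pending = ""                  # text accumulated since the last send
    last_send = float("-inf")     # so the first chunk is sent immediately
    for chunk in chunks:
        pending += chunk
        now = time.monotonic()
        if now - last_send >= min_interval:
            sends.append(pending)  # send only the delta, not the whole content
            pending = ""
            last_send = now
    if pending:
        sends.append(pending)      # flush whatever remains at end of stream
    return sends
```

With `min_interval=0` every chunk is sent individually; with a larger interval, many chunks collapse into one message, and in every case concatenating the deltas reproduces the full content exactly once instead of quadratically.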