jooby-project / jooby

The modular web framework for Java and Kotlin
https://jooby.io
Apache License 2.0

SSE: Spec Interpretation Issue (preserve leading spaces) #3479

Closed sashirestela closed 1 month ago

sashirestela commented 1 month ago

Hi @jknack. First of all, thank you very much for your fantastic web framework!

I've started using this framework as the backend for an AI project based on RAG and an LLM, and I'm using its SSE functionality. On the frontend, I'm using the sse.js library to handle SSE.

What I see here is a different interpretation of the SSE spec, and it is affecting my project.

In the HTML spec, under "Server-sent events", section "Interpreting an event stream", you can read:

[image: excerpt from the spec]

So sse.js and similar libraries expect a space after the colon and discard it. Even LLM providers such as OpenAI produce SSE with that extra space after the colon.

However, Jooby writes SSE fields without that extra space. So if, for example, a chunk of streaming data from the LLM starts with a space, sse.js removes that space, following its reading of the SSE spec, and we get:

Jooby        sse.js
'one'        'one'
' juicy '    'juicy '
'red'        'red'
' apple.'    'apple.'

I'm concatenating the data chunks to be shown in the UI, so the expected value is:

one juicy red apple.

but, due to the sse.js processing, currently I'm getting:

onejuicy redapple.

Can you see the issue here? Could you make a change to address it, please?

mpetazzoni commented 1 month ago

Hi 👋🏻 https://github.com/mpetazzoni/sse.js author here.

I wanted to clarify that sse.js does not expect a space after the colon; it supports both forms, as required by the SSE specification. When a space is present, it is removed from the value. See https://github.com/mpetazzoni/sse.js/blob/main/lib/sse.js#L218-L222 for the corresponding parsing logic.
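For readers following along, here is a minimal Python sketch of that parsing rule as I understand it from the spec. It is not the sse.js code itself; `parse_field` is a hypothetical helper name, and comment lines (those starting with a colon) are out of scope here.

```python
from typing import Tuple


def parse_field(line: str) -> Tuple[str, str]:
    """Split one SSE line into (field, value) using the spec's colon rule."""
    if ":" not in line:
        # A line without a colon is a field name with an empty value.
        return line, ""
    field, value = line.split(":", 1)
    # Only ONE leading space is stripped; any further spaces are data.
    if value.startswith(" "):
        value = value[1:]
    return field, value


# The chunks from the issue, written with no space after the colon.
chunks = ["one", " juicy ", "red", " apple."]
received = [parse_field("data:" + chunk)[1] for chunk in chunks]
joined = "".join(received)  # the leading spaces of chunks 2 and 4 are lost
```

Both `data:one` and `data: one` parse to the same value, which is why a writer that omits the space has to double any leading space that belongs to the data.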

For the Jooby authors: I believe you need to handle values (or new lines of the value, if it's broken into multiple data: lines) that start with a leading space by writing it out with two spaces. To write out the following text:

This is my event.
 This second line starts with a leading space.
But not this third one.

You must output:

data:This is my event.
data:  This second line starts with a leading space.
data:But not this third one.

Alternatively, this would result in the same value being seen by the client:

data: This is my event.
data:  This second line starts with a leading space.
data: But not this third one.

And that's why it's often easier to just output SSE fields with a space after the colon. Then you don't need extra logic.
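A rough sketch of that "always add a space" approach, in Python for brevity. This is not Jooby's implementation; `format_data` and `read_data` are hypothetical names, and the sketch assumes the text uses `\n` line endings only.

```python
def format_data(text: str) -> str:
    """Serialize text as SSE data lines, always writing one space after the colon."""
    lines = text.split("\n")
    # The trailing blank line ends the event.
    return "".join("data: " + line + "\n" for line in lines) + "\n"


def read_data(event: str) -> str:
    """Recover the data value the way a spec-following client would."""
    values = []
    for line in event.split("\n"):
        if line.startswith("data:"):
            value = line[len("data:"):]
            # Strip the single separator space, and nothing more.
            if value.startswith(" "):
                value = value[1:]
            values.append(value)
    return "\n".join(values)


text = (
    "This is my event.\n"
    " This second line starts with a leading space.\n"
    "But not this third one."
)
wire = format_data(text)
```

Because the writer adds the separator space unconditionally, a line that already starts with a space ends up with two on the wire, and the client's single-space strip gives back the original text without any special-casing.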

✌🏻