Client get messages encoded in ISO_8859_1

dlemaignent commented 4 years ago

Is it possible to make the encoding configurable, for example with "text/event-stream; charset=UTF-8", on createSseEmitter? At the moment I'm doing something like this to get accented characters on the JavaScript client side:

byte[] bytes = jsonMessageString.getBytes(StandardCharsets.UTF_8);
String utf8EncodedString = new String(bytes, StandardCharsets.ISO_8859_1);

Thanks

ralscha commented 4 years ago

According to the SSE specification, event streams are always decoded as UTF-8, and the encoding can't be changed.

9.2.1 Server-sent events: Introduction "Event streams are always decoded as UTF-8. There is no way to specify another character encoding."
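To illustrate what this implies for a server, here is a minimal sketch that serializes a single event to UTF-8 bytes. The class `SseFrame` and its `encode` method are hypothetical names made up for this example and are not part of sse-eventbus; only the `event:` / `data:` field layout and the terminating blank line come from the SSE format.

```java
import java.nio.charset.StandardCharsets;

class SseFrame {

  // Serializes one event. Each line of the payload gets its own "data:" field,
  // and an empty line terminates the event.
  static byte[] encode(String event, String data) {
    StringBuilder sb = new StringBuilder();
    sb.append("event: ").append(event).append('\n');
    for (String line : data.split("\n", -1)) {
      sb.append("data: ").append(line).append('\n');
    }
    sb.append('\n');
    // The client always decodes as UTF-8, so the server has to encode as UTF-8.
    return sb.toString().getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    byte[] frame = encode("news", "{\"text\":\"\u00e9t\u00e9\"}");
    System.out.println(new String(frame, StandardCharsets.UTF_8));
  }
}
```

If any layer between the application and the socket encodes the same text with a different charset, the client will still decode the bytes as UTF-8 and show garbled accents.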

dlemaignent commented 4 years ago

Thank you for your answer. (I made a mistake in my question: utf8EncodedString should be isoEncodedString.) I understand that EventSource (client side) always decodes as UTF-8. But I don't understand why I need to convert the data I send in the event to ISO when I build the event. Here is an example:

applicationEventPublisher.publishEvent(SseEvent.builder().event(channel).data(jsonMessageString).build());

If I write my message object as a JSON string (UTF-8) and send it in the event, the client side doesn't decode French accented characters correctly.

byte[] bytes = jsonMessageString.getBytes(StandardCharsets.UTF_8);
String isoEncodedString = new String(bytes, StandardCharsets.ISO_8859_1);
applicationEventPublisher.publishEvent(SseEvent.builder().event(channel).data(isoEncodedString).build());

Now, if I convert my JSON string to ISO_8859_1 as in this example, the accents work.

Thanks

ralscha commented 4 years ago

No idea what's going on. In Java, all Strings are Unicode (UTF-16 internally) and carry no byte encoding of their own. In your example, jsonMessageString is an ordinary Unicode string, and so is isoEncodedString.

If we look at the bytes, we see that isoEncodedString contains completely wrong content. I assume decoding the bytes as ISO_8859_1 turns each byte into a separate character, and encoding those characters as UTF-8 again gives you double the length.

    // Requires: import java.nio.charset.StandardCharsets;
    byte[] bytes = "èàé".getBytes(StandardCharsets.UTF_8);
    for (int i = 0; i < bytes.length; i++) {
      System.out.print(String.format("%x", bytes[i]));
      System.out.print(" ");
    }
    // Output: c3 a8 c3 a0 c3 a9

    // Decode the UTF-8 bytes as ISO_8859_1: every byte becomes one character.
    String isoEncodedString = new String(bytes, StandardCharsets.ISO_8859_1);
    // Encode explicitly as UTF-8 instead of relying on the platform default charset.
    byte[] isoEncodedBytes = isoEncodedString.getBytes(StandardCharsets.UTF_8);
    for (int i = 0; i < isoEncodedBytes.length; i++) {
      System.out.print(String.format("%x", isoEncodedBytes[i]));
      System.out.print(" ");
    }
    // Output: c3 83 c2 a8 c3 83 c2 a0 c3 83 c2 a9

    System.out.println(isoEncodedString);
    // Output: Ã¨Ã Ã©
    System.out.println(new String(isoEncodedBytes, StandardCharsets.UTF_8));
    // Output: Ã¨Ã Ã©

Can you check what's going over the wire? I assume there is a bug in my library.
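One hypothesis that would explain why the workaround helps, sketched below: if some layer on the server writes the response using ISO_8859_1 instead of UTF-8, the "wrong" string ends up on the wire as correct UTF-8 bytes. This is an assumption, not a confirmed diagnosis; the class `WireEncodingCheck` and its methods are hypothetical names for this illustration only.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class WireEncodingCheck {

  // Hex dump of a byte array, two digits per byte, separated by spaces.
  static String hex(byte[] bytes) {
    StringBuilder sb = new StringBuilder();
    for (byte b : bytes) {
      if (sb.length() > 0) {
        sb.append(' ');
      }
      sb.append(String.format("%02x", b & 0xff));
    }
    return sb.toString();
  }

  // The workaround from this thread: UTF-8 bytes reinterpreted as ISO_8859_1 characters.
  static String workaround(String message) {
    return new String(message.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
  }

  public static void main(String[] args) {
    String message = "\u00e8\u00e0\u00e9"; // "èàé"
    byte[] expected = message.getBytes(StandardCharsets.UTF_8);

    // Assumption: the response writer encodes with ISO_8859_1.
    byte[] plainOnWire = message.getBytes(StandardCharsets.ISO_8859_1);
    byte[] workaroundOnWire = workaround(message).getBytes(StandardCharsets.ISO_8859_1);

    System.out.println("expected UTF-8:       " + hex(expected));         // c3 a8 c3 a0 c3 a9
    System.out.println("plain string on wire: " + hex(plainOnWire));      // e8 e0 e9
    System.out.println("workaround on wire:   " + hex(workaroundOnWire)); // c3 a8 c3 a0 c3 a9
    System.out.println("workaround matches:   " + Arrays.equals(expected, workaroundOnWire)); // true
  }
}
```

Under that assumption, the plain string reaches the client as single bytes (e8 e0 e9) that are not valid UTF-8, which would match the broken accents. Capturing the actual response bytes, for example with the browser's network tools or a packet capture, would confirm or rule this out.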

ralscha / sse-eventbus

Client get messages encoded in ISO_8859_1 #16