Closed: saschanaz closed this issue 2 years ago
This is intentional and documented.
The thinking behind this is that the decoder receives data from the wild, so stuff getting split across I/O buffers is normal and errors in the data are not programming errors in the application using the library.
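A minimal sketch of what that looks like with the streaming decoder API (assuming encoding_rs 0.8; `Decoder::decode_to_string`, the Shift_JIS pair for "あ", and the buffer sizing are just illustrative choices):

```rust
use encoding_rs::SHIFT_JIS;

fn main() {
    let mut decoder = SHIFT_JIS.new_decoder();
    let mut out = String::with_capacity(decoder.max_utf8_buffer_length(2).unwrap());

    // "あ" is the byte pair 0x82 0xA0 in Shift_JIS; the lead byte arrives in
    // one I/O buffer and the trail byte in the next, as happens with data
    // from the wild.
    let (_result, read, had_errors) = decoder.decode_to_string(&[0x82], &mut out, false);
    assert_eq!(read, 1);
    assert!(!had_errors); // the incomplete sequence is buffered, not treated as an error

    let (_result, read, had_errors) = decoder.decode_to_string(&[0xA0], &mut out, true);
    assert_eq!(read, 1);
    assert!(!had_errors);
    assert_eq!(out, "あ"); // the decoder's internal state reassembles the split sequence
}
```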
In contrast, the encoder receives application-internal Unicode representations. In this case, the caller is expected to keep each of its internal buffers valid on a per-buffer basis. This is conceptually similar to the case of receiving application-internal UTF-8 and encoding it into output. However, in that case, UTF-8 validity is enforced on the type system level.
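To illustrate the per-buffer expectation (again a sketch, not normative; the target encoding and buffer size are arbitrary): `Encoder::encode_from_utf8` takes `&str`, so UTF-8 validity is enforced by the type system, whereas `Encoder::encode_from_utf16` takes `&[u16]` and relies on the caller to split its buffers only at scalar value boundaries.

```rust
use encoding_rs::SHIFT_JIS;

fn main() {
    let mut out = [0u8; 64]; // arbitrary size for this sketch

    // UTF-8 input: validity is guaranteed by the &str type itself.
    let mut utf8_encoder = SHIFT_JIS.new_encoder();
    let (_result, _read, written, unmappables) =
        utf8_encoder.encode_from_utf8("あいう", &mut out, true);
    assert!(!unmappables);
    println!("wrote {written} bytes from &str input");

    // UTF-16 input: the type system cannot enforce well-formedness, so the
    // caller is expected to pass buffers that are valid on their own, i.e.
    // never to split a surrogate pair across two calls.
    let chunk: Vec<u16> = "あいう".encode_utf16().collect();
    let mut utf16_encoder = SHIFT_JIS.new_encoder();
    let (_result, _read, written, unmappables) =
        utf16_encoder.encode_from_utf16(&chunk, &mut out, true);
    assert!(!unmappables);
    println!("wrote {written} bytes from &[u16] input");
}
```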
This design thinking doesn't quite fit the case where the encoder receives data from a JavaScript engine where the JavaScript program comes from the wild, and the encoder input is a sequence of DOMStrings as opposed to a sequence of USVStrings.
Sorry about this design decision not being a great fit for your use case. However, I'm reluctant to change the encoding_rs-level design here.
Per this sample code, it seems that only the decoder stores the surrogate while the encoder does not. This is counterintuitive to me. Is this intentional, is there a way to get the equivalent behavior on the encoder side, or is it a bug?
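For concreteness, a minimal sketch of the asymmetry being described (this is not the linked sample code; it assumes encoding_rs 0.8 and picks UTF-16LE for the decoding side and Shift_JIS for the encoding side):

```rust
use encoding_rs::{SHIFT_JIS, UTF_16LE};

fn main() {
    // Decoder side: U+20BB7 is the surrogate pair 0xD842 0xDFB7, i.e. the
    // UTF-16LE bytes 42 D8 B7 DF. Split the pair across two byte buffers.
    let mut decoder = UTF_16LE.new_decoder_without_bom_handling();
    let mut decoded = String::with_capacity(decoder.max_utf8_buffer_length(4).unwrap());
    let _ = decoder.decode_to_string(&[0x42, 0xD8], &mut decoded, false);
    let _ = decoder.decode_to_string(&[0xB7, 0xDF], &mut decoded, true);
    // The pending lead surrogate is stored across the buffer boundary.
    println!("decoder output: {decoded:?}");

    // Encoder side: feed the same pair split across two &[u16] buffers.
    let mut encoder = SHIFT_JIS.new_encoder();
    let mut out = [0u8; 64];
    let (_, read, written, _) = encoder.encode_from_utf16(&[0xD842], &mut out, false);
    println!("first call: read {read} code unit(s), wrote {written} byte(s)");
    let (_, read, written, _) = encoder.encode_from_utf16(&[0xDFB7], &mut out, true);
    println!("second call: read {read} code unit(s), wrote {written} byte(s)");
    // The trailing lead surrogate is not carried over into the second call;
    // each buffer is handled on its own, as described in the reply above.
}
```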