janestreet / sexplib

Automated S-expression conversion
MIT License

UTF-8 Safe Mode #18

Closed: dsheets closed this issue 8 years ago

dsheets commented 8 years ago

Right now, sexplib uses String.escaped for serializing strings. If those strings contain high bytes that are not part of UTF-8 encoded sequences, they will be output as-is. This results in behavior like:

# Format.printf "%a@." Sexp.pp_mach (Sexp.of_string "(String\"\247\")");;
(String �)
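For reference, the byte in that example is decimal 247 (0xF7). RFC 3629 never allows bytes above 0xF4 in well-formed UTF-8, so writing it out raw produces invalid output. Here is a quick check. It assumes OCaml 4.14 or later for the stdlib's `String.is_valid_utf_8`, which is unrelated to sexplib:

```ocaml
(* Requires OCaml >= 4.14, where String.is_valid_utf_8 was added. *)
let () =
  (* "\247" is OCaml's decimal escape for the single byte 0xF7,
     which never occurs in well-formed UTF-8. Prints: false *)
  Printf.printf "%b\n" (String.is_valid_utf_8 "\247");
  (* "é" (U+00E9) correctly encoded as the two bytes 0xC3 0xA9. Prints: true *)
  Printf.printf "%b\n" (String.is_valid_utf_8 "\xc3\xa9")
```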

When sexplib is used for logging and debugging, this can cause problems for consumers that expect valid UTF-8 text. Perhaps the function used to escape strings could be parameterized? It would be really nice to output UTF-8-safe strings efficiently, without first generating the output, then scanning that buffer for non-UTF-8 bytes and copying it into another buffer.
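One possible reading of the parameterization request is sketched below. The quoting function takes the escaping policy as an argument, and the policy appends directly to a `Buffer.t`, so the output is produced in a single pass with no second scan or copy. All names here (`escaper`, `quote_atom`, `utf8_preserving`) are hypothetical and not part of sexplib. The decoding calls assume OCaml 4.14 or later. This particular policy keeps valid multi-byte UTF-8 as-is and escapes only the bytes that would break it:

```ocaml
(* Hypothetical sketch, not sexplib's API: quoting parameterized over the
   escaping policy. Requires OCaml >= 4.14 (String.get_utf_8_uchar). *)

type escaper = Buffer.t -> string -> unit

(* Quote an atom, delegating byte-level escaping to [escape]. *)
let quote_atom ~(escape : escaper) s =
  let buf = Buffer.create (String.length s + 2) in
  Buffer.add_char buf '"';
  escape buf s;
  Buffer.add_char buf '"';
  Buffer.contents buf

(* Policy: copy well-formed multi-byte UTF-8 sequences verbatim; escape
   quotes and backslashes with a backslash; escape control bytes and any
   byte that is not valid UTF-8 as a decimal escape such as \247. *)
let utf8_preserving : escaper = fun buf s ->
  let n = String.length s in
  let i = ref 0 in
  while !i < n do
    let d = String.get_utf_8_uchar s !i in
    let len = Uchar.utf_decode_length d in
    let c = s.[!i] in
    if Uchar.utf_decode_is_valid d && len > 1 then begin
      (* Valid multi-byte sequence: copy it unchanged. *)
      Buffer.add_substring buf s !i len;
      i := !i + len
    end else begin
      (* ASCII byte, or a byte that does not start a valid sequence. *)
      (match c with
       | '"' | '\\' -> Buffer.add_char buf '\\'; Buffer.add_char buf c
       | ' ' .. '~' -> Buffer.add_char buf c
       | _ -> Buffer.add_string buf (Printf.sprintf "\\%03d" (Char.code c)));
      i := !i + 1
    end
  done

let () =
  print_endline (quote_atom ~escape:utf8_preserving "caf\xc3\xa9 \247")
```

With this input it prints `"café \247"`: the accented character survives, while the stray byte becomes a decimal escape, so the result is always valid UTF-8.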

ghost commented 8 years ago

Escaping all non-ASCII characters seems like a good default. I have submitted a change internally; it should be ready for the next release.
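That default is simpler still. Below is a minimal sketch of it, using the hypothetical name `escape_non_ascii`. It is not necessarily how the change in sexplib was written. Every byte outside printable ASCII is escaped, so the output is pure ASCII and therefore always valid UTF-8:

```ocaml
(* Hypothetical sketch, not the change that shipped in sexplib: escape every
   byte outside printable ASCII (32..126), plus '"' and '\\', in one pass.
   The result contains only ASCII bytes, so it is always valid UTF-8. *)
let escape_non_ascii s =
  let buf = Buffer.create (String.length s) in
  String.iter
    (fun c ->
      match c with
      | '"' | '\\' -> Buffer.add_char buf '\\'; Buffer.add_char buf c
      | ' ' .. '~' -> Buffer.add_char buf c
      | _ -> Buffer.add_string buf (Printf.sprintf "\\%03d" (Char.code c)))
    s;
  Buffer.contents buf

let () =
  (* The UTF-8 bytes of "é" (0xC3 0xA9) are escaped too, byte by byte. *)
  print_endline (escape_non_ascii "caf\xc3\xa9 \247")
```

The trade-off is readability. Valid UTF-8 text such as `é` is escaped as well, so this prints `caf\195\169 \247`.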

dsheets commented 8 years ago

Great! Thanks!