Open Pierre-Sassoulas opened 2 weeks ago
We should probably aim to keep the `\ud800\udc00` format instead of transforming to actual UTF-32 characters. `encode('utf8', 'replace')` seems to work? Although I really don't like that we need to add special logic for a really uncommon case.
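For reference, a minimal sketch of how Python's standard codec error handlers treat a string made of two lone surrogates. This is not the patch in this PR; the variable names and the `try_strict` helper are made up for illustration. It only shows why the strict encode fails and what each handler produces.

```python
# A str holding two lone surrogate code points (not the single character U+10000).
text = "\ud800\udc00"


def try_strict(s: str) -> bool:
    """Return True if strict UTF-8 encoding succeeds (hypothetical helper)."""
    try:
        s.encode("utf8")
        return True
    except UnicodeEncodeError:
        return False


# Strict encoding rejects lone surrogates.
strict_ok = try_strict(text)

# 'replace' substitutes '?' for each unencodable code point, losing the original values.
replaced = text.encode("utf8", "replace")

# 'backslashreplace' keeps the escaped \udXXX spelling as ASCII bytes.
escaped = text.encode("utf8", "backslashreplace")

# Round-tripping through UTF-16 with 'surrogatepass' merges the pair
# into the single supplementary-plane character it encodes.
combined = text.encode("utf-16", "surrogatepass").decode("utf-16")

print(strict_ok, replaced, escaped, ascii(combined))
```

If the goal is to preserve the escaped form rather than drop the data, `backslashreplace` may be closer to that than `replace`, though whether it fits this code path is an open question.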
Type of Changes
Description
Work in progress for #8736. This is probably not the right fix, but it's a fix. Hoping for a surrogates/Unicode expert to chime in with the right approach 😄!