Closed riyad closed 5 years ago
Maybe this (PHP) regexp can be adapted. Or you might have a look at Markus Kuhn’s UTF-8 decoder stress test.
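For illustration, here is a minimal Python sketch of what a regex-based check could look like. It is not the PHP pattern referred to above, just an assumption about its general shape: one alternative per well-formed UTF-8 byte pattern, with every byte that fails to start a match reported as invalid. The names `WELL_FORMED` and `invalid_byte_offsets` are made up for this example.

```python
import re

# Matches exactly one well-formed UTF-8 encoded code point.
# Overlong forms, encoded surrogates and values above U+10FFFF are excluded.
WELL_FORMED = re.compile(
    rb"""
      [\x00-\x7F]                        # ASCII
    | [\xC2-\xDF][\x80-\xBF]             # 2-byte, no overlongs
    | \xE0[\xA0-\xBF][\x80-\xBF]         # 3-byte, no overlongs
    | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # 3-byte
    | \xED[\x80-\x9F][\x80-\xBF]         # 3-byte, no surrogates
    | \xF0[\x90-\xBF][\x80-\xBF]{2}      # 4-byte, no overlongs
    | [\xF1-\xF3][\x80-\xBF]{3}          # 4-byte
    | \xF4[\x80-\x8F][\x80-\xBF]{2}      # 4-byte, up to U+10FFFF
    """,
    re.VERBOSE,
)


def invalid_byte_offsets(data: bytes) -> list[int]:
    """Return the offsets of bytes that do not start a well-formed sequence."""
    bad = []
    pos = 0
    while pos < len(data):
        match = WELL_FORMED.match(data, pos)
        if match:
            pos = match.end()   # skip the whole valid sequence
        else:
            bad.append(pos)     # flag this byte and resynchronise at the next one
            pos += 1
    return bad
```

Running it over a file from the stress test would give the byte offsets to highlight. Scanning byte by byte like this is presumably where a performance cost would come from on large files.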
I suspect this may cause quite a performance hit, but now that I've introduced settings, I guess this could be added as an opt-in feature (off by default). Do you think that would make sense?
This would be a useful feature even if it was behind a flag. :slightly_smiling_face:
Well, now that I've switched to using a whitelist, I think this may be solved. Can you verify?
Yes, it highlights all the weird cases from Markus Kuhn’s UTF-8 decoder stress test.
Sadly, I now have a similar but unrelated problem. :cry: Atom seems to render valid but "undisplayable" characters (i.e. unprintable, control, or missing from the font) the same way as invalid sequences, i.e. with �. :sob:
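To make the distinction concrete, here is a rough Python sketch that tells the cases apart at the byte level, before any rendering happens. It says nothing about how Atom itself decides what to draw. It assumes the raw bytes are available and uses Python's `surrogateescape` error handler, which maps each undecodable byte to a lone surrogate in U+DC80..U+DCFF instead of to U+FFFD. The function name `classify` and the labels are made up for this example.

```python
import unicodedata


def classify(data: bytes) -> list[tuple[int, str]]:
    """Label suspicious characters; indexes are code point positions, not byte offsets."""
    # Undecodable bytes become lone surrogates U+DC80..U+DCFF, so they stay
    # distinguishable from a U+FFFD that is genuinely stored in the file.
    text = data.decode("utf-8", errors="surrogateescape")
    findings = []
    for index, ch in enumerate(text):
        if 0xDC80 <= ord(ch) <= 0xDCFF:
            findings.append((index, "invalid byte"))
        elif ch == "\ufffd":
            findings.append((index, "literal U+FFFD"))
        elif ch not in "\t\n\r" and unicodedata.category(ch) in {"Cc", "Cf", "Co", "Cn"}:
            # Control, format, private-use or unassigned: valid, but often not displayable.
            findings.append((index, "valid but not printable"))
    return findings
```

A missing font glyph cannot be detected this way, since that depends on the renderer rather than on the bytes.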
weird, but valid sequences:
invalid sequences:
But I'll try to figure this out separately.
Thank you. :slightly_smiling_face:
There are invalid or incomplete UTF-8 byte sequences that editors handle "gracefully" but that cause errors when running a script. Maybe you could highlight those byte sequences as well.