ohanhi / atom-highlight-bad-chars

Atom plugin: Highlight bad Unicode characters that can cause hard to spot syntax errors.
MIT License

Highlight bad character sequences #4

Closed riyad closed 5 years ago

riyad commented 8 years ago

There are invalid or incomplete UTF-8 byte sequences that editors handle "gracefully" but that cause errors when a script is run. Maybe you could highlight those byte sequences as well.

riyad commented 8 years ago

Maybe this (PHP) regexp can be adapted. Or you might have a look at Markus Kuhn’s UTF-8 decoder stress test.
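
For illustration, here is a minimal Python sketch of what "finding invalid or incomplete UTF-8 sequences" can look like at the byte level. It is not the PHP regexp mentioned above and not the plugin's code. The function name `find_invalid_utf8` is hypothetical, and the sketch assumes the raw bytes of the file are available, which may not hold inside an editor that has already decoded the buffer.

```python
from typing import List, Tuple


def find_invalid_utf8(data: bytes) -> List[Tuple[int, int]]:
    """Return (start, end) byte offsets of spans that are not valid UTF-8.

    Hypothetical helper: relies on the strict UTF-8 decoder raising
    UnicodeDecodeError, whose start/end attributes mark the bad bytes.
    """
    spans = []
    pos = 0
    while pos < len(data):
        try:
            # Strict decoding of the remainder; success means nothing else is bad.
            data[pos:].decode("utf-8")
            break
        except UnicodeDecodeError as err:
            # err.start and err.end are relative to the slice, so shift by pos.
            spans.append((pos + err.start, pos + err.end))
            pos += err.end
    return spans


if __name__ == "__main__":
    # 0xC3 must be followed by a continuation byte; 0x28 ("(") is not one.
    print(find_invalid_utf8(b"ok \xc3\x28 end"))
    # A three-byte sequence cut off after two bytes.
    print(find_invalid_utf8(b"abc\xe2\x82"))
```

Re-slicing after every error keeps the sketch short but copies the remaining bytes each time, so it suits small inputs better than large files.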

ohanhi commented 5 years ago

I suspect this may cause quite a performance hit, but now that I've introduced settings, I guess this could be added as an opt-in feature (off by default). Do you think that would make sense?

riyad commented 5 years ago

This would be a useful feature even if it was behind a flag. :slightly_smiling_face:

ohanhi commented 5 years ago

Well, now that I've switched to using a whitelist, I think this may be solved. Can you verify?
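
A rough Python sketch of the whitelist idea: anything outside an explicit set of allowed characters gets flagged, instead of listing known-bad characters one by one. The plugin's real whitelist is not shown in this thread, so the ranges below and the names `ALLOWED_RANGES`, `ALLOWED_SINGLES`, `is_allowed` and `find_unlisted` are all assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical whitelist, chosen only for illustration:
# printable ASCII plus printable Latin-1, leaving out the
# no-break space (U+00A0) and the soft hyphen (U+00AD).
ALLOWED_RANGES = [
    (0x20, 0x7E),
    (0xA1, 0xAC),
    (0xAE, 0xFF),
]
# Whitespace control characters that source files normally contain.
ALLOWED_SINGLES = {"\t", "\n", "\r"}


def is_allowed(ch: str) -> bool:
    """True if the character is on the (assumed) whitelist."""
    if ch in ALLOWED_SINGLES:
        return True
    code_point = ord(ch)
    return any(low <= code_point <= high for low, high in ALLOWED_RANGES)


def find_unlisted(text: str) -> List[Tuple[int, str]]:
    """Return (index, character) for every character not on the whitelist."""
    return [(i, ch) for i, ch in enumerate(text) if not is_allowed(ch)]


if __name__ == "__main__":
    # A no-break space and U+FFFD both fall outside the whitelist.
    print(find_unlisted("a\u00a0b\ufffd"))
```

Under this scheme U+FFFD is flagged simply because it is not listed. If the editor substitutes U+FFFD for undecodable bytes, that would explain why a whitelist also catches invalid sequences without decoding anything itself.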

riyad commented 5 years ago

Yes, it highlights all the weird cases from Markus Kuhn’s UTF-8 decoder stress test.

Sadly, I now have a similar but unrelated problem. :cry: Atom seems to render valid but "undisplayable" characters (i.e. unprintable, control, or missing from the font) the same way as invalid sequences: with �. :sob:

Weird but valid sequences: [screenshot: valid sequences]

Invalid sequences: [screenshot: invalid sequences]

But I'll try to figure this out separately.

Thank you. :slightly_smiling_face:
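
The distinction raised in the last comment, valid but non-printable characters versus the � that stands in for invalid sequences, can be approximated from code points alone. The Python sketch below uses the standard `unicodedata` module. The function name `describe` is hypothetical, and the sketch assumes the text is already decoded. It cannot detect the "missing font" case, which depends on the rendering environment and not on the character itself.

```python
import unicodedata


def describe(ch: str) -> str:
    """Roughly classify one character (hypothetical helper)."""
    if ch == "\ufffd":
        # U+FFFD can also be stored legitimately in a file, so this is only a hint.
        return "replacement character (often a sign of an undecodable byte)"
    category = unicodedata.category(ch)
    # General categories starting with "C" cover control, format,
    # surrogate, private-use and unassigned code points.
    if category.startswith("C"):
        return f"valid but non-printable (category {category})"
    return f"printable (category {category})"


if __name__ == "__main__":
    for ch in ["a", "\x07", "\u200b", "\ufffd"]:
        print(f"U+{ord(ch):04X}: {describe(ch)}")
```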