posit-dev / air


Decide if we want to support `.R` files containing non-UTF-8 characters #60

Open DavisVaughan opened 2 days ago

DavisVaughan commented 2 days ago

For example, https://github.com/wch/r-source/blob/trunk/tests/utf8-regex.R is a test R file in base R that directly contains Latin1 characters. We currently fail to read in this file.
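To illustrate the kind of failure involved, here is a minimal Rust sketch. It is not air's actual reading code, and `decode_source` is a hypothetical helper name. It assumes the file contents are read as raw bytes and then validated as UTF-8 with the standard library's `std::str::from_utf8`, which rejects a lone Latin-1 byte such as `0xE9` (`é`).

```rust
use std::str::Utf8Error;

/// Hypothetical helper (not part of air): interpret raw file bytes as UTF-8.
/// Returns the borrowed string on success, or the standard library's
/// `Utf8Error` describing where validation stopped.
fn decode_source(bytes: &[u8]) -> Result<&str, Utf8Error> {
    std::str::from_utf8(bytes)
}

/// Hypothetical helper: byte offset of the first invalid UTF-8 sequence,
/// or `None` if the whole buffer is valid UTF-8.
fn first_invalid_offset(bytes: &[u8]) -> Option<usize> {
    decode_source(bytes).err().map(|e| e.valid_up_to())
}
```

Calling `first_invalid_offset` on the bytes of a Latin-1 encoded file would report where the first non-UTF-8 byte sits, which could at least make for a more helpful error message than a bare read failure.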

For reference, ruff also refuses to parse/format non-UTF-8 files.

tree-sitter used to effectively require UTF-8 or UTF-16. It very recently gained support for custom encodings (https://github.com/tree-sitter/tree-sitter/pull/3833), but I doubt we really want to get into the game of doing that.

I imagine if we did anything it would be:

But that sounds tricky to get right


I imagine this is: