For reference, ruff also refuses to parse/format non-utf8 files.
tree-sitter used to effectively require UTF-8 or UTF-16. It very recently gained support for custom encodings, but I doubt we really want to get into the game of doing that. https://github.com/tree-sitter/tree-sitter/pull/3833
I imagine if we did anything it would be:
Read in as an OsString with some locale
Convert to UTF-8 as soon as possible
Parse/Format in UTF-8
Convert back to original locale
But that sounds tricky to get right
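To make the round trip above concrete, here is a minimal sketch in Python rather than in our own codebase. It assumes the caller already knows the source encoding, which is the hard part in practice. `format_in_encoding` and `strip_trailing_whitespace` are hypothetical names, and the "formatter" is only a stand-in that trims trailing whitespace.

```python
from typing import Callable


def format_in_encoding(
    raw: bytes, encoding: str, format_utf8: Callable[[str], str]
) -> bytes:
    """Hypothetical helper: decode, format as Unicode text, re-encode."""
    # Convert to Unicode as early as possible. This raises
    # UnicodeDecodeError if `encoding` does not match the bytes.
    text = raw.decode(encoding)
    # Parse/format on the Unicode text only.
    formatted = format_utf8(text)
    # Convert back to the original encoding. This raises
    # UnicodeEncodeError if formatting introduced a character the
    # original encoding cannot represent.
    return formatted.encode(encoding)


def strip_trailing_whitespace(text: str) -> str:
    """Stand-in for a real formatter (assumption for illustration)."""
    return "\n".join(line.rstrip() for line in text.split("\n"))


# A Latin-1 encoded R snippet: 0xE9 is "é" in Latin-1.
latin1_source = b'x <- "caf\xe9"   \n'
result = format_in_encoding(latin1_source, "latin-1", strip_trailing_whitespace)
```

Even in this toy version there are two failure points (decode and re-encode), and nothing here addresses how to guess the encoding in the first place.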
I imagine this is:
A non-issue for Mac and Linux
A super minor issue for Windows, where 99.9% of the time users have UTF-8 files, but 0.1% of the time they've copied some Latin1 characters into their file from some other system, or from R output. This has likely improved with R >= 4.2 though, since UTF-8 is now the default encoding on Windows.
For example, https://github.com/wch/r-source/blob/trunk/tests/utf8-regex.R is a test R file in base R that directly contains Latin1 characters. We currently fail to read in this file.
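As a rough illustration of why such a file fails a strict UTF-8 read, the Python sketch below checks whether a byte sequence is valid UTF-8. `is_valid_utf8` is a hypothetical helper, not something from our reader, and the sample bytes are made up rather than taken from that test file.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Hypothetical check: True if `raw` decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# "é" is the single byte 0xE9 in Latin-1 but two bytes (0xC3 0xA9) in UTF-8.
latin1_line = 'x <- "é"'.encode("latin-1")
utf8_line = 'x <- "é"'.encode("utf-8")

latin1_ok = is_valid_utf8(latin1_line)
utf8_ok = is_valid_utf8(utf8_line)
```

In UTF-8, the byte 0xE9 announces a three-byte sequence, so when it is followed by an ordinary ASCII byte the decoder rejects the input instead of guessing.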