Decide if we want to support `.R` files containing non-UTF-8 characters

For example, https://github.com/wch/r-source/blob/trunk/tests/utf8-regex.R is a test R file in base R that directly contains Latin1 characters. We currently fail to read in this file.
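To illustrate why a file like this trips us up, here is a minimal Rust sketch. It is not air's actual file-reading code, and `read_source_utf8` is a hypothetical helper name. It only shows that a Latin1-encoded `é` is the lone byte `0xE9`, which is not a valid UTF-8 sequence on its own:

```rust
/// Hypothetical helper (not part of air): returns the source text only if
/// the raw bytes are valid UTF-8, otherwise the decoding error.
fn read_source_utf8(bytes: &[u8]) -> Result<&str, std::str::Utf8Error> {
    // `std::str::from_utf8` validates the bytes without copying them.
    std::str::from_utf8(bytes)
}
```

With this helper, the bytes `b"x <- \"caf\xE9\""` (Latin1) are rejected, while the same text encoded as UTF-8 is accepted.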

For reference, ruff also refuses to parse/format non-UTF-8 files.

tree-sitter used to effectively require UTF-8 or UTF-16. It very recently gained support for custom encodings, but I doubt we really want to get into the game of doing that: https://github.com/tree-sitter/tree-sitter/pull/3833

I imagine if we did anything it would be:

- Read in as `OsString` with some locale
- Convert to UTF-8 as soon as possible
- Parse/Format in UTF-8
- Convert back to original locale
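The steps above could be sketched roughly as follows. This is only an illustration under several assumptions: the encoding is known up front to be ISO-8859-1 (Latin1), where every byte maps to the Unicode code point of the same value; `format_source` is a hypothetical stand-in for the real formatter that just trims trailing whitespace; and all function names are made up for this example. Detecting the encoding in the first place is not covered.

```rust
/// Decode ISO-8859-1 (Latin1) bytes into a UTF-8 `String`.
/// Each Latin1 byte corresponds to the Unicode code point with the same value.
fn latin1_to_utf8(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| b as char).collect()
}

/// Encode text back to Latin1. Fails with the offending character if the
/// text contains anything outside U+0000..=U+00FF.
fn utf8_to_latin1(text: &str) -> Result<Vec<u8>, char> {
    text.chars()
        .map(|c| u8::try_from(u32::from(c)).map_err(|_| c))
        .collect()
}

/// Hypothetical stand-in for the formatter: trims trailing whitespace on
/// each line and ends the output with a single newline.
fn format_source(source: &str) -> String {
    source
        .lines()
        .map(|line| line.trim_end())
        .collect::<Vec<_>>()
        .join("\n")
        + "\n"
}

/// Round trip: Latin1 bytes -> UTF-8 -> format -> Latin1 bytes.
fn format_latin1(bytes: &[u8]) -> Result<Vec<u8>, char> {
    let source = latin1_to_utf8(bytes);
    let formatted = format_source(&source);
    utf8_to_latin1(&formatted)
}
```

Even this small sketch hints at the tricky parts: converting back can fail if formatting ever introduces a character the original encoding cannot represent, and on Windows "Latin1" often really means Windows-1252, which differs from ISO-8859-1 in the `0x80` to `0x9F` range. A real implementation would more likely lean on a dedicated encoding crate than hand-rolled conversions.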

But that sounds tricky to get right.

I imagine this is:

- A non-issue for Mac and Linux
- A super minor issue for Windows, where 99.9% of the time users have UTF-8 files, but 0.1% of the time they've copied some Latin1 characters into their file from some other system, or from R output. This has likely improved on R >= 4.2 though, since UTF-8 is now the default on Windows.
