uutils / coreutils

Cross-platform Rust rewrite of the GNU coreutils
https://uutils.github.io/
MIT License

quoting_style: Add support for non-UTF-8 bytes #6882

Open jtracey opened 16 hours ago

jtracey commented 16 hours ago

This adds support for non-UTF-8 bytes in the quoting_style library on Unix platforms. This is necessary for proper support of non-Unicode inputs in a few utilities, including wc, ls, and printf. As of this PR, wc should be good; ls is in a much better state but will need some work to close the final gaps; and printf needs @andrewliebenow's #6812, which might conflict with this, but if so, it should be a quick fix.

The first commit bumps the MSRV, because we need access to Utf8Chunks, since we need to operate on strings and non-unicode bytes in the same OsString (namely, we need to be able to tell if something is invalid unicode, or valid unicode but a control character, and apply the appropriate escaping). Avoiding that would require implementing or using another UTF-8 parser.
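To illustrate the distinction described above, here is a minimal sketch of how `utf8_chunks` (stable since Rust 1.79) separates valid text from invalid bytes. The function name `escape_bytes` and the two escape formats are hypothetical, chosen only to make the two cases visibly different; they are not the quoting_style API or its actual output format.

```rust
/// Hypothetical helper (not part of quoting_style): escape a byte string,
/// treating invalid UTF-8 bytes and valid control characters differently.
fn escape_bytes(input: &[u8]) -> String {
    let mut out = String::new();
    // Each chunk is a run of valid UTF-8 followed by the invalid bytes
    // (possibly none) that terminated that run.
    for chunk in input.utf8_chunks() {
        for c in chunk.valid().chars() {
            if c.is_control() {
                // Valid Unicode, but a control character: illustrative
                // `\u{..}` escape (an assumption, not the library's format).
                out.extend(c.escape_unicode());
            } else {
                out.push(c);
            }
        }
        for &b in chunk.invalid() {
            // Not valid Unicode at all: illustrative octal escape per byte.
            out.push_str(&format!("\\{:03o}", b));
        }
    }
    out
}
```

With this sketch, the newline in `b"a\nb\xFFc"` is reported as a control character, while the `0xFF` byte is reported as invalid, even though both live in the same input.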

The third commit fixes a preexisting bug that was in some sense independent of this patch set (multi-byte control characters weren't being handled properly), but it touches the same code so I'm including it.
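For context on why multi-byte control characters need care: some control characters, such as U+0085 (NEL), take more than one byte in UTF-8, so code that escapes only a single byte per control character would miss part of the encoding. The sketch below assumes a per-byte octal escape; `escape_control` is a hypothetical name, and the PR description does not say exactly how the original bug manifested.

```rust
/// Hypothetical helper: octal-escape every UTF-8 byte of a character.
/// Assumes a `\ooo` per-byte escape format for illustration only.
fn escape_control(c: char) -> String {
    let mut buf = [0u8; 4];
    c.encode_utf8(&mut buf)
        .bytes()
        .map(|b| format!("\\{:03o}", b))
        .collect()
}
```

For example, `'\u{85}'.is_control()` is true and `'\u{85}'.len_utf8()` is 2, so `escape_control('\u{85}')` produces two octal escapes rather than one.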

github-actions[bot] commented 15 hours ago

GNU testsuite comparison:

Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)