rust-lang / rustup

The Rust toolchain installer
https://rust-lang.github.io/rustup/
Apache License 2.0

Manipulate shell scripts as bytestrings #2431

Open · workingjubilee opened 4 years ago

workingjubilee commented 4 years ago

canonical_cargo_home is a bit of an odd function: it lossily replaces any characters that aren't valid Unicode in a bid to make the path printable. This matters when, for example, people have paths on disk encoded in Shift-JIS while their locale encoding is CP1252, or some other incompatible combination.
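To illustrate why a display-oriented, lossy conversion is a poor fit for writing files, here is a Unix-only sketch. It builds a path from raw bytes via std::os::unix::ffi::OsStrExt and shows that Path::to_string_lossy does not preserve a Shift-JIS path component. The function names are hypothetical, and the sketch does not reproduce canonical_cargo_home itself.

```rust
// Unix-only: OsStrExt lets a path be built from (and viewed as) raw bytes.
use std::ffi::OsStr;
use std::os::unix::ffi::OsStrExt;
use std::path::Path;

/// Returns true if converting `path` to text with `to_string_lossy`
/// keeps its exact bytes. Invalid UTF-8 is replaced with U+FFFD,
/// so the comparison fails for such paths.
fn survives_lossy_conversion(path: &Path) -> bool {
    let lossy = path.to_string_lossy();
    lossy.as_bytes() == path.as_os_str().as_bytes()
}

/// Hypothetical cargo home whose user directory is Shift-JIS encoded.
/// 0x83 0x65 is "テ" in Shift-JIS, but 0x83 cannot begin a UTF-8 sequence.
fn shift_jis_cargo_home() -> &'static Path {
    Path::new(OsStr::from_bytes(b"/home/\x83\x65/.cargo"))
}
```

The lossy form of that path is "/home/\u{FFFD}e/.cargo", so anything written from it refers to a directory that does not exist.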

So rather than using this display-oriented function, call cargo_home()?.join("bin").to_str(). This is probably common enough to be worth factoring out into a helper, but I haven't gotten around to it yet.
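A minimal sketch of such a helper follows. It assumes the helper receives the Cargo home directory as a &Path; in rustup, that value would come from cargo_home()?. The name cargo_bin_str and the String error type are hypothetical. The point is that to_str() either yields the exact path or fails, rather than silently altering it.

```rust
use std::path::Path;

/// Hypothetical helper: returns `<cargo_home>/bin` as a String, or an
/// error if that path is not valid Unicode. Unlike a lossy conversion,
/// it never hands back an altered path.
fn cargo_bin_str(cargo_home: &Path) -> Result<String, String> {
    let bin = cargo_home.join("bin");
    bin.to_str()
        .map(String::from)
        // display() is lossy, but that is acceptable in an error message.
        .ok_or_else(|| format!("not valid Unicode: {}", bin.display()))
}
```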

If to_str() fails, we should refuse to write the env file for now, since it would otherwise be written with bogus content. A more sophisticated iteration of this work would write the env file as a bytestring rather than Unicode, so that it can be written even when cargo_home isn't valid Unicode. I don't know whether you want to tackle that right now.
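On Unix, the bytestring approach could look roughly like this. OsStrExt::as_bytes exposes a path's raw bytes, which are spliced into the script without any UTF-8 round trip. The template is a simplified, hypothetical stand-in for rustup's env script, and render_env_script is an illustrative name. The sketch also assumes the path needs no shell escaping.

```rust
// Unix-only: OsStrExt::as_bytes exposes a path's raw bytes.
use std::os::unix::ffi::OsStrExt;
use std::path::Path;

/// Hypothetical, simplified env-script template (not rustup's actual one);
/// every `{cargo_bin}` is replaced with the path's raw bytes.
const TEMPLATE: &[u8] = b"#!/bin/sh\n\
case \":${PATH}:\" in\n\
*:\"{cargo_bin}\":*) ;;\n\
*) export PATH=\"{cargo_bin}:$PATH\" ;;\n\
esac\n";

/// Renders the env script as bytes, so a non-UTF-8 cargo bin path is
/// written through unchanged instead of being rejected or mangled.
/// Assumes the path contains no `"`, `$`, backtick or `\` bytes; a real
/// implementation would have to escape those for sh.
fn render_env_script(cargo_bin: &Path) -> Vec<u8> {
    let placeholder: &[u8] = b"{cargo_bin}";
    let path = cargo_bin.as_os_str().as_bytes();
    let mut out = Vec::with_capacity(TEMPLATE.len() + 2 * path.len());
    let mut rest = TEMPLATE;
    // Copy the template, substituting each placeholder occurrence in turn.
    while let Some(pos) = rest
        .windows(placeholder.len())
        .position(|w| w == placeholder)
    {
        out.extend_from_slice(&rest[..pos]);
        out.extend_from_slice(path);
        rest = &rest[pos + placeholder.len()..];
    }
    out.extend_from_slice(rest);
    out
}
```

The resulting Vec<u8> can be passed directly to std::fs::write, which accepts any AsRef<[u8]>.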

Originally posted by @rbtcollins in https://github.com/rust-lang/rustup/pull/2387/threads/MDIzOlB1bGxSZXF1ZXN0UmV2aWV3VGhyZWFkMjgyNjc4MTE0OnYy/unresolve

Rustup performs lossy UTF-8 conversion in several places when handling the shell scripts it interacts with (mostly to modify $PATH). As of #2387, most of these conversions will become errors instead. Ideally, this would be handled correctly by manipulating the raw bytes (still preferring simpler UTF-8 manipulations where possible, of course).
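As a sketch of what byte-level manipulation could mean for a shell rc file, the hypothetical function below appends a line only if an identical line is not already present. It tries plain &str handling first and falls back to scanning raw bytes when the file or the line is not valid UTF-8. It uses only the standard library; a crate such as bstr would offer more convenient byte-string helpers for the same job.

```rust
/// Hypothetical helper: appends `line` to an rc file's contents unless an
/// identical line already exists. Works on bytes, so rc files containing
/// non-UTF-8 text (e.g. Shift-JIS comments) are preserved exactly.
fn add_line_if_missing(contents: &[u8], line: &[u8]) -> Vec<u8> {
    if let (Ok(text), Ok(needle)) = (std::str::from_utf8(contents), std::str::from_utf8(line)) {
        // Simple path: everything is valid UTF-8, so &str methods suffice.
        // str::lines also strips a trailing '\r' from each line.
        if text.lines().any(|l| l == needle) {
            return contents.to_vec();
        }
    } else if contents
        .split(|&b| b == b'\n')
        // Mirror str::lines by ignoring a trailing '\r'.
        .map(|l| l.strip_suffix(&b"\r"[..]).unwrap_or(l))
        .any(|l| l == line)
    {
        return contents.to_vec();
    }
    let mut out = contents.to_vec();
    // Make sure the appended line starts on a line of its own.
    if !out.is_empty() && !out.ends_with(b"\n") {
        out.push(b'\n');
    }
    out.extend_from_slice(line);
    out.push(b'\n');
    out
}
```

Both branches compare whole lines, so calling the function twice with the same line leaves the file unchanged the second time.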

workingjubilee commented 4 years ago

This will probably be somewhat easier with the bstr crate as a dependency.