rust-lang / cc-rs

Rust library for build scripts to compile C/C++ code into a Rust library
https://docs.rs/cc
Apache License 2.0
1.81k stars 439 forks source link

spawn() assumption of UTF-8 output may not hold on Windows #424

Open indygreg opened 5 years ago

indygreg commented 5 years ago

(This issue is derived from https://github.com/indygreg/PyOxidizer/issues/105.)

https://github.com/alexcrichton/cc-rs/blob/85bc5b9fce81177ed8024e5241136417c6008db8/src/lib.rs#L2377-L2413 creates a new process, reads its output as a byte stream, then proceeds to re-emit each line via std::io::stdout().write_all().

Unfortunately, this may not just work on Windows. That's because on Windows, Rust's stdio streams enforce that written bytes from Rust are UTF-8 when things are operating in console mode (https://github.com/rust-lang/rust/blob/03f19f7ff128a3b01eeab3f87f04cce22883f006/src/libstd/sys/windows/stdio.rs#L68).

If we invoke a process that emits bytes that aren't UTF-8 (say cl.exe emitting a localized warning message when the system code page isn't UTF-8), spawn() will proxy these non UTF-8 bytes to Rust's stdio handler, which will summarily reject the bytes. The impact of this bug is that Windows users not using a system code page and localization that doesn't emit UTF-8 will not be able to use the cc crate under certain use cases.

The workaround is for spawn() to convert the output bytes to UTF-8 to placate Rust's standard library. @luser has pointed me at https://github.com/mozilla/sccache/blob/6ba6f6c15c106768b914a7697a763e2232fa253a/src/compiler/msvc.rs#L154 as an example of such code.

alexcrichton commented 5 years ago

Thanks for the report! This definitely sounds like a bug, but this is likely also a bug in Cargo as well as cc. This backtrace for example is likely from Cargo since git_packbuilder_object_count (from libgit2) is likely not linked into the same executable as cc-the-crate.

This is definitely going to be a really tricky one to fix, and it's something I'd prefer to ideally not have a ton of code for handling in this crate and instead just have something which defers the problem to elsewhere...

luser commented 5 years ago

This definitely sounds like a bug, but this is likely also a bug in Cargo as well as cc.

I just spent a few minutes poking around the cargo source and I don't believe this is true. I don't know exactly what command the folks in the PyOxidizer issue were running but they both mentioned invoking pyoxidizer [build|run] so that should be independent of cargo. PyOxidizer does link with the git2 crate: https://github.com/indygreg/PyOxidizer/blob/9208ea2e9840d930d8e72bf10ef34b64a917fb9c/pyoxidizer/Cargo.toml#L29

For reference, I believe everything in cargo that takes the output of a subprocess and sends it to stdout/stderr uses exec_with_streaming, which runs the raw bytes through String::from_utf8_lossy, which won't correctly display non-UTF-8 output but also won't error, which is probably good enough: https://github.com/rust-lang/cargo/blob/caba711ce98a88dac6154761872620c7c5e503b6/src/cargo/util/process_builder.rs#L258

Honestly pulling that whole exec_with_streaming + pipe2 implementation out into a standalone crate would not be a bad idea. You could then use that in cc and avoid this issue. With some additional work to use the local-encoding crate on Windows to faithfully transcode process output from the local Windows codepage to UTF-8 that'd be a super useful bit of functionality.

FWIW, we struggled with streaming stdout/stderr from child processes on Windows in Firefox automation code for years and never managed to get a solution that worked well, mostly because Python uses CreatePipe for stdout/stderr handles on Windows and you can't use those for async I/O, but Rust is nice enough to use named pipes.

indygreg commented 5 years ago

pyoxidizer [build|run] can both invoke cargo under the hood. In addition, they also can call into the cc and git2 crates. Note that the cargo invocations are via new processes, not into the cargo Rust crate directly.

I would go a step further with breaking the exec_with_streaming + pipe2 implementations into a standalone crate: I would include the functionality in the Rust standard library. As a systems programming language, I would expect Rust to be able to interop with the native stream output facilities of Windows. And handling basic encoding support is critical to that. I understand a full solution may not be feasible, as it may require a more advanced encoding library than what the stdlib can support. But I feel like there should be something in the stdlib to avoid boilerplate conversion.

alexcrichton commented 5 years ago

Good points! I wonder though if perhaps it's best to just do what Cargo already does by using String::from_utf8_lossy? The output from this crate specifically isn't necessarily 100% critical all the time, especially because it looks like the error in the original issue was just a warning that failed to get printed?

@luser I'm curious, did y'all ever try out something like https://github.com/rust-lang/rust/pull/62021 setting VSLANG? The contributor there claimed that it fixed issues for them, and that'd be an easy-ish solution for this crate as well.

As for putting the functionality on crates.io and/or in libstd, I don't really disagree with either :). I probably don't have the time myself to maintain that though.