Open mbrubeck opened 6 years ago
Incidentally, I was just reading http://andrewkelley.me/post/zig-december-2017-in-review.html which says:
It's really a shame that Windows command line parsing requires you to allocate memory. This means that to have a cross-platform API for command line arguments, even though in POSIX it can never fail, we have to handle the possibility because of Windows.
and I was wondering how our code handles this.
Some ideas on reducing each source of allocation/cloning (on startup, and on iterator construction):
Copying on startup could be avoided by storing the argc and argv values that init receives from the OS, instead of cloning their contents. This could change behavior for programs that use unsafe platform-specific code to access these values directly and mutate them, then later call std::env::args. But such programs already behave inconsistently between different platforms (e.g. macOS versus Linux).
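To make the first idea concrete, here is a minimal sketch of recording only the argc and argv pointers at startup and borrowing from them later. It is not the actual std implementation: the statics `ARGC` and `ARGV` and the functions `init` and `arg_bytes` are hypothetical names, and the sketch assumes the OS keeps the argv strings alive and unmodified for the whole program.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;
use std::sync::atomic::{AtomicIsize, AtomicPtr, Ordering};

// Hypothetical storage for the values handed to the runtime at startup.
static ARGC: AtomicIsize = AtomicIsize::new(0);
static ARGV: AtomicPtr<*const c_char> = AtomicPtr::new(std::ptr::null_mut());

/// Record the raw pointers only; nothing is copied or allocated here.
///
/// # Safety
/// `argv` must point to at least `argc` valid NUL-terminated strings that
/// stay alive and unmodified for the rest of the program.
pub unsafe fn init(argc: isize, argv: *const *const c_char) {
    ARGC.store(argc, Ordering::Relaxed);
    // Release pairs with the Acquire load below, so a reader that sees the
    // pointer also sees the matching argc.
    ARGV.store(argv as *mut *const c_char, Ordering::Release);
}

/// Borrow argument `index` as bytes without cloning it.
pub fn arg_bytes(index: usize) -> Option<&'static [u8]> {
    let argv = ARGV.load(Ordering::Acquire);
    let argc = ARGC.load(Ordering::Relaxed);
    if argv.is_null() || argc < 0 || index >= argc as usize {
        return None;
    }
    // SAFETY: relies on the contract documented on `init`.
    unsafe {
        let ptr = *argv.add(index);
        if ptr.is_null() {
            None
        } else {
            Some(CStr::from_ptr(ptr).to_bytes())
        }
    }
}
```

The trade-off described above shows up in the safety contract: if other code mutates the argv strings after `init`, the borrowed slices observe the change.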
Eager copying when constructing the Args iterator could be replaced by lazy cloning during iteration, as we already do on Windows. This requires that the data it clones from is guaranteed to last for the duration of the iterator (which has a 'static type, so it could be up to the duration of the program). This should be safe for data that is created in init and destroyed in cleanup as in the current non-Apple Unix implementation, since cleanup runs after catch_unwind(main). For data owned by the OS, it again can be affected by unsafe platform-specific code that mutates this data directly, but again I argue that such programs already have poorly-specified behavior.
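The lifetime requirement can be illustrated with a small sketch of a lazily cloning iterator. `LazyArgs` is a hypothetical type, not the real `std::env::ArgsOs`; it assumes the stored arguments are reachable through a `&'static` slice, which stands in for data that outlives `main`.

```rust
use std::ffi::OsString;

/// Hypothetical lazy iterator: it borrows the stored args and clones
/// exactly one of them per call to `next`.
pub struct LazyArgs {
    store: &'static [OsString],
    pos: usize,
}

impl LazyArgs {
    /// Construction allocates nothing; it only keeps the borrowed slice.
    pub fn new(store: &'static [OsString]) -> Self {
        LazyArgs { store, pos: 0 }
    }
}

impl Iterator for LazyArgs {
    type Item = OsString;

    fn next(&mut self) -> Option<OsString> {
        let arg = self.store.get(self.pos)?;
        self.pos += 1;
        // The only allocation: one clone for the arg being yielded.
        Some(arg.clone())
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        let left = self.store.len() - self.pos;
        (left, Some(left))
    }
}
```

Because the iterator holds a `&'static` borrow, it has a 'static type itself, which is why the backing data must live as long as the program might keep the iterator around.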
Incidentally, I was just reading http://andrewkelley.me/post/zig-december-2017-in-review.html which says
It's really a shame that Windows command line parsing requires you to allocate memory.
Ah, yes. On Windows the Args iterator constructor calls CommandLineToArgvW, which allocates an array of pointers and a single UTF-16 buffer to hold a copy of the args. So while it doesn't allocate and clone each arg individually, it does do 1 or 2 allocations, and copies the whole command line as UTF-16.
We definitely can't get to zero copies on Windows, because we at least need to do UTF-16 to UTF-8 conversion.
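As a rough illustration of why the conversion forces a copy, the sketch below re-encodes one UTF-16 argument into a freshly allocated UTF-8 `String`. The function name `wide_to_utf8` is hypothetical, and replacing unpaired surrogates with U+FFFD is a simplification chosen so the example runs on any platform; it is not how std represents Windows arguments internally.

```rust
/// Re-encode a UTF-16 argument as UTF-8. The output buffer is a new
/// allocation because the two encodings cannot share storage.
pub fn wide_to_utf8(wide: &[u16]) -> String {
    // Lower bound on the output size; non-ASCII text may grow the buffer.
    let mut out = String::with_capacity(wide.len());
    for decoded in std::char::decode_utf16(wide.iter().copied()) {
        // Unpaired surrogates become U+FFFD in this simplified sketch.
        out.push(decoded.unwrap_or(std::char::REPLACEMENT_CHARACTER));
    }
    out
}
```

Even with lazy per-argument conversion, each argument that is actually yielded costs at least this one allocation and copy.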
Triage: fixed by https://github.com/rust-lang/rust/pull/47165
And specifically not related to macOS: @rustbot label -O-macos
The std::sys::unix::args module does a lot of allocation and cloning of command-line parameters:
On startup, std::sys::unix::args::init copies all of the command-line arguments into a Box<Vec<Vec<u8>>> (except on macOS and iOS).
When std::env::args or args_os is called, it eagerly copies all of the args into a new Vec<OsString>.
On non-Apple systems, this means there is at least one allocation and clone per argument (plus 2 additional allocations, for the outer Vec and Box) even if they are never accessed. These extra allocations take up space on the heap for the duration of the program.
On both Apple and non-Apple systems, accessing any args causes at least one additional allocation and clone of every arg. Calling std::env::args more than once causes all arguments to be cloned again, even if the caller doesn't iterate through all of them.
On Windows, for comparison, each arg is cloned lazily only when it is yielded from the iterator, so there are zero allocations or clones for args that are never accessed (update: at least, no clones in Rust code; see the comments in this thread).
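The cost difference described here can be modeled with a small, self-contained sketch that counts clones. Everything in it is hypothetical (`Arg`, `clone_count`, `eager_args`, `lazy_args`); it models the behavior described in this issue rather than reproducing the std source.

```rust
use std::cell::Cell;

thread_local! {
    // Per-thread counter of how many times an `Arg` has been cloned.
    static CLONES: Cell<usize> = Cell::new(0);
}

/// Stand-in for one stored command-line argument.
#[derive(Debug, PartialEq)]
pub struct Arg(pub Vec<u8>);

impl Clone for Arg {
    fn clone(&self) -> Self {
        CLONES.with(|c| c.set(c.get() + 1));
        Arg(self.0.clone())
    }
}

/// Number of `Arg` clones made so far on this thread.
pub fn clone_count() -> usize {
    CLONES.with(|c| c.get())
}

/// Eager model: clone every stored arg when the iterator is constructed.
pub fn eager_args(store: &[Arg]) -> std::vec::IntoIter<Arg> {
    store.to_vec().into_iter()
}

/// Lazy model: clone each arg only when it is yielded.
pub fn lazy_args(store: &[Arg]) -> impl Iterator<Item = Arg> + '_ {
    store.iter().cloned()
}
```

With three stored args, taking only the first item from `eager_args` performs three clones, while the same call on `lazy_args` performs one.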