rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org

Command-line arguments are cloned a lot on Unix #47164

Open mbrubeck opened 6 years ago

mbrubeck commented 6 years ago

The std::sys::unix::args module does a lot of allocation and cloning of command-line parameters:

  1. On startup, std::sys::unix::args::init copies all of the command-line arguments into a Box<Vec<Vec<u8>>> (except on macOS and iOS).
  2. When std::env::args or args_os is called, it eagerly copies all of the args into a new Vec<OsString>.

On non-Apple systems, this means there is at least one allocation and clone per argument (plus 2 additional allocations, for the outer Vec and Box) even if they are never accessed. These extra allocations take up space on the heap for the duration of the program.

On both Apple and non-Apple systems, accessing any args causes at least one additional allocation and clone of every arg. Calling std::env::args more than once causes all arguments to be cloned again, even if the caller doesn't iterate through all of them.
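To make the cost concrete, here is a minimal sketch that models the two copying stages described above. It is not the actual `std::sys::unix::args` code: `init_copy` and `eager_args` are hypothetical names chosen for illustration, and the sketch assumes a Unix target because it uses `OsStringExt::from_vec`.

```rust
use std::ffi::OsString;
use std::os::unix::ffi::OsStringExt;

/// Hypothetical model of stage 1 (startup on non-Apple Unix):
/// every raw argument is copied into its own heap buffer, and the
/// outer Vec is boxed, whether or not anyone ever reads the args.
fn init_copy(raw: &[&[u8]]) -> Box<Vec<Vec<u8>>> {
    Box::new(raw.iter().map(|arg| arg.to_vec()).collect())
}

/// Hypothetical model of stage 2 (constructing the iterator):
/// every stored argument is cloned again, up front, into a new
/// Vec<OsString>, even if the caller only looks at the first one.
fn eager_args(stored: &[Vec<u8>]) -> Vec<OsString> {
    stored
        .iter()
        .map(|arg| OsString::from_vec(arg.clone()))
        .collect()
}
```

In this model, calling `eager_args` twice on the same stored data produces two complete, independent copies of every argument, which mirrors the repeated cloning described for multiple `std::env::args` calls.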

On Windows, for comparison, each arg is cloned lazily only when it is yielded from the iterator, so there are zero allocations or clones for args that are never accessed (update: at least, no clones in Rust code; see comments below).

steveklabnik commented 6 years ago

Incidentally, I was just reading http://andrewkelley.me/post/zig-december-2017-in-review.html which says:

> It's really a shame that Windows command line parsing requires you to allocate memory. This means that to have a cross-platform API for command line arguments, even though in POSIX it can never fail, we have to handle the possibility because of Windows.

and I was wondering about our code regarding this.

mbrubeck commented 6 years ago

Some ideas on reducing each source of allocation/cloning (on startup, and on iterator construction):

  1. Copying on startup could be avoided by storing the argc and argv values that init receives from the OS, instead of cloning their contents. This could change behavior for programs that use unsafe platform-specific code to access these values directly and mutate them, then later call std::env::args. But such programs already behave inconsistently between different platforms (e.g. macOS versus Linux).

  2. Eager copying when constructing the Args iterator could be replaced by lazy cloning during iteration, as we already do on Windows. This requires that the data it clones from is guaranteed to last for the duration of the iterator (which has a 'static type, so it could be up to the duration of the program). This should be safe for data that is created in init and destroyed in cleanup, as in the current non-Apple Unix implementation, since cleanup runs after catch_unwind(main). Data owned by the OS can again be affected by unsafe platform-specific code that mutates it directly, but again I argue that such programs already have poorly-specified behavior.
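The two ideas above can be sketched together roughly as follows. This is only an illustration under stated assumptions, not the code from any actual patch: `LazyArgs` is a hypothetical type, it assumes a Unix target, and it assumes the caller can guarantee that the `argv` data stays valid and unmodified while the iterator is in use.

```rust
use std::ffi::{CStr, OsString};
use std::os::raw::{c_char, c_int};
use std::os::unix::ffi::OsStringExt;

/// Hypothetical lazy iterator: it stores only the argc/argv pair
/// received from the OS and clones an argument when it is yielded.
struct LazyArgs {
    argv: *const *const c_char,
    next: usize,
    argc: usize,
}

impl LazyArgs {
    /// Safety: `argv` must point to at least `argc` valid,
    /// NUL-terminated strings that outlive the iterator and are
    /// not mutated while it is in use.
    unsafe fn new(argc: c_int, argv: *const *const c_char) -> Self {
        LazyArgs {
            argv,
            next: 0,
            // Treat a negative count as "no arguments".
            argc: argc.max(0) as usize,
        }
    }
}

impl Iterator for LazyArgs {
    type Item = OsString;

    fn next(&mut self) -> Option<OsString> {
        if self.next >= self.argc {
            return None;
        }
        // The only allocation and copy happens here, once per
        // argument that is actually yielded.
        let bytes = unsafe {
            CStr::from_ptr(*self.argv.add(self.next)).to_bytes().to_vec()
        };
        self.next += 1;
        Some(OsString::from_vec(bytes))
    }
}
```

Constructing a `LazyArgs` allocates nothing, and arguments that are never iterated are never copied. The trade-off is the lifetime requirement spelled out in the safety comment, which is exactly the guarantee discussed in point 2.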

mbrubeck commented 6 years ago

> Incidentally, I was just reading http://andrewkelley.me/post/zig-december-2017-in-review.html which says
>
> It's really a shame that Windows command line parsing requires you to allocate memory.

Ah, yes. On Windows the Args iterator constructor calls CommandLineToArgvW which allocates an array of pointers and a single UTF-16 buffer to hold a copy of the args. So while it doesn't allocate and clone each arg individually, it does do 1 or 2 allocations, and copies the whole command line as UTF-16.

We definitely can't get to zero copies on Windows, because we at least need to do UTF-16 to UTF-8 conversion.
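As a small illustration of why the conversion forces a copy, here is a sketch that turns one UTF-16 argument into an owned string. `utf16_arg_to_string` is a hypothetical helper, not the function std uses; std's `OsString` on Windows uses a more permissive encoding (WTF-8) that tolerates unpaired surrogates, whereas this sketch uses strict UTF-8 and reports them as errors.

```rust
use std::string::FromUtf16Error;

/// Hypothetical helper: converts one UTF-16 argument into an owned
/// UTF-8 String. The two encodings use different code unit sizes,
/// so the result cannot borrow from the input and a new buffer
/// has to be allocated.
fn utf16_arg_to_string(wide: &[u16]) -> Result<String, FromUtf16Error> {
    String::from_utf16(wide)
}
```

For example, the argument `héllo` is five UTF-16 code units but six UTF-8 bytes, so the output buffer differs from the input in both length and content.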

mbrubeck commented 6 years ago

#47165 eliminates the allocations/copies on startup.

madsmtm commented 3 months ago

Triage: fixed by https://github.com/rust-lang/rust/pull/47165

And specifically not related to macOS: @rustbot label -O-macos