swiftlang / swift

The Swift Programming Language
https://swift.org
Apache License 2.0

`CommandLine.arguments` is Empty on Linux #76080

Closed: sidepelican closed this issue 1 week ago

sidepelican commented 2 weeks ago

Description

On Linux, CommandLine.arguments sometimes returns an empty array. This happens when certain libraries, such as OpenCV, are linked.

The problem seems to come from the change to how argv is obtained in https://github.com/swiftlang/swift/pull/71885. That implementation relies on the environ pointer to locate argv, but environ can change when setenv or putenv is called. ArgvGrabber locates argv from a C++ global variable constructor, so if setenv is called before that constructor runs, ArgvGrabber fails.

Whether this happens depends on the order in which the linked libraries are initialized. When OpenCV is linked, its initialization runs before that of libswiftCore.so, which triggers the problem. You can check the initialization order by running the executable with LD_DEBUG=libs.
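The environ-based lookup described above can be sketched roughly as follows. This is a hypothetical reimplementation for illustration, not Swift's actual ArgvGrabber code. It assumes the standard Linux initial-stack layout, where argc, the argv pointers, argv's terminating NULL and then the envp pointers are stored contiguously, and that environ still points at envp[0] on that stack. The function name hypothetical_find_argv is invented for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>  // setenv(), as in the usage comment below

extern "C" char **environ;

// Hypothetical helper, NOT Swift's actual ArgvGrabber implementation.
// Assumes the Linux initial-stack layout:
//   [argc][argv[0]] ... [argv[argc-1]][NULL][envp[0]] ... [NULL]
// with environ still pointing at envp[0]. Indexing before environ relies on
// that platform ABI and is outside what the C++ standard guarantees.
// Returns argc and stores argv in *out_argv, or returns -1 on failure.
int hypothetical_find_argv(char ***out_argv) {
    assert(out_argv != nullptr);
    char **env = environ;
    // The slot just before envp[0] must be argv's NULL terminator. If
    // setenv()/putenv() has replaced environ with a heap copy, it is not.
    if (env == nullptr || env[-1] != nullptr) {
        return -1;
    }
    // Walk backward from argv[argc-1]. Slots holding argv pointers contain
    // large stack addresses, while the slot just before argv[0] holds argc,
    // so the first k whose slot (env - 2 - k) equals k is argc.
    for (std::ptrdiff_t k = 0; k < 4096; ++k) {  // arbitrary safety bound
        auto slot = reinterpret_cast<std::uintptr_t>(env[-2 - k]);
        if (slot == static_cast<std::uintptr_t>(k)) {
            *out_argv = env - 1 - k;
            return static_cast<int>(k);
        }
    }
    return -1;
}

// Usage sketch (e.g. at the start of main):
//   char **found = nullptr;
//   int n = hypothetical_find_argv(&found);  // argc, with found == argv
//   setenv("SOME_NEW_VARIABLE", "1", 1);     // adds a variable: environ is copied
//   n = hypothetical_find_argv(&found);      // now -1 (no arguments found)
```

Once a constructor elsewhere adds a new variable before this code runs, environ typically points at a heap copy instead of the stack, the env[-1] check fails, and the lookup reports no arguments, which matches the empty CommandLine.arguments described in this issue.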

Reproduction

Steps to reproduce the issue using OpenCV are summarized in the README of the following repository: https://github.com/t-ae/empty-args-test

The following example deliberately reproduces the same behavior without using OpenCV: https://github.com/t-ae/empty-args-test/tree/without_opencv

Expected behavior

CommandLine.arguments should return an array containing at least one element (the program path), for example [".build/debug/App"].

Environment

Swift version 5.10.1 (swift-5.10.1-RELEASE)
Target: aarch64-unknown-linux-gnu

Additional information

Including the following C++ code in your project's source code can avoid the issue. However, this method might not work in all execution environments.

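```cpp
// Provided by the Swift runtime; replaces the argv/argc that CommandLine reports.
extern "C" void _swift_stdlib_overrideUnsafeArgvArgc(const char **, int);

// glibc passes argc and argv to ELF constructor functions, so this runs before
// main() and hands the real arguments to the Swift runtime.
void __attribute__((constructor)) overrideSwift(int argc, const char **argv) {
    _swift_stdlib_overrideUnsafeArgvArgc(argv, argc);
}
```
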
grynspan commented 2 weeks ago

@al45tair Seems like something's blowing away the "ABI" arguments list you added support for.

al45tair commented 1 week ago

> The process of locating argv by ArgvGrabber is executed in a C++ global variable constructor, but if setenv is called before this, the process of ArgvGrabber will fail.

This is a known downside of the approach we're using (which we adopted to work around a problem with Docker when using Rosetta). My guess is that OpenCV is calling setenv() or putenv(), or explicitly altering environ, from a global constructor or constructor function. That seems like (a) a very odd thing to be doing, and (b) a bad idea, not least for the reason you mention: the order in which constructor functions run cannot be guaranteed, so if someone else did a putenv() of the same variable OpenCV is trying to change, it might not see the expected result.

The workaround you mention will work with Glibc (and I think Bionic), but it will not work with Musl, because the latter doesn't pass argc and argv to constructor functions.
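If the constructor-based capture is used, it can be limited to glibc at compile time so it is never relied on where the arguments are not passed. The sketch below is illustrative only: the globals and the hypothetical_capture_args / hypothetical_captured_* functions are invented for this example and are not a Swift API. It only records the arguments instead of calling _swift_stdlib_overrideUnsafeArgvArgc, so it builds without the Swift runtime.

```cpp
#include <cstdlib>  // any C library header brings in feature macros such as __GLIBC__

// Hypothetical storage for this sketch; not part of any Swift API.
static int g_captured_argc = 0;
static char **g_captured_argv = nullptr;

#if defined(__GLIBC__)
// glibc calls ELF constructor functions with (argc, argv, envp), so the real
// arguments are available here before main() runs, whatever environ holds.
__attribute__((constructor)) static void hypothetical_capture_args(int argc, char **argv, char **) {
    g_captured_argc = argc;
    g_captured_argv = argv;
}
#endif
// Elsewhere (e.g. musl, which calls constructors without arguments) nothing is
// captured and the accessors below report 0 / nullptr.

int hypothetical_captured_argc() { return g_captured_argc; }
char **hypothetical_captured_argv() { return g_captured_argv; }
```

On a glibc system the captured array should be the same one main() receives; on musl both accessors stay empty, so a caller would still need some other fallback there.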

The problem we have here is that we want this code to work with Musl, so we can't capture the arguments from a constructor, and we also want to avoid having to wait for a fix for the Rosetta issue that was breaking x86_64 Swift programs running in Docker containers on Apple Silicon machines. Once the latter is fixed, we might be able to do something else instead, but not until then.

I'd recommend that you try to get OpenCV to not alter the environment from a global constructor/constructor function, or, if Musl compatibility isn't an issue for you, you could use your workaround above for now. I'm going to close this because there really isn't anything we can do; whatever choice we make, something is broken.

al45tair commented 1 week ago

Note: I'm not unsympathetic here; the problem is that we're boxed in by the decision to not capture arguments in the main function (which is the only place it's actually safe to do so, on Linux). I'll take a look and see if I can work out why OpenCV is changing the environment; it may be that you can turn that off somehow.

al45tair commented 1 week ago

The problem appears to be something (and I don't think it's OpenCV itself, but rather something it pulls in) setting ZES_ENABLE_SYSMAN=1 in the environment from a constructor function. If you run your program with ZES_ENABLE_SYSMAN=1 already set, I believe that should work.
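A plausible explanation for why presetting the variable helps, offered as an assumption about how glibc and musl implement setenv() rather than something established in this thread: overwriting a variable that already exists replaces that one entry in place, while adding a new variable makes the C library copy the environment array elsewhere and repoint environ. The helper below (a hypothetical name) reports whether a given setenv() call moved the array.

```cpp
#include <cassert>
#include <cstdlib>

extern "C" char **environ;

// Hypothetical helper for this sketch: calls setenv() and reports whether the
// environ array itself (not just one of its entries) was moved as a result.
bool hypothetical_setenv_moves_environ(const char *name, const char *value) {
    assert(name != nullptr && value != nullptr);
    char **before = environ;
    setenv(name, value, /*overwrite=*/1);
    return environ != before;
}
```

In a fresh process, the first call with a new name would be expected to return true and a second call with the same name false. With ZES_ENABLE_SYSMAN=1 already in the environment, the library's update would presumably fall into the second case, leaving environ where an environ-based argv lookup expects it.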

al45tair commented 1 week ago

Looks like it's libhwloc.so that's doing the dirty on us here: it sets ZES_ENABLE_SYSMAN from a constructor. It should probably be using zesInit() instead, but apparently, until recently, some Intel drivers implemented zesInit() but made it fail :-(

I filed https://github.com/open-mpi/hwloc/issues/687 against OpenMPI's hwloc project to see if they will fix their code to not do this.

al45tair commented 1 week ago

Just to highlight the workaround, if you do

```sh
ZES_ENABLE_SYSMAN=1 swift run
```

or

```sh
ZES_ENABLE_SYSMAN=1 .build/debug/App
```

or similar, things should just work. It sounds like the libhwloc developers aren't very happy about the environment variable thing either, but they have no other option at present for other reasons.

sidepelican commented 1 week ago

Your insights were very helpful. Thank you!