skeeto / w64devkit

Portable C and C++ Development Kit for x64 (and x86) Windows
The Unlicense

w64devkit and GNU Autotools #50

Open rmyorston opened 1 year ago

rmyorston commented 1 year ago

Over at https://github.com/rmyorston/busybox-w32/issues/297 we've been working on getting stuff that uses a configure script to build with w64devkit.

With a bit of hackery I've managed to get Expat to build. (Chosen as a guinea pig because it's included with w64devkit and it's quite small.)

GNU Autotools support for Windows platforms is mostly restricted to Cygwin/MSYS2. This sort of works for w64devkit (hence the hackery) but what's really needed is proper support for the w64devkit build environment.

Is there any interest in prodding the GNU Autotools people to do this?

avih commented 1 year ago

I think it would be great if autotools could work using busybox-w32 shell and w64devkit tools.

GNU Autotools support for Windows platforms is mostly restricted to Cygwin/MSYS2. This sort of works for w64devkit...

What actually doesn't work? I presume semicolon path separator is not great, and/or possibly colon at absolute paths, and/or absolute paths don't necessarily begin with slash? (and something with uname?)

It was mentioned before, but allowing some prefix as virtual root and make all the paths normal posix paths can help a lot in many things which don't expect windows path [separator], in addition to taking colon path separator.

Something like /bbdrive/c/whatever, similar to /cygdrive/...

The git-for-windows busybox-w32 downstream fork already has such changes (check the busybox.exe binary inside one of the "MiniGit-2.xxx-busybox...zip" downloads of the Git project).
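To make the virtual-root idea concrete, here is a minimal sketch of the mapping in plain POSIX shell. The function name `to_virtual_path` is hypothetical, and the `/bbdrive` prefix is simply the example from above; nothing here reflects how busybox-w32 or the git-for-windows fork actually implement it.

```sh
#!/bin/sh
# Hypothetical helper: map a drive-letter path to a /bbdrive/<letter>/... path.
# Paths without a drive letter are passed through unchanged.
to_virtual_path() {
    case $1 in
        [A-Za-z]:[\\/]*)
            # Lower-case the drive letter, then turn backslashes into slashes.
            drive=$(printf %s "${1%%:*}" | tr 'A-Z' 'a-z')
            rest=$(printf %s "${1#?:}" | tr '\\' '/')
            printf '/bbdrive/%s%s\n' "$drive" "$rest"
            ;;
        *)
            printf '%s\n' "$1"
            ;;
    esac
}

to_virtual_path 'C:\Users\me'
to_virtual_path 'D:/src/expat'
```

A path in this form contains neither a colon nor a backslash, so a colon-delimited PATH becomes possible again.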

skeeto commented 1 year ago

Interesting, what did it take to get the Expat build working? In the past I've only observed configure scripts immediately fail and gave up on the idea. I tried it just now with Expat using FRP-4882-g6e0a6b7e5, and after supplying --build it does indeed seem, as avih mentioned, semicolon path separation is the first issue, based on the output from running the configure script with "sh -x". I guess you either faked a colon-delimited PATH for the configure script or hacked something into busybox-w32?

It would be nice, especially if it opens the doors to bootstrapping more of w64devkit on itself, which is currently limited to busybox-w32, Vim, Universal Ctags, Cppcheck, PDCurses, GNU Make, and u-config. It's a lot easier to debug issues in tools I can build natively, so I don't need to go back and forth cross-compiling between tests. (This has been a serious impediment debugging GDB itself, for instance.) Otherwise it's not a big deal for me personally.

Is there anything we could do the other way, to make the environment more tenable for Autoconf scripts? It would be convenient if existing configure scripts worked rather than wait for Autoconf support to percolate, which would take years. The Autoconf on my laptop is over a decade old (2.69), and I'm running the latest Debian, so that's potentially a very long time to wait. This adaptation could perhaps take the form of an ash Autoconf compatibility mode, or a shell wrapper through which configure scripts are invoked (including recursive invocations).

$ sh --autoconf ./configure --prefix=...
$ acrun ./configure --prefix=...

I imagine these would, at the very least, provide a colon-delimited PATH for just the script itself, but not for any programs it runs besides shell built-ins. Maybe it would even need to virtualize the filesystem into something more unixy just for the script? Though at that point maybe it's just time to use MSYS2/Cygwin instead.

rmyorston commented 1 year ago

The details are in the busybox-w32 issue mentioned above. The tl;dr is that it's a stock w64devkit but you need to source this script before running configure:

export PATH_SEPARATOR=';'
export ac_executable_extensions='.exe'
export build_alias="$(uname -m)-pc-mingw64"

if [ -f configure ]
then
    echo "Converting configure..."
    sed -i 's/func_convert_file_msys_to_w32/func_convert_file_noop/' configure
fi

skeeto commented 1 year ago

Doh! I went diving in and forgot you had provided a link.

It seems perfectly reasonable to set these three variables as indicated in w64devkit.exe if not already set. The first one seems like a good idea even without considering Autoconf. I found I could skip the sed step by also exporting lt_cv_to_{host,tool}_file_cmd to that noop. If that works generally then all can be accomplished using environment variables.
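A minimal sketch of the "set if not already set" approach, written as a shell function rather than as code inside w64devkit.exe. The function name `set_autoconf_defaults` is hypothetical. The two `lt_cv_to_*_file_cmd` assignments follow the comment above; whether they are enough for every package is untested here.

```sh
#!/bin/sh
# Hypothetical helper: give each variable a default only when it is unset or
# empty, so a value chosen by the user always wins.
set_autoconf_defaults() {
    : "${PATH_SEPARATOR:=;}"
    : "${ac_executable_extensions:=.exe}"
    : "${build_alias:=$(uname -m)-pc-mingw64}"
    # Reported above as a replacement for the sed step on configure.
    : "${lt_cv_to_host_file_cmd:=func_convert_file_noop}"
    : "${lt_cv_to_tool_file_cmd:=func_convert_file_noop}"
    export PATH_SEPARATOR ac_executable_extensions build_alias
    export lt_cv_to_host_file_cmd lt_cv_to_tool_file_cmd
}

# Usage: call once, then run configure as usual.
#   set_autoconf_defaults
#   ./configure --prefix=...
```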

skeeto commented 1 year ago

I just tried Binutils, and it has func_to_host_path{list,} that hardcodes two more "cmd //c" but this time behind a mingw case at "run time". If I fix them manually (across each libtool copy) then it builds successfully.

I wonder if there's a more general solution: a cmd.exe in w64devkit that filters any "//c" before invoking the real cmd.exe. Or alternatively do it in ash.

skeeto commented 1 year ago

I whipped one up to try it out: cmd.c. With that cmd.exe on my path, and the three environment variables exported, I could straightforwardly build m4 (note: disable fortify), GMP, MPFR, and MPC. GCC started to build but failed in the middle looking for sys/wait.h (i.e. a misconfiguration to be sorted out).

avih commented 1 year ago

I whipped one up to try it out: cmd.c.

// Converts libtool's Cygwin-style "cmd //c ..." to "cmd /c ..."
//   $ cc -nostartfiles -o cmd.exe cmd.c

This is not a cygwin thing, it's an MSYS2 thing.

In cygwin arguments are unmodified, and if you want to pass a Windows path to a Windows utility then you should convert the path yourself, using cygpath, similar to this:

$ /cygdrive/c/Windows/notepad.exe "$(cygpath -w /cygdrive/c/foo.txt)"

In MSYS2, however, they want to make it easier to integrate with Windows programs, so they do (at least these) three hacks:

This behavior is baked into MSYS2's exec* C interfaces, and not only a bash hack. So if you compile a different shell it would still behave the same.

This can be tested with the following args.c when compiled as a windows binary args.exe:

#include <stdio.h>

int main(int argc, char **argv) {
    while (*argv)
        printf("[%s]\n", *argv++);
    return 0;
}

and

$ ./args.exe foo /c /foo //bar foo/bar
[D:\run\utils\args.exe]
[foo]
[C:/]
[T:/msys64/foo]
[/bar]
[foo/bar]

However, without a windows program (or with a native MSYS2 binary) the args are unmodified:

$ printf "[%s]\n" foo /c /foo //bar foo/bar
[foo]
[/c]
[/foo]
[//bar]
[foo/bar]
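As a rough mental model only, the conversions visible in the args.exe output can be mimicked with a few shell patterns. This is a toy that reproduces the five cases shown; it is not MSYS2's real algorithm, which handles far more cases. The name `mangle_arg` is hypothetical, and `msys_root` is just the install root that appears in the output.

```sh
#!/bin/sh
# Toy model of the argument rewriting observed above (NOT the real MSYS2 logic).
msys_root='T:/msys64'   # assumption: install root taken from the sample output

mangle_arg() {
    case $1 in
        //*)        # leading double slash: drop one slash, no path conversion
            printf '%s\n' "${1#/}" ;;
        /[A-Za-z])  # single letter after the slash: treated as a drive
            printf '%s:/\n' "$(printf %s "${1#/}" | tr 'a-z' 'A-Z')" ;;
        /*)         # other absolute paths: prefixed with the install root
            printf '%s%s\n' "$msys_root" "$1" ;;
        *)          # everything else is left alone
            printf '%s\n' "$1" ;;
    esac
}

for a in foo /c /foo //bar foo/bar; do
    printf '[%s]\n' "$(mangle_arg "$a")"
done
```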

All this is very unfortunate together with the (traditional) windows switches which use /whatever convention. That's also the main reason I switched from MSYS2 to cygwin - it's much more pure and without these hacks (even if not as elaborate when it comes to mingw packages).

I'm guessing they managed to upstream this behavior into autotools (or only their autogen?), based on the uname value, when it presumably indicates an MSYS2 MINGW environment (not sure about "native" MSYS2 env).

rmyorston commented 1 year ago

I'm guessing they managed to upstream this behavior into autotools (or only their autogen?), based on the uname value

Indeed. uname -s reports MINGW64_NT-10.0-19044 in a mingw64 environment and MINGW32_NT-10.0-19044 in mingw32. Cygwin says CYGWIN-10.0-19044. These values are recognised by autotools.

For comparison, busybox-w32 uname -s says Windows_NT. The string can be changed using the configuration option CONFIG_UNAME_OSNAME (one of my self-serving upstream contributions). w64devkit could, perhaps, use that capability to masquerade as mingw64/32 and avoid the need to set build_alias. (EDIT: sorry, that's nonsense. The Windows_NT string is hardcoded in win32/uname.c.)
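To show the kind of matching involved, here is a small sketch that sorts the uname -s strings quoted above into environments. The function name `classify_uname` and the labels it prints are made up for this example; it is not what Autoconf's own detection code does, only an illustration of why Windows_NT falls through.

```sh
#!/bin/sh
# Hypothetical classifier for the "uname -s" values mentioned in this thread.
classify_uname() {
    case $1 in
        MINGW64_NT-*) echo mingw64 ;;
        MINGW32_NT-*) echo mingw32 ;;
        CYGWIN*)      echo cygwin ;;
        Windows_NT)   echo busybox-w32 ;;
        *)            echo unknown ;;
    esac
}

classify_uname "$(uname -s)"
```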

@skeeto Excellent progress on those builds! This looks very promising.

avih commented 1 year ago

I do wonder though why they use cmd to begin with?

Is there some functionality which cmd.exe can do and standard posix (/busybox/cygwin/msys2) tools cannot?

Presumably, if they apply the cmd //c ... MSYS2 hack then they do know that the environment should have the standard tools (like rm instead of DEL, etc).

So why cmd at all?

Indeed. uname -s reports MINGW64_NT-10.0-19044 in a mingw64 environment and MINGW32_NT-10.0-19044 in mingw32. Cygwin says CYGWIN-10.0-19044. These values are recognised by autotools.

It's a shame that MSYS2 upstreamed the "MINGW" prefix and associated it with hacked parameter mangling.

It really should have been something like "MSYS2-MINGW...", because not all MINGW setups (current or future) would necessarily hack the arguments like MSYS2 does...

rmyorston commented 1 year ago

Presumably, if they apply the cmd //c ... MSYS2 hack then they do know that the environment should have the standard tools (like rm instead of DEL, etc). So why cmd at all?

I was puzzled by that too. The only cases I've seen where this is used are of the form:

( cmd //c echo "$VAR" ) 2>/dev/null | sed ...

A subshell to run cmd to echo a variable?

This message from the author of the patch which added the cmd says:

You'd end up calling func_msys_to_mingw, which relies on the msys "magic" path
conversion logic:

( cmd //c echo "$1" )

MSYS (but not cygwin) notices that cmd is a native win32 program, and
that there is a path-like argument "$1". MSYS (but not cygwin) will then
automatically convert $1 to DOS format, before spawning the cmd process
-- which echos the converted path to stdout, where we can grab it. MSYS
(but not cygwin) also turns '//c' into '/c' (the extra slash means
"don't use the MSYS mount table to convert this "path") -- this is how
you pass win32-style switches to native programs. IIRC, there's some
complicated logic to determine whether a given argument that begins with
two slashes is a "switch" like /EHsc or a unix-format SMB path like
//server/share/path/to/file.

It seems cmd is being used to trick MSYS into converting the path to DOS format. We don't need to do that, so if the wrapper detects the //c echo case could it just echo the path itself without invoking cmd at all?
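A minimal sketch of that detection, written as a shell function purely to show the logic. The name `cmd_shim` is hypothetical. A shell function is not visible to separately executed libtool scripts, so a real solution would have to live in a cmd wrapper on PATH or in the shell itself.

```sh
#!/bin/sh
# Hypothetical stand-in for "cmd": answer the exact "cmd //c echo ARG" form
# directly, and hand everything else to the real interpreter.
cmd_shim() {
    if [ "$#" -eq 3 ] && [ "$1" = //c ] && [ "$2" = echo ]; then
        # Paths are already in Windows form here, so just print the argument.
        printf '%s\n' "$3"
    else
        # Assumption: COMSPEC points at the real cmd.exe, as it does on Windows.
        "${COMSPEC:-cmd.exe}" "$@"
    fi
}

cmd_shim //c echo 'C:/src/expat'
```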

avih commented 1 year ago

It seems cmd is being used to trick MSYS into converting the path to DOS format. We don't need to do that, so if the wrapper detects the //c echo case could it just echo the path itself without invoking cmd at all?

Well, MSYS2 does have cygpath, and converting $PATH itself would work like this (in MINGW32 env):

$ cygpath -p -w -- "$PATH"
T:\msys64\usr\local32\bin;T:\msys64\mingw32\bin;T:\msys64\usr\local\bin;T:\msys64\usr\bin;T:\msys64\usr\bin;C:\Windows\System32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;T:\msys64\usr\bin\site_perl;T:\msys64\usr\bin\vendor_perl;T:\msys64\usr\bin\core_perl

cygpath is quite versatile, and I find it hard to believe that it doesn't cover their needs to convert MSYS2 paths to windows/dos paths (but this brings up the question again - why do they need dos paths at all?).

However, if they did that, which would be the correct way to handle path conversion instead of relying on some magic MSYS2 arguments hacks, then it still won't work in busybox-w32 (unless cygpath is implemented sufficiently for their use patterns - which should be possible I think).

So yeah, being able to detect the usage pattern and possibly replace it automatically with something which works in busybox-w32 should work.

Alternatively, maybe they do have some simple mode which doesn't rely on MSYS2 hacks, which could be enabled/enforced using some env vars, and which would happen to work in busybox-w32 (considering its paths are indeed non-posix...).

rmyorston commented 1 year ago

Alternatively, maybe they do have some simple mode which doesn't rely on MSYS2 hacks,

Cygwin doesn't depend on MSYS2 hacks.

Using build_alias=x86_64-pc-cygwin works to build Expat. It doesn't need to work around the cmd //c issue. But you end up with cygexpat-1.dll rather than libexpat-1.dll.

avih commented 1 year ago

Using build_alias=x86_64-pc-cygwin works to build Expat. It doesn't need to work around the cmd //c issue. But you end up with cygexpat-1.dll rather than libexpat-1.dll.

Hmm.. that's a bit hacky I think, and it's possible IMHO that some things could break either during configure or the actual build later, even if they didn't for expat.

However, it seems that, at least for expat, the env vars and the cmd //c fix are enough. So a script like this, saved somewhere in PATH as e.g. bbconf or w64dkconf or whatever, can streamline the conversion without actually touching the original configure file. It also supports arguments, e.g.

bbconf ./configure --host=whatever --prefix=whatever

The script:

#!/bin/sh

# run autotools "configure" script in w64devkit env, using busybox-w32 sh,
# utilizing the MSYS2 MINGW code paths, and work around the following:
# - export some env vars which bypass the detection by "configure".
# - "cmd //c ..." is replaced with "cmd /c ...". originally //c works around
#   MSYS2 auto args conversion to get "/c" - not needed with busybox-w32.
# The modification is written to a temporary file at the same dir, which is
# deleted [before startup and] after it's invoked.

echo() { printf %s\\n "$*"; }

error() {
    [ "${1+x}" ] && >&2 echo "${0##*/}: $*"
    >&2 echo "Usage: ${0##*/} path/to/autotools-configure [ARG...]"
    exit 1
}

[ "${1-}" ] || error
[ -e "$1" ] || error cannot find file -- "$1"

conf=$1
bbconf=$conf.bb.tmp
rm -f -- "$bbconf" || error "cannot remove file -- $bbconf"

shift

# The following might be needed too in some cases, but currently not applied
#   's/func_convert_file_msys_to_w32/func_convert_file_noop/'

sed -e 's/cmd \/\/c/cmd \/c/g' < "$conf" > "$bbconf" \
    || error "cannot create file -- $bbconf"
chmod +x "$bbconf"

export PATH_SEPARATOR=';'
export ac_executable_extensions='.exe'
export build_alias="$(uname -m)-pc-mingw64"

"$bbconf" "$@"

e=$?
rm -- "$bbconf"
exit "$e"

This makes it explicit that some conversion is applied.

However, if indeed the //c and exports are enough, then it would be more user friendly to export the stuff unconditionally automatically, and handle the //c thing using some script/binary wrapper.

skeeto commented 1 year ago

Unfortunately fixing the root configure script is often insufficient, which is what led to my cmd.exe wrapper idea. Namely that approach doesn't handle:

Binutils, for instance, has both. Its root configure script recursively runs 14 other configure scripts. These each generate a "libtool" script via "ltmain.sh" which has the actual hard-coded instance of "cmd //c". A fix-up script would need to be able to figure this out.
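To get a feel for how many files a fix-up script would have to touch, a recursive grep over the source tree is a quick first check. This is only a sketch: the function name `list_cmd_slash_c` is hypothetical, and it can only see files that exist at the time it runs, so "libtool" scripts generated later by configure are not covered.

```sh
#!/bin/sh
# Hypothetical helper: list every file under a directory (default: the current
# one) that contains the literal string "cmd //c".
list_cmd_slash_c() {
    grep -rl -- 'cmd //c' "${1:-.}" 2>/dev/null | sort
}

# Usage, from the top of an unpacked source tree:
#   list_cmd_slash_c .
```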

avih commented 1 year ago

which is what led to my cmd.exe wrapper idea

Right. I think it can also be implemented as a shell script, either named cmd or cmd.sh (which invokes cmd.exe, therefore bypassing itself at $PATH).

If the cmd //c thing is the main/only issue, then a wrapper executable (sh/binary) and the exported env should have it covered.

rmyorston commented 1 year ago

Here's a patch to busybox-w32 ash to intercept commands with exactly the form cmd //c echo arg and replace them with the internal echo. Its action is unconditional at the moment. We might want it only to take effect when running a configure script. Or we might not bother.

diff --git a/shell/ash.c b/shell/ash.c
index 742067216..886498640 100644
--- a/shell/ash.c
+++ b/shell/ash.c
@@ -8970,6 +8970,21 @@ static int builtinloc = -1;     /* index in path of %builtin, or -1 */
 static void
 tryexec(IF_FEATURE_SH_STANDALONE(int applet_no,) const char *cmd, char **argv, char **envp)
 {
+#if ENABLE_PLATFORM_MINGW32
+   int a;
+
+   if (strcmp(argv[0], "cmd") == 0 &&
+           argv[1] && strcmp(argv[1], "//c") == 0 &&
+           argv[2] && strcmp(argv[2], "echo") == 0 &&
+           argv[3] && !argv[4] && (a = find_applet_by_name("echo")) > 0) {
+       argv += 2;
+       cmd = "echo";
+#if ENABLE_FEATURE_SH_STANDALONE
+       applet_no = a;
+#endif
+   }
+#endif
+
 #if ENABLE_FEATURE_SH_STANDALONE
    if (applet_no >= 0) {
 # if ENABLE_PLATFORM_MINGW32

avih commented 1 year ago

Here's a patch to busybox-w32 ash ...

I didn't try that, but (together with the exported env vars) I did try a cmd shell script wrapper - which didn't work (./configure completes, make has some errors). I'm guessing it searches for cmd.exe.

So I tried the same shell script, this time saved as cmd.exe:

#!/bin/sh

# save me as "cmd.exe" and place it early in PATH

# autotools assumes MINGW setup uses MSYS[2] env, and uses "cmd //c echo..."
# to invoke "cmd /c echo..." (MSYS converts //c into /c for windows prog args).
# this breaks in busybox-w32 sh, so replace the //c with normal /c

# only handle exactly cmd //c echo...
# could be enhanced for more cases, but for autotools this seems enough
# and without breaking "cmd.exe" arguments in general
case ${COMSPEC-} in *\\cmd.exe)
    if [ "${1-}" = //c ] && [ "${2-}" = echo ]; then
        shift
        set -- /c "$@"
    fi
esac

exec "$COMSPEC" "$@"

And this does seem to work.

I'm guessing this could break with unicode paths, but then again, if unicode paths are used then I'm guessing it would break elsewhere too regardless of this cmd.exe wrapper, because currently all busybox-w32 tools and sh don't support unicode paths.

EDIT: busybox-w32 does convert common env vars to "mixed" path form (\ into /, e.g. in PATH, APPDATA, etc.), but not COMSPEC. This is important, because the real cmd.exe doesn't like being invoked with such mixed paths. We could also replace / with \ in COMSPEC before exec as a future-proofing measure.
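A minimal sketch of that future-proofing step. The helper name `to_backslashes` is hypothetical, and the idea of applying it to COMSPEC in the wrapper is an assumption drawn from the note above, not something that has been tested.

```sh
#!/bin/sh
# Hypothetical helper: turn every forward slash in a path into a backslash.
to_backslashes() {
    printf '%s\n' "$1" | tr '/' '\\'
}

# In the wrapper, the final line could then become something like:
#   exec "$(to_backslashes "$COMSPEC")" "$@"
to_backslashes 'C:/Windows/System32/cmd.exe'
```

A value that already uses backslashes passes through unchanged, so the conversion is harmless with today's COMSPEC.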

Peter0x44 commented 1 month ago

I tried building gcc 14.2 with w64devkit 2.0 (note: I built gmp, mpfr, and mpc on Linux and copied them over; I didn't feel like sorting out m4). It failed late in the build, in libstdc++; I'm not sure why. It's not all the way there, but it's pretty close. I never would have thought w64devkit would get anywhere; it's cool.