rmyorston / busybox-w32

WIN32 native port of BusyBox.
https://frippery.org/busybox
Other
670 stars 124 forks source link

`test -x $(bin_dir)/$$base` does not work when building `diffutils-3.10` #405

Closed smalltalkman closed 4 months ago

smalltalkman commented 4 months ago

I'm getting the following error when building diffutils-3.10:

Making all in man
   GEN diff.1
make: (makefile:1875): failed to build 'diff.1' exit 1
make: (makefile:1758): failed to build 'all-recursive' exit 1

The executed command is:

base=`expr diff.1 : '\(.*\).1'` \
   && test -x ../src/$base \
   && (echo '[NAME]' \
               && sed 's@/\* *@@; s/-/\\-/;s/^GNU //; q' ../src/$base.c) \
      | PATH="../src;$PATH" \
        ./help2man -i - -i ./$base.x \
          -S 'diffutils 3.10' $base > diff.1-t && mv diff.1-t diff.1

Due to the suffix of executable files on Windows, it returns false when executing test -x ../src/diff. Returns true if using cygwin or msys2. Can busybox-w32 provide behavior similar to test -x in msys2? test -x ../src/diff returns true when the diff.exe file exists.

rmyorston commented 4 months ago

I'm rather unwilling to follow things MSYS2 does. Not least because what it does seems to be confusing and inconsistent.

In contexts where a file is to be executed busybox-w32 does, of course, allow executables to be specified without an extension. It checks for the usual Windows extensions, .exe, .com, .bat and .cmd, as well as the common Unix convention of shell scripts having the .sh extension.

test -x FILE isn't such a context. The file isn't going to be executed. test -x on Unix is answering the question "does FILE exist and does it have execute permissions?" It doesn't need to be concerned about extensions.

When MSYS2 runs the same test it behaves as though the question was "if the system is asked to execute FILE, will it succeed?" The response tells us nothing about whether a file called FILE actually exists. It might not.

Indeed, MSYS2 goes further:

$ touch diff.exe
$ test -x diff; echo $?
0
$ test -f diff; echo $?
0
$ test -e diff; echo $?
0

Tests for the existence of diff succeed, even though there's no such file. And having conceded the existence of the file it goes further still and allows utilities to behave as though it really does exist:

$ ls -l diff
-rwxr-xr-x 1 rmy None 0 Apr 19 11:04 diff
$ md5sum diff
d41d8cd98f00b204e9800998ecf8427e *diff

But having gone this far for files with the .exe extension it draws back from supporting the other standard extensions:

$ test -x c:/Windows/System32/more; echo $?
1
$ touch diff1.bat diff2.cmd
$ test -x diff1; echo $?
1
$ test -x diff2; echo $?
1

Supporting the MSYS2 behaviour of test -x is certainly possible and it would make this particular issue go away. But I'm concerned what other parts of MSYS2's behaviour might be required to handle future issues.

smalltalkman commented 4 months ago

I'm rather unwilling to follow things MSYS2 does. Not least because what it does seems to be confusing and inconsistent.

The intention is not to implement everything MSYS2 does, but to have test -x ../src/diff return true when the diff.exe file exists. Such a solution can reduce the difference between sh scripts on windows and linux. After all, executable files on Windows will have an exe suffix, but not on Linux. This is very helpful when compiling diffutils-3.10 and similar software.

avih commented 4 months ago

When MSYS2 runs the same test it behaves as though the question was "if the system is asked to execute FILE, will it succeed?"

True, but one could argue that typically this is the question which the user actually want answered when using test -x, even if that's not what the posix semantics suggests it answers.

However, this is indeed a slippery slope which probably requires general filesystem access translation, and not specific to test at all.

And indeed, busybox-w32 already has such generic file-resolution extension procedure, when executing a command, then it may check various viable extensions, similar to what cmd.exe would do when running a command name without file extension.

I would guess that extending this prodecure to all file (read) access, even when not trying to execute it, can make many scripts work better, including the OP use case.

The big tradeoff of this approach is that an extension-less name would become indistinguishable from the resolved name+ext, and it would not be possible to check whether the extension-less name actually exists on disk.

Another big issue is when writing to such name. I don't think this procedure should be performed when writing, but maybe a subset of commands, like cp, install, and maybe few more, would know to preserve the extension (if it's not specified), so e.g. cp foo bar will resolve foo according to these rules, and if it has an extension, then it would add it to bar too.

I don't know how easy it would be to make it reasonably consistent, and concise expressiveness would definitely get hurt along the way.

Maybe a special "approximate file extension" mode would be useful, but I don't think it should be the default.

Also, someone has to actually write that code...

rmyorston commented 4 months ago

so e.g. cp foo bar will resolve foo according to these rules, and if it has an extension, then it would add it to bar too.

And so it seems to do. Following on from my previous example:

$ cp diff duff
$ ls -l d?ff
ls: cannot access 'd?ff': No such file or directory
$ ls -l d?ff.exe
-rwxr-xr-x 1 rmy None 0 Apr 19 11:04 diff.exe
-rwxr-xr-x 1 rmy None 0 Apr 19 14:41 duff.exe
$ ls -l diff duff
-rwxr-xr-x 1 rmy None 0 Apr 19 11:04 diff
-rwxr-xr-x 1 rmy None 0 Apr 19 14:41 duff

Oh my!

avih commented 4 months ago

Oh my!

Duh!

Obviously, this should apply to globs too!

avih commented 4 months ago

For what it's worth, I do maintain a not-huge patch which makes /x/... an "alias" to x:/... (and /x an "alias" to x:/, but I don't think I use that part much), which might be of a similar scope to what might be required for this extension approximation.

The patch is implemented as a single line added in few path access functions, to mechanically "normalize" the input path according to this rule (possibly into an automatic buffer).

An implementation of file extension approximation, at least as far as reading goes, might be similar, except that this line would also try to resolve an extension.

This is the first part which only handles /x/... - https://github.com/avih/busybox-w32/commit/5158b04df93de71b9fd7347b8c9bf212d5b7e284

It's not necessarily the ideal implementation, but for now that's what I have, and I have been using it for more than a year, and access paths using the /x/... form all the fime.

IIRC it never had a merge conflict, and I didn't modify it since I wrote it, so "maintenance" required zero effort so far.

Just FYI.

smalltalkman commented 4 months ago

I think the measures taken by msys2 (and probably cygwin of course) when dealing with test -x are advisable.

The result returned by test -x must contain two levels of meaning: the file must exist and the file must be executable.

In terms of "the file must be executable", windows and Linux should be handled in the same way. However, in terms of "the file must exist", there are obvious differences in the processing methods of Windows and Linux. This is mainly reflected in the fact that executable files under Linux have no extension, while under Windows there is an exe extension. msys2 hides this major difference in its handling of test -x.

However, test -x should not expand to other executable file extensions, which is consistent with Linux behavior:

$ ll | grep devbuild
-rwxrwxr-x  1 dihuge dihuge  1031 1月  29 16:13 devbuild.sh*

$ test -x devbuild && echo 'foo'

$ test -x devbuild || echo 'foo'
foo

In addition, I have not encountered extension problems with cp, install and other commands when transplanting Linux software.

Given the philosophy of busybox-w32, I'm closing this question. After all, the obligation to handle extensions is not the sole responsibility of the busybox-w32 project alone. Also, dealing with extensions in scripts is a tricky problem.

rmyorston commented 4 months ago

@smalltalkman Thank you for your understanding.

@avih Thanks for the link to your /x/... support. This isn't something I plan on adding to busybox-w32.

smalltalkman commented 4 months ago

Unfortunately, I ran into new problems with the exe extension when compiling and testing m4-1.4.19. There are 2 test-closein related files in the tests folder:

# ll | grep test-closein
total 348
-rwxrwxr-x    1 root     root        348357 Apr 21 18:22 test-closein.exe
-rwxrwxr-x    1 root     root           995 Apr 21 18:54 test-closein.sh

The content of the test-closein.sh file is:

# cat test-closein.sh
#!/bin/sh
: ${srcdir=.}
. "$srcdir/init.sh"; path_prepend_ .

echo Hello world > in.tmp
echo world > xout.tmp

fail=0
# Test with seekable stdin; follow-on process must see remaining data
(${CHECKER} test-closein; cat) < in.tmp > out1.tmp || fail=1
cmp out1.tmp in.tmp || fail=1

(${CHECKER} test-closein consume; cat) < in.tmp > out2.tmp || fail=1
cmp out2.tmp xout.tmp || fail=1

# Test for lack of error on pipe.  Ignore any EPIPE failures from cat.
cat in.tmp 2>/dev/null | ${CHECKER} test-closein || fail=1

cat in.tmp 2>/dev/null | ${CHECKER} test-closein consume || fail=1

# Test for lack of error when nothing is read
${CHECKER} test-closein </dev/null || fail=1

${CHECKER} test-closein <&- || fail=1

# Test for no error when EOF is read early
${CHECKER} test-closein consume </dev/null || fail=1

# Test for error when read fails because no file available
${CHECKER} test-closein consume close <&- 2>/dev/null && fail=1

Exit $fail

When I query test-closein with which, test-closein.sh is returned. This means that test-closein in the script executes test-closein.sh, causing an infinite loop, which does not occur in Linux.

# export PATH=$PATH:.

# which test-closein
./test-closein.sh

# which -a test-closein
./test-closein.sh

In order for the test to run properly, 8 exe file suffixes must be added. Similar situations also appear in other files such as test-getcwd.sh, test-perror.sh.

Although this kind of problem is only encountered when testing m4-1.4.19, it is too pessimistic. This means that a large number of exe file suffixes need to be added during testing to carry out testing. Of course, it also forces people to use other platforms to recompile and test.

avih commented 4 months ago

Thanks for the link to your /x/... support. This isn't something I plan on adding to busybox-w32.

I know. It was a reference and possibly a starting point with likely similar scope to what supporting approximated file extension might require, in case someone wants to experiment.

Note that I rebase that branch occasionally on top of this repo, though this view should always show the patches which I maintain.

ale5000-git commented 4 months ago

When I query test-closein with which, test-closein.sh is returned. This means that test-closein in the script executes test-closein.sh, causing an infinite loop, which does not occur in Linux.

@rmyorston Probably we should change the priority of exe/com to be over sh. Since on Linux binaries usually have no extension but on Windows they have the exe/com extensions, when a script refer to some files without extension maybe exe/com is more likely to be the correct result.

rmyorston commented 4 months ago

Probably we should change the priority of exe/com to be over sh.

Probably. Though anyone who depends on the current behaviour may not be happy.

Still, the latest prerelease (PRE-5330) makes this change. Let's see how it goes.

This prerelease also makes handling of files without an extension more consistent. busybox-w32 can run executables and shell scripts with no extension. If a command is specified with no extension and a corresponding file exists with no extension, it's run in preference to any similar file with an extension.

avih commented 4 months ago

I see that it uses a fixed list of extensions, and sh is in the middle now.

However, I think at least the windows part of the list should be taken from $PATHEXT. On my system its value is

.COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH;.MSC

And possibly also the busybox extentions (sh and shebang-based?) should come after the PATHEXT values?

Also, I didn't try, so it might already do that, but I think that running an extension-less name should match exact extension-less file before trying to add extensions.

smalltalkman commented 4 months ago

Still, the latest prerelease (PRE-5330) makes this change. Let's see how it goes.

Thanks, it saved me more than 40 modification operations.

avih commented 3 months ago

However, I think at least the windows part of the list should be taken from $PATHEXT

What about this?

I think PATHEXT was mentioned few times in the past, but I don't think a reason was given to not use it.

ale5000-git commented 3 months ago

@avih

It wouldn't be useful: .JS;.JSE are related to JavaScript .VBS;.VBE are related to VBScript

Also the other extensions are probably Windows specific things that BusyBox is unable to execute.

rmyorston commented 3 months ago

Previously mentioned in issue #126.