rmyorston / busybox-w32

WIN32 native port of BusyBox.
https://frippery.org/busybox
Other
670 stars 124 forks source link

win app invocation: missing windows error dialog #423

Closed avih closed 1 month ago

avih commented 2 months ago

When a windows binary requires a DLL which is missing, trying to run this binary via explorer or cmd.exe results in an error dialog with a message similar this:

---------------------------
foo.exe - System Error
---------------------------
The code execution cannot proceed because bar.dll was not found.
Reinstalling the program may fix this problem. 
---------------------------
OK   
---------------------------

However, when running it from busybox sh, no such dialog is displayed, and it exits quietly with code 53 (not sure if always 53).

Needless to say that it makes it hard to diagnose this issue without the dialog which details what went wrong.

I don't know if there are other error dialogs which are suppressed similarly, but I think yes, because IIRC I once had some installer which failed quietly from busybox sh, until I realized it does display an error message (I don't recall which) - but only if executed from explorer or cmd.exe .

Expected behavior in busybox sh: Display the error dialog. Alternatively, if possible, print the dialog text to stderr.

Example, save this as ret.c:

#ifdef DLL
int ret(int x) { return x; }
#else
int ret(int x);
int main() { return ret(0); }
#endif

Then compile (In w64devkit):

$ cc -shared -Wl,--out-implib=libret.dll.a -DDLL ret.c -o ret.dll
$ cc ret.c -o ret.exe -L. -lret.dll

And try to run it with and without the dll:

$ ./ret.exe; echo $?
0
$ rm ret.dll
$ ./ret.exe; echo $?  # no dialog is displayed
53

Worth noting that running cmd.exe from the busybox sh, and then trying to run ret.exe also does not show an error dialog.

I thought it might be related to some modified env vars, but I got the env from a cmd.exe prompt where the dialog shows (not from busybox sh), created a batch file from it which sets all these values (with the original case-sensitive names), but running this batch inside cmd.exe from busybox-sh and then running ret.exe still did not show a dialog, so I don't think it's related to env vars, but not 100% sure.

I have a vague recollection that this was mentioned in the past, possibly related to busybox sh used in a remote ssh session (so the dialog is not visible and possibly blocks indefinitely), but I can't find it now,

Even if this is intentional and does solve a problem, it definitely also introduces a different problem of missing diagnostic.

avih commented 2 months ago

I think it might be related to this (untested), which applies also to child processes: https://github.com/rmyorston/busybox-w32/blob/9ee8e87d5affe019aeb2b8afe4b908672c4c2fa5/libbb/appletlib.c#L1353-L1355

If it is this, maybe only set this mode temporarily while calling APIs which are known to produce such dialogs on error?

And some processes might need to set it manually after startup, I'm guessing like httpd.

rmyorston commented 2 months ago

The documentation for GetVolumeInformation() clearly says:

When a user attempts to get information about a floppy drive that does not have a floppy disk, or a CD-ROM drive that does not have a compact disc, the system displays a message box for the user to insert a floppy disk or a compact disc, respectively. To prevent the system from displaying this message box, call the SetErrorMode function with SEM_FAILCRITICALERRORS.

And yet, if I comment out the call to SetErrorMode() and run df from the resulting binary with an empty floppy disk drive in my VM, no message box appears. Maybe the message box only appears in GUI applications?

If that's the case the call to SetErrorMode() is unnecessary.

avih commented 2 months ago

Maybe the message box only appears in GUI applications?

It's possible, but we'd need a test program for that.

FWIW, on Windows 7, clicking an empty CD/dvd drive in explorer does pop a dialog (and if it's not virtual it also opens the tray door...), but running from cmd.exe dir x: where X is such drive produces no dialog, and prints "The device not ready.".

ale5000-git commented 2 months ago

These on Windows 10 doesn't produce error but open the cd-rom tray: busybox ash -c "explorer d:\\" explorer d:\ Need test on Windows 7.

Note: D:\ is the cd-rom drive

rmyorston commented 2 months ago

Nope, the message box doesn't appear in a GUI application either. The remark in the GetVolumeInformation() documentation seems to be completely untrue.

I've tried busybox-w32 df without the SetErrorMode(SEM_FAILCRITICALERRORS) call on Windows XP, 8.1 and 10. In no case did an empty removable drive cause the message box to appear; there didn't seem to be any adverse effects; and there was a dialog about the problem with ret.exe.

So, despite the documentation, it seems removing the call to SetErrorMode() is the right thing to do.

avih commented 2 months ago

I've tried busybox-w32 df without the SetErrorMode(SEM_FAILCRITICALERRORS) call on Windows XP, 8.1 and 10. In no case did an empty removable drive cause the message box to appear

I presume it's guaranteed that this command results in a call to GetVolumeInformation of the CD/DVD/FLOPPY volume?

So, despite the documentation, it seems removing the call to SetErrorMode() is the right thing to do.

Empirically true, but I think it's still worth using it when calling GetVolumeInformation (and restoring it later) until we know for a fact that the docs are wrong.

Or indeed remove it completely and count on bug reports to tell you otherwise.

rmyorston commented 2 months ago

I presume it's guaranteed that this command results in a call to GetVolumeInformation of the CD/DVD/FLOPPY volume?

It is. While investigating this I had print statements around the calls to GetVolumeInformation() and it's definitely invoked for empty removable drives.

Empiricism is enough for present purposes, so I've removed the call to SetErrorMode().

If it causes other problems people will complain.

Prereleases are available for those who want them (PRE-5374 or above).

avih commented 2 months ago

What about suppressing it in applets like httpd? I'd imagine it'd be quite undesirable that an error while running something waits for GUI user confirmation...

FWIW, libuv (the portable system library which node.js is/was using) does (did?) disable error dialogs by default.

In general, there's certainly a class of applications where we don't want any confirmation dialogs, probably any kind of service or daemon-like or background-maintenance functionality etc, but I wouldn't know how to automate that.

Or maybe the way to look at it is to only want the dialogs in an interactive session, but then what if the user runs a daemon-like app in the foreground (like httpd)?

The ideal behavior is possibly to print the dialog text to stderr, which should cover both interactive and background use cases, but I don't know if that's possible and/or hard.

rmyorston commented 2 months ago

The ideal behavior is possibly to print the dialog text to stderr, which should cover both interactive and background use cases, but I don't know if that's possible and/or hard.

Possible and not too hard. I've got a somewhat experimental implementation. With prereleases (PRE-5379 or above).

The main drawback is that we don't have all the context available to "the system" when it creates the dialog box. So the message for the original issue is:

~ $ ./ret; echo $?
sh: ./ret.exe: The specified module could not be found. Error 0xc0000135
53

It doesn't report which DLL couldn't be found.

The exit code is obtained by masking off the low order byte of that error code. Hex 35 is decimal 53.

avih commented 2 months ago

Nice. That's definitely much better than no message at all.

It doesn't report which DLL couldn't be found.

But between the dialog and this, I'd presonally prefer the dialog because otherwise the error text is incomplete (and personally my use case is 99% interactive where I'm there to dismiss it).

The textual error is a probably an acceptable alternative to the dialog if it's complete.

Maybe some BB_FOO env to decide between the two, possibly default to text message? (where this env would apply anywhere in bb, e.g. including xargs, or awk or anything else which might spawn a new application).

EDIT: or maybe the textual message should stay regardless, because it doesn't have any negatives that I can think of, but the env would decide whether or not to disable the error dialogs?

rmyorston commented 2 months ago

If BB_CRITICAL_ERROR_DIALOGS is set to 1 error dialogs are enabled. If unset or set to any other value they aren't. Textual messages are issued in either case.

Like a handful of other BB_ variables, BB_CRITICAL_ERROR_DIALOGS takes effect immediately it's changed.

PRE-5380 or above.

avih commented 2 months ago

Nice. Thanks.

We really need the BB_thing vars documented someplace other than the release notes. Even the readme at this repo is not always at one's fingertips (and it doesn't have the dialog docs anyway).

Maybe in busybox --help and/or or ash --help, as appropriate?

ale5000-git commented 2 months ago

I suggest to put this info also directly on the homepage: https://frippery.org/busybox/

avih commented 2 months ago

The homepage or the readme are not readily available unless one downloads it via the homepage.

It's definitely not obvious if installed via some package manager (scoop, chocolaty, etc) or w64devkit, etc.

Online (embedded) docs is guaranteed to be accessible if one looks for it, like when using --help.

ale5000-git commented 2 months ago

I wasn't saying only in the home but "also" in the home, so new users that found it over the internet and users that doesn't use package managers can see the info directly.

skeeto commented 1 month ago

During my FRP-5398 shakedown cruise, today I got one of these new 0xc0000135 error messages and I was very pleased! Even without the extra context, it's better than the usual silence, more so than I realized. Definitely a positive change.

Since I had a test case on hand, I was able to try out BB_CRITICAL_ERROR_DIALOGS, and I think I like it even more enabled, at least in this case. However, I also noticed that it "leaks" into the shell when set for just one command. For example:

$ BB_CRITICAL_ERROR_DIALOGS=1 ./missingdll.exe
$ ./missingdll.exe

If BB_CRITICAL_ERROR_DIALOGS is unset in my shell, then after the first command it will behave as though I set BB_CRITICAL_ERROR_DIALOGS=1 in the shell's environment, even though it's still unset, and I get the dialog again for the second command. The inverse is true with =0. Since the behavior is really a property of the shell, I expected the above to have no effect, which is why I was testing it. Whether or not this particular case should change the error behavior is kind of subjective, but the leak seems objectively incorrect.

rmyorston commented 1 month ago

The code to clean up the environment when the local variable went out of scope was incomplete. Try this:

diff --git a/shell/ash.c b/shell/ash.c
index 9ef8f7742..119c96555 100644
--- a/shell/ash.c
+++ b/shell/ash.c
@@ -10948,6 +10948,9 @@ poplocalvars(int keep)
    struct localvar_list *ll;
    struct localvar *lvp, *next;
    struct var *vp;
+#if ENABLE_PLATFORM_MINGW32
+   int var_type;
+#endif

    INT_OFF;
    ll = localvar_stack;
@@ -10991,8 +10994,17 @@ poplocalvars(int keep)
            vp->flags = lvp->flags;
            vp->var_text = lvp->text;
 #if ENABLE_PLATFORM_MINGW32
-           if (is_bb_var(lvp->text) == BB_VAR_ASSIGN)
+           var_type = is_bb_var(lvp->text);
+           if (var_type == BB_VAR_ASSIGN && (lvp->flags & VEXPORT))
                putenv(lvp->text);
+           else if (var_type) {
+               char *var = xstrdup(lvp->text);
+               char *eq = strchr(var, '=');
+               if (eq)
+                   *eq = '\0';
+               unsetenv(var);
+               free(var);
+           }
 #endif
        }
        free(lvp);
skeeto commented 1 month ago

Your patch fixes the "leak" in my tests!

rmyorston commented 1 month ago

@skeeto Thanks for checking that. What I've committed is slightly different to the patch above, but should be equivalent.

rmyorston commented 1 month ago

This issue should be resolved in the latest release, FRP-5398.

avih commented 1 month ago

Thanks for the release.

A bit off topic, but related to the FRP-5398 release:

avih commented 1 month ago

This issue should be resolved in the latest release, FRP-5398.

Wait. Are you sure?

FRP 5398 is indeed commit 89ae34445 as noted at the release file name, but that's two weeks old at this stage, and at the very least it doesn't include the additional fix of 480ebf4 .

rmyorston commented 1 month ago

I've updated the paragraph about the Unicode binary on the web page.

The raw numbers for 32/64/64u/a downloads of the current release so far this month are 993/1254/153/37, though these may be rather meaningless.

People on the internet do strange things. There's still a surprising number of downloads of FRP-4621 (70/50 for 32/64 bits). Every day there are about 30 attempts to download the long-since deleted FRP-3329. These are due to a broken GitHub Action.

If you think the issue with BB_ environment variables is sufficiently important you're welcome to reopen the issue.

avih commented 1 month ago

I've updated the paragraph about the Unicode binary on the web page.

Thanks. LGTM.

Related, from the download descriptions:

The 32-bit binary above will work on 64-bit systems but if you have a 64-bit version of Windows there's some advantage in using the 64-bit executable

Might be worth mentioning which advantages there are. I can't think of any. The only two things I can think of is maybe slight performance advantage (but I never tried to measure it), and maybe limiting access upto 2G/4G files, but I'd still think the 32 bit variant shouldn't have an issue here.

The raw numbers for 32/64/64u/a downloads of the current release so far this month are 993/1254/153/37, though these may be rather meaningless.

Thanks. Interesting.

I can't judge how meaningless these numbers are, but at least scoop downloads from your site, though it seems it currently only offers the 32/64/a variants, but not the 64u variant: https://github.com/ScoopInstaller/Main/blob/master/bucket/busybox.json

So despite the 64u variant not being offered via scoop, it's still 10%+ of the 64 variant downloads.

So 64u is not being ignored, and there haven't been unicode-related issues opened that I recall. Good. Thanks again for the numbers. Please do post if they change meaningfully and you noticed it.

(I did use scoop in the past, and liked it a lot, and possibly my first exposure to busybox-w32 was via scoop about 10 years ago, but not 100% sure)

FWIW, chocolatey also offers busybox-w32, but I don't think it's from your site, because the download numbers of 5398 are bigger than what you posted (but I can't find the source recipe with the download URL). Anyway, As far as I can tell it only offers the 32/64 packages - https://community.chocolatey.org/packages/busybox#files (the "files" section).

EDIT: actually it does download from your site, but indeed only the 32/64 versions (not 64u, not a): https://github.com/chtof/chocolatey-packages/blob/master/automatic/busybox/update.ps1

People on the internet do strange things

Yes, but I'm guessing these ones are due to some sites offering a direct link to a specific version and the authors stopped updating these sites/links. Or whatever. Who knows.

If you think the issue with BB_ environment variables is sufficiently important you're welcome to reopen the issue.

I think it's important, but as far as I'm concerned it's fixed (on master).

It's you who tries to maintain the invariant that closed issues are either WONTFIX or fixed and included in a release, so it's up to you to decide if you want it opened or closed.

rmyorston commented 1 month ago

The performance advantage of using the right sort of binary is more than slight. Running the testsuite with a 32-bit binary on 64-bit Windows 8.1 was 13% slower than a 64-bit binary. The disparity is even greater on Windows 10, the 32-bit binary was 60% slower.

There's also the confusion around System32 vs. SysWOW64 and Program Files vs. Program Files (x86), which can be avoided by using correct binary.

I didn't know Scoop now includes the ARM binary. That's nice.

My numbers and those from Chocolatey aren't comparable: mine are for this month, theirs since release on 25th June.

avih commented 1 month ago

The performance advantage of using the right sort of binary is more than slight. Running the testsuite with a 32-bit binary on 64-bit Windows 8.1 was 13% slower than a 64-bit binary. The disparity is even greater on Windows 10, the 32-bit binary was 60% slower.

Interesting. I did not expect that, and I don't think I noticed it (I do test the 32 bit build too, but not frequently).

In such case, maybe just mention that the 64 bit build can be faster in some cases, to make it less vague?

My numbers and those from Chocolatey aren't comparable: mine are for this month, theirs since release on 25th June.

Yes, but it's not a huge diff of the starting point. Judging by these numbers, it appears that most, or at least many of your downloads come from chocolatey...

avih commented 1 month ago

By the way, is there a story with unicode and the arm build? can it be enabled? do we know if the manifest work in any/some/all windows-on-arm cases, or whether it's required at all? (i.e. maybe the default codepage is utf8 on arm?)

rmyorston commented 1 month ago

Unicode can be enabled in the ARM64 build and it appears to work as expected on my Windows 11 device (in my very limited testing). The manifest is required: the default code page is not UTF-8.

According to the internet:

Windows on ARM64 was released quite early in the wild, with Windows 10 version 1709 (late 2017) on the Snapdragon 835 processor.

So it's possible there are systems out there that predate support for the UTF-8 manifest.

But probably not that many. So perhaps it would make sense for future releases of the ARM64 build to have Unicode support. If anyone is stuck on an old version of Windows 10 they can just as easily be stuck on an old version of busybox-w32.

ale5000-git commented 1 month ago

@rmyorston

Just a minor note, when a user that doesn't know see:

busybox64u.exe
busybox64a.exe

it may actually think Unicode / Ansi; so why not change a to arm to be more explicit?

avih commented 1 month ago

If anyone is stuck on an old version of Windows 10 they can just as easily be stuck on an old version of busybox-w32.

That, or they could build it themselves, which should be reasonably straight forward now, even on windows, probably the same as unicode with 32-bit busybox-w32, which is technically possibly but I don't imagine anyone actually requires it (are there x86 32-bit windows 10 systems?!)

Assuming the reason for considering switching to unicode build rather than adding one is that you prefer to publish and maintain as few variants as possible (which I'd agree with), then another option is to allow the unicode build to run anywhere as long as the manifest doesn't prevent execution (like XP), but unicode would only get enabled where the manifest has an effect - win10 1903 and later.

I don't think the current code supports this behavior - not because the hard exit on e.g. win7/8, but rather because some code assumes that if the manifest was enabled at build time then we have UTF-8 enabled (some do check GetACP(), but iirc some are hard coded to depend on the preprocessor).

However, I'm almost certain it would be trivial to make it work.

So, if we say that the 32 bit build remains ANSI (both for XP and as a non-unicode pre-built binary), and the 64 and arm builds become unicode, then the following users get the shorter end of the stick:

I'd think that's not too bad actually.

Another option would be to publish 32/64/arm unicode binaries, plus one 32 ANSI binary, so that by default everyone gets unicode if their system supports it, except that XP users and those who need non-unicode get the 32 ANSI build.

This (the unicode manifest but allow running also without unicode) would have another advantage, and that's letting the users themselves convert it back into "plain" non-unicode build - simply by deleting the manifest from the binary, for instance using perc or other PE editors.

So a XP64 user could get the 64 bit (unicode) build, strip the manifest and voila!

And if the manifest is still required (so that uname can report win10), then you can publish both the unicode and non-unicode manifest at the website, and just let users choose which one they prefer to set at their binary instead of the default.

This might be a bit far fetched, but some options above might be worth considering IMO, manly the unicode build which can still run on win7/8 (while ensuring the unicode mode is communicated clearly, e.g. "Unicode (auto): Disabled").

ale5000-git commented 1 month ago

are there x86 32-bit windows 10 systems?!

If you mean a PC that came with Win 10 32-bit "officially" preinstalled from the factory, probably not much.

But manually installed there are a lot of Windows 10 32-bit PCs because it works also on very old PCs, with less RAM use than 64-bit. A very old PC with a Samsung SSD and Windows 10 32-bit may still works very well, even better than Windows 8. And even though officially you can no longer upgrade for free to Win 10, in reality it is still possible.

avih commented 1 month ago

From the internet some years ago:

Beginning with Windows 10, version 2004, all new Windows 10 systems will be required to use 64-bit builds and Microsoft will no longer release 32-bit builds for OEM distribution. This does not impact 32-bit customer systems that are manufactured with earlier versions of Windows 10; Microsoft remains committed to providing feature and security updates on these devices, including continued 32-bit media availability in non-OEM channels to support various upgrade installation scenarios.

So I guess these users exist, but the vast vast vast majority of new win10/11/whatever users for some years now have 64 bit systems.

Anyway, there are 4 builds currently, and we can keep it 4 by doing 32/64/arm auto-unicode binaries plus one 32 ANSI binary which can also run on xp/32/64/arm systems, or one could build a binary to their exact specification if they prefer.

ale5000-git commented 1 month ago

Beginning with Windows 10, version 2004, all new Windows 10 systems will be required to use 64-bit builds and Microsoft will no longer release 32-bit builds for OEM distribution. This does not impact 32-bit customer systems that are manufactured with earlier versions of Windows 10; Microsoft remains committed to providing feature and security updates on these devices, including continued 32-bit media availability in non-OEM channels to support various upgrade installation scenarios.

OEM only refer to Windows shipped with the PC; but if the PC was shipped with Windows 7 then the user can download the ISO of Win 10 32-bit, burn on the DVD or write to usb, and then install it without too much effort. I also have successfully installed Windows 10 on a lot of PCs that was shipped with Windows XP (in this case I bought the license for 10 € on some sites).

Beside that aren't build automated? Is it a disk space problem?

avih commented 1 month ago

Is it a disk space problem?

I'd think that at most only to a small degree.

Other non-major reason could be build time.

But I think the main reason would be that that it confuses users. There are already 4 builds they can choose from, and if it becomes 6 (32/64/arm x ansi/unicode) then it's even more confusing.

Additionally, if we indeed agree that the unicode behavior is preferred where it's supported, then it's much easier to everyone if there's a single binary per architecture - with the same filename which previously was the "base" variant (non-unicode), which auto-enables unicode if the system supports it.

And for those who really don't want unicode, or can't run the auto-unicode build because they're on xp, there would be the "xp32" binary which would cover all of them (and if an xp64 user realy wants 64 bit binary, they can strip the manifest from an auto-unicode-64 binary, or just build it themselves).

I think it's much better than 6 binaries.

ale5000-git commented 1 month ago

My point was to have at least 32-bit ansi & 32-bit unicode, so with just 2 exe you can achieve support for all systems. 64-bit is just for performance, no other benefit. Certainly on my system I would put the 64-bit version but the point is portability, to put them on an USB drive I prefer 32-bit ansi & unicode.

ale5000-git commented 1 month ago

If the unicode version works on Windows 7 (even if limited), then these are enough I think (it is just my opinion):

32-bit ANSI
32-bit AUTO-UNICODE
64-bit AUTO-UNICODE
64-bit ARM UNICODE
avih commented 1 month ago

My point was to have at least 32-bit ansi & unicode

That's covered by my suggestion: 32/64/arm auto-unicode + "xp" 32 ANSI (which is the current 32 bit build).

If the unicode version works on Windows 7 (even if limited)

Currently it doesn't. Hence my suggestion for auto-unicode which will also run on win7/8 (albeit in ANSI mode).

ale5000-git commented 1 month ago

I didn't understand correctly before, this is mainly what I wanted.

PS: Regardless of what we choose I think there should be a way to reliably and programmatically detect whether we are working in ANSI or UNICODE (possible without grep it manually somewhere).

avih commented 1 month ago

I think there should be a way to reliably and programmatically detect whether we are working in ANSI or UNICODE

unicode build which can still run on win7/8 (while ensuring the unicode mode is communicated clearly, e.g. "Unicode (auto): Disabled"

ale5000-git commented 1 month ago

But the script shouldn't have the complication to detect wheter you are under the Win XP build or the other builds, just ANSI/UNICODE. The info should be in all builds (even if hardcoded).

PS: Maybe a custom applet would be less error prone compared to parse complicate text using grep.

avih commented 1 month ago

just ANSI/UNICODE.

The exact form of the output needs some thought, and it would certainly be useful to be simple and parsable, hence my suggestion was an example and not the final form.

The problem with "just ANSI/UNICODE" is that you can't tell whether it's hardcoded ansi/unicode or it automatically chose ansi/unicode depending on your system, and I think it's useful info to have.

Additionally, I don't think it should be "ANSI" on its own, because that's not clear that it indicates "not unicode".

Again, it needs some thought once we get there, but something like "Unicode: enabled (auto)" or "Unicode: disabled (auto)", depending on the system capabilities would be good enough and can be parsed easily by scripts.

But let's drop this subject for now. We can continue once we get to that bridge.

ale5000-git commented 1 month ago

Just a last idea about this: Auto will just complicate things. This is my suggestion:

Unicode build: yes/no
Unicode enabled: yes/no

Also everything on a separated line to be more easily parsed.

PS: For "enabled" we could also use a readonly set -o unicode

avih commented 1 month ago

readonly set -o unicode

That could be useful, but it can also create issues, such as

$ x=$(set +o)
$ eval "$x"
set: cannot set readonly option -- unicode

(not necessarily, it's possible that it already knows to allow "setting" a read-only option to the same value without errors)

It also depends on whether it warrants an option. I'm not sure, but it's not for me to decide.