rmyorston / busybox-w32

WIN32 native port of BusyBox.
https://frippery.org/busybox
Other
685 stars 126 forks source link

PATH may include CWD - which is bad #422

Closed avih closed 2 months ago

avih commented 3 months ago

At least on my system, PATH ends in semicolon, which rightly translates in busybox sh to "include CWD when searching $PATH".

While cmd.exe does search CWD too - even if PATH doesn't end in semicolon (and also doesn't begin with semicolon or includes two consecutive semicolons), busybox sh is not supposed to search CWD by default because POSIX and security of cd into a dir of extracted malicious zip, and then running a command which the user expects to be global, like "notepad" - but may actually be part of the malicious content at CWD.

The final semicolon in PATH is not due to user error as far as I can tell, and also it can't be fixed as far as I can tell (windows env-wise at "about" -> "advanced settings"), because apparently the "User variables for $user" has

PATH=%USERPROFILE%\AppData\Local\Microsoft\WindowsApps;

But when trying to edit it - it doesn't show the final semicolon as part of the value, so when it appends the user PATH to the system PATH, it ends in semicolon.

In my own ~/.profile I do remove empty elements from PATH, so I don't have this issue, but many/all users probably do have this issue.

I can think of one or two things to address it:

  1. Remove empty elements from PATH on init.
  2. Disallow CWD search unless some shell option is set (which is off by default).

Thoughts?

cochon-git commented 3 months ago

On Windows 10 for me (and presumably other versions) this seems to be a side effect of the way Windows joins the user and system PATH variables from HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Environment and HKEY_CURRENT_USER\Environment, the system one is stored with a trailing semicolon the user variant without (both via the GUI editor), so you see this issue only if you don't have a user level PATH, which is the norm.

Maybe right trimming the PATH on initial conversion would make sense as there seems no logical way to explicitly set the POSIX interpretation within Windows, and those who need it can add it within .profile as they would have to anyway on a *nix system.

rmyorston commented 3 months ago

That Path user variable is new in Windows 10. It isn't present in my Windows 8.1 VMs so there's no empty element in my PATH and therefore no problem. Why on earth did Microsoft add the trailing slash, though?

The dialog to edit variables is weird. You need to press the 'Edit text...' button to get the actual text to edit, not the other random 'Edit' buttons.

My inclination would be to do the minimum necessary to solve the immediate problem: remove a trailing separator if it occurs after that specific path.

Removing all empty elements might zap one the user (or some random installed software) actually wanted.

Adding a shell option seems far too intrusive a change.

ale5000-git commented 3 months ago

I think that just removing the trailing ; inside the mixed_case_special_name function is enough.

To include cwd one can just add: ;.

avih commented 3 months ago

so you see this issue only if you don't have a user level PATH, which is the norm.

Not sure if "the norm" refers to "see the issue" or "don't see the issue", but for me, in the "Environment Variables" dialog, the PATH in "System Variables" does NOT end in semicolon, while the PATH in "User variables for ..." does end in semicolon (and the PATH in a new cmd.exe window does end in semicolon).

Also, the user PATH there (%USERPROFILE%\AppData\Local\Microsoft\WindowsApps;) is not something I added manually, so I presume it's a windows thing which happens either by default or after some $whatever user action which adds it.

However, I really can't estimate how many systems/users are affected by this. I'd think most/all, judging from how it looks on my system, but it's still a guess.

That Path user variable is new in Windows 10. It isn't present in my Windows 8.1

I believe you, but I find it very weird. I don't have a windows 8.1 to test, but it does exist in XP (SP3), and even in win95 (press ALT-ENTER after running "command" from the start menu).

EDIT: Oh, I guess you mean the global Path does exist, but only the user one doesn't? right, that sounds plausible.

My inclination would be to do the minimum necessary to solve the immediate problem: remove a trailing separator if it occurs after that specific path.

That would work for my test case, but I don't know where elsewhere it might happen, e.g. while the system joins the system path and the user path.

Removing all empty elements might zap one the user (or some random installed software) actually wanted.

If some random app adds CWD to the path then I think it's fine to remove it, as it's a potential security issue IMO.

I also doubt the user would do that intentionally, because cmd.exe already searches CWD even eithout it. Same, as far as I can tell, when an application tries to launch another app - it does also search CWD.

So there's no good reason to add CWD to PATH unless it's explicitly intended for apps which use PATH strictly - like busybox sh.

In such case (which is arguably viable for some), your suggested approach may already override it - if the user added a semicolon at the end to achieve that.

So for consistensy, IMO all the empty elements should be removed, and if user wants to restore that behavior, they'd need to prefix or suffix semicolon to PATH in their ~/.profile.

EDIT: if any of these measures taken, then it's probably worthwhile to add some note at the README or elsewhere, because this would be different than how cmd.exe searches an executable.

avih commented 3 months ago

I also doubt the user would do that intentionally, because cmd.exe already searches CWD even eithout it. Same, as far as I can tell, when an application tries to launch another app - it does also search CWD.

On the other hand, it seems that powershell does use PATH strictly, but does NOT search CWD if PATH ends in semicolon. If one wants "foo.exe" to get executed in powershell, seemingly it has to be at an explicit element in PATH.

(i.e. there's also no reason to add semicolon to path for powershell, because it's ignored anyway)

Another though about the implementation, maybe completely ignore empty elements in path, and to add CWD to the search, one would need to add ;. to path - i.e. an explicit dot dir? (assuming the PATH search is implemented in busybox-w32, and not left to the system)

This is what powershell seems to do as well: empty elements in PATH are ignored (tested only trailing), dot-dir in PATH is respected and adds CWD to the search.

EDIT: Possibly supporting this suggestion of respectiung CWD in PATH only if it's an explicit dot dir also exists in POSIX:

A zero-length prefix is a legacy feature that indicates the current working directory. It appears as two adjacent <colon> characters ( "::" ), as an initial <colon> preceding the rest of the list, or as a trailing <colon> following the rest of the list.

And then:

A strictly conforming application shall use an actual pathname (such as .) to represent the current working directory in PATH.

ale5000-git commented 3 months ago

The trailing ; exist by default in Windows but there shouldn't be empty elements by default unless the user change something.

cochon-git commented 3 months ago

Not sure if "the norm" refers to "see the issue" or "don't see the issue", but for me, in the "Environment Variables" dialog, the PATH in "System Variables" does NOT end in semicolon, while the PATH in "User variables for ..." does end in semicolon (and the PATH in a new cmd.exe window does end in semicolon).

I rather regret using the term 'norm', as there doesn't appear to be one :smiley:

Tinkering on another system I get semicolons trailing on both user AND system PATHs and thus a trailing semicolon in the combined PATH. I suppose inconsistency is not surprising given there's the GUI, as well as setx and reg directly, any permutation of which could be used by an installer script.

If some random app adds CWD to the path then I think it's fine to remove it, as it's a potential security issue IMO.

Contriving to do this by adding extra semicolons directly in regedit, gets undone by the GUI editor anyway, so I'd have to agree, I don't think it's something that might be deliberately set that needs to be retained by busybox.

avih commented 3 months ago

Another though about the implementation, maybe completely ignore empty elements in path, and to add CWD to the search, one would need to add ;. to path - i.e. an explicit dot dir? (assuming the PATH search is implemented in busybox-w32, and not left to the system)

Right, I think it is implemented in busybox, at find_executable at libbb/executable.c.

The code currently does the legacy behavior of applying an empty element as if it was a dot-dir.

I think busybox-w32 can skip this legacy behavior on Windows in a fairly tiny patch (because there isn't a history of this behavior on windows IMO), and end up with the same behavior as powershell where only an explicit dot-dir is interpreted as CWD, while empty elements in PATH are ignored.

I think it's simpler, more effective, and more consistent than editing PATH on init (removing all/last empty elements).

Sample patch (untested):

diff --git a/libbb/executable.c b/libbb/executable.c
index 606bec986..652cfda15 100644
--- a/libbb/executable.c
+++ b/libbb/executable.c
@@ -51,13 +51,14 @@ char* FAST_FUNC find_executable(const char *filename, char **PATHp)

        n = strchr(p, PATH_SEP);
        if (n) *n = '\0';
-       p = concat_path_file(
-           p[0] ? p : ".", /* handle "::" case */
-           filename
-       );
 #if ENABLE_PLATFORM_MINGW32
-       {
-           char *w = file_is_win32_exe(p);
+       if (!p[0]) {  /* ignore empty elements (for CWD use ".") */
+           ex = 0;
+           p = 0;
+       } else {
+           char *w;
+           p = concat_path_file(p, filename);
+           w = file_is_win32_exe(p);
            ex = w != NULL;
            if (ex) {
                free(p);
@@ -65,6 +66,10 @@ char* FAST_FUNC find_executable(const char *filename, char **PATHp)
            }
        }
 #else
+       p = concat_path_file(
+           p[0] ? p : ".", /* handle "::" case */
+           filename
+       );
        ex = file_is_executable(p);
 #endif
        if (n) *n++ = PATH_SEP;
avih commented 3 months ago

Sample patch (untested):

Hmmm... this works (ignores empty components in $PATH) in which foo.exe, but not when actually trying to run foo.exe.

So at the very least it's incomplete.

rmyorston commented 3 months ago

Changing find_executable() is insufficient. The shell has its own find_command(). Note the comment above that: if you change find_command() you might also need to change shellexec().

I'm not keen on this approach. It's intrusive and it alters the long-standing UNIX convention that empty elements of PATH refer to the current directory.

Moreover, it seems Windows actually shares this convention: it's just somewhat hidden by APIs like CreateProcess() and SearchPath() going out of their way to check for files in the current directory independently of what might be in PATH.

Looking at how SearchPath() is implemented in Wine, it calls RtlDosSearchPath_U(). This function interprets a leading semicolon and intermediate pairs of semicolons in the path in just the same way as Unix. However, it ignores a trailing semicolon.

This is the sort of behaviour Microsoft seem to be relying on when they randomly append a trailing semicolon to PATH.

On the basis of this my inclination is now to remove a single trailing semicolon when PATH is imported into the shell. (Not double trailing semicolons, though.)

avih commented 3 months ago

it alters the long-standing UNIX convention that empty elements of PATH refer to the current directory.

On its own IMO this doesn't carry a lot of weight, because Windows doesn't have such tradition of intentionally using empty components to represent CWD. The fact that POSIX discourages it in favor of explicit dot-dir further reduces its weight on Windows IMO.

The intussiveness argument should carry weight, however, and I don't know how intrussive it would become if implemented fully.

Moreover, it seems Windows actually shares this convention: it's just somewhat hidden by APIs like CreateProcess() and SearchPath() going out of their way to check for files in the current directory independently of what might be in PATH

IMO the fact that these APIs ignore empty components actually supports the notion that busybox-w32 should ignore them too, because if present, then IMO they're much more likely to be accidental than intentional, and they're not intended to add CWD search.

The fact that these APIs do try CWD very hard might indeed make it harder to implement (depending on existing infrastructure etc), but this goes into the intrusiveness argument IMO.

Looking at how SearchPath() is implemented in Wine, it calls RtlDosSearchPath_U(). This function interprets a leading semicolon and intermediate pairs of semicolons in the path in just the same way as Unix. However, it ignores a trailing semicolon.

Interesting. Is there reasoning for this behavior? is it related to how Windows handles such Paths?

Also, this aims to [not-]emulate Windows, while here we try to behave in a more POSIX-y way.

If we do agree that empty components are much more likely to be accidental than intentional, and that unintentionally searching CWD is a security issue or at the very least undesirable, then I don't think there can be many conclusions other than ignoring them in PATH.

On the basis of this my inclination is now to remove a single trailing semicolon when PATH is imported into the shell. (Not double trailing semicolons, though.)

This basically aims to behave the same as ignoring them, just in an incomplete way and with possibly more side effects. E.g. if PATH ends in two semicolons then busybox-w32 would remove the last one (still searching CWD because there were two), but next time a busybox applet (or shell?) is invoked, it would remove the second semicolon and now it won't search in CWD.

Therefore the most consistent and correct way to ignore them is to ignore them, rather than taking some action which would mostly make them ignored under practical conditions.

So I still think that ideally ignoring them is the best behavior. How close to that ideal we can get with reasonable effort and complexity is a different question, for which I don't have an answer yet, and a solution which mostly works (we hope) would still be better than the current behavior.

But I think we should examine the ideal approach a bit more before implementing the "remove last semicolon" approach.

avih commented 3 months ago

Sample patch (untested):

Hmmm... this works (ignores empty components in $PATH) in which foo.exe, but not when actually trying to run foo.exe.

This part adds support in ash find_command and shellexec, as well as tab-completion:

diff --git a/libbb/lineedit.c b/libbb/lineedit.c
index a8a9ee154..7ea8541b7 100644
--- a/libbb/lineedit.c
+++ b/libbb/lineedit.c
@@ -1057,6 +1057,10 @@ static NOINLINE unsigned complete_cmd_dir_file(const char *command, int type)
            paths[i] = (char *)".";
        }

+# if ENABLE_PLATFORM_MINGW32
+       if (!*paths[i])
+           continue;  /* skip empty path components */
+# endif
        lpath = *paths[i] ? paths[i] : ".";
        dir = opendir(lpath);
        if (!dir)
diff --git a/shell/ash.c b/shell/ash.c
index 6ff7eb59d..c0bdc5255 100644
--- a/shell/ash.c
+++ b/shell/ash.c
@@ -453,6 +453,31 @@ static void forkshell_print(FILE *fp0, struct forkshell *fs, const char **notes)
 # endif
 #endif

+#if ENABLE_PLATFORM_MINGW32
+/* dst should be big enough to hold a copy of src (including \0) */
+static char *remove_empty_win_path_components(char *dst, const char *src)
+{
+   char *dst0 = dst;
+   int c;
+
+   while ((c = *src++)) {
+       if (c != ';' || (dst != dst0 && dst[-1] != ';'))
+           *dst++ = c;  /* skip leading/consecutive semicolon */
+   }
+
+   if (dst != dst0 && dst[-1] == ';')
+       --dst;  /* trailing semicolon */
+
+   *dst = 0;
+   return dst0;
+}
+
+#define TRY_REMOVE_EMPTY_WIN_PATH_COMPONENTS(var) \
+   char w32path_strict_[4096]; \
+   if (strlen(var) < 4096) \
+       var = remove_empty_win_path_components(w32path_strict_, var);
+#endif
+
 /* ============ Hash table sizes. Configurable. */

 #define VTABSIZE 39
@@ -9195,6 +9220,9 @@ static void shellexec(char *prog, char **argv, const char *path, int idx)
    char **envp;
    int exerrno;
    int applet_no = -1; /* used only by FEATURE_SH_STANDALONE */
+#if ENABLE_PLATFORM_MINGW32
+   TRY_REMOVE_EMPTY_WIN_PATH_COMPONENTS(path);
+#endif

    envp = listvars(VEXPORT, VUNSET, /*strlist:*/ NULL, /*end:*/ NULL);
 #if ENABLE_FEATURE_SH_STANDALONE && ENABLE_PLATFORM_MINGW32 && defined(_UCRT)
@@ -15114,6 +15142,9 @@ find_command(char *name, struct cmdentry *entry, int act, const char *path)
    int updatetbl;
    struct builtincmd *bcmd;
    int len;
+#if ENABLE_PLATFORM_MINGW32
+   TRY_REMOVE_EMPTY_WIN_PATH_COMPONENTS(path);
+#endif

 #if !ENABLE_PLATFORM_MINGW32
    /* If name contains a slash, don't use PATH or hash table */

While definitely not as small as editing PATH on init would be, I think it's still fairly unintrussive in the grand scheme of things, and hopefully more complete and consistent.

I don't feel strongly about the naming or the exact form of integrating the conversion at find_command and shellexec, and if TRY_REMOVE_EMPTY_WIN_PATH_COMPONENTS is to be used in other files, then it should move to mingw.c/h.

While it might be possible to use TRY_REMOVE_EMPTY_WIN_PATH_COMPONENTS also at find_exec (instead of the previous patch), its interface is more involved, and it returns a pointer to the next element in PATH, so we can't create a modified copy for this use case, at least not trivially, so the previous patch is needed as well.

This seems to work well after cursory testing of which, running an executable by name, and tab completion of names in CWD, with PATH containing empty components (correctly fails to find in CWD), and with dot-dir in PATH (does find in CWD).

EDIT: Possibly a completely different approach would be to modify getenv so that if the argument is "PATH", then it would return a pointer to a local static copy with empty components stripped? That would be seriously unintrussive IMO, but I didn't test whether that works.

avih commented 3 months ago

Possibly a completely different approach would be to modify getenv so that if the argument is "PATH", then it would return a pointer to a local static copy with empty components stripped? That would be seriously unintrussive IMO, but I didn't test whether that works.

Well, that didn't work either. Maybe it also needs to hook {put,set}env, but I'm not exploring it further.

Maybe indeed most straight forward solution is to just edit PATH on entry, to guarantee a correct starting point, and then allow users to shoot themselves in the foot by adding empty components.

However, I do still strongly think that all empty components should be removed, rather than only a single trailing semicolon, because we still think, IMO, that all of them are equally highly likely to be accidental, and not only the last one (and dot-dir component is still a thing regardless).

The difference between removing only one trailing semicolon and removing all empty components is the same as far as infrastructure goes, and both can be done in-place if needed (assuming a writable copy).

So maybe that's what we should do...

rmyorston commented 3 months ago

OK, here's my take:

diff --git a/libbb/executable.c b/libbb/executable.c
index 606bec986..99b77fef9 100644
--- a/libbb/executable.c
+++ b/libbb/executable.c
@@ -51,6 +51,9 @@ char* FAST_FUNC find_executable(const char *filename, char **PATHp)

        n = strchr(p, PATH_SEP);
        if (n) *n = '\0';
+#if ENABLE_PLATFORM_MINGW32
+       else return NULL;   /* skip single trailing path separator */
+#endif
        p = concat_path_file(
            p[0] ? p : ".", /* handle "::" case */
            filename
diff --git a/shell/ash.c b/shell/ash.c
index d2c88c4bc..b38e6bce5 100644
--- a/shell/ash.c
+++ b/shell/ash.c
@@ -15983,6 +15983,13 @@ init(void)
                /* make all variable names uppercase */
                for (start = *envp;start < end;start++)
                    *start = toupper(*start);
+
+               /* remove single trailing path separator */
+               if (is_prefixed_with(*envp, "PATH=")) {
+                   end = last_char_is(*envp, ';');
+                   if (end && end[-1] != ';')
+                       *end = '\0';
+               }
            }

            /* Initialise some variables normally set at login, but
avih commented 3 months ago

I think this suggestion can result in inconsisent behavior: If the user adds a trailing semicolon to PATH in ash after init, then ash will search CWD (execution, tab completion, etc) while all other applets (like which) would not search CWD.

Also, I stil don't think I heard why you prefer to remove only a single trailing semicolon and not all of the empty components.

My current estimation is because you consider it good enough for common cases, and it's relatively non-intrusive to implement in both ash (at init) and all other applets (at find_executable). Is that correct?

But do you also agree that this is not the ideal behavior, and that the ideal behavior is to ignore all empty components, because on Windows any empty component is highly likely accidental, and not intended to represent CWD? (supported by the fact that cmd.exe, powershell, CreateProcess, and SearchPath all ignore empty path components).

If we agree that the ideal is to always ignore empty components then I think we can do better then the suggested patch while still remaining reasonably unintrussive.

For instance, maybe we can put something similar to this at main() at libbb/appletlib.c, so that it applies to all applets, which hoepfully includes ash (untested):

    /* remove empty components from path */
    for (envp = environ; envp && *envp; envp++) {
        if (is_prefixed_with_case(*envp, "PATH=")) {
            char *p0 = *envp + 5, *s = p0, *d = p0;
            for (; *s; ++s) {
                if (*s != ';' || (d != p0 && d[-1] != ';'))
                    *d++ = *s;
            }
            /* no empty components except maybe one trailing */
            if (d != p0 && d[-1] == ';')
                --d;
            *d = 0;
            break;
        }
    }

If that works, shouldn't this be similarly non intrussive, while also getting much closer to the ideal compared to your suggested patch?

(this is out of my comfort zone, and I don't really know how nofork applets work, but if it's a nofork applet, and if it's not supposed to alter the env, then we can save the original PATH value before running it, and restore it afterwards, using normal get/put env).

ale5000-git commented 3 months ago

@avih A user that use the shell can't be too much newby. If we do have to care like this then all the shell has to be rewritten to be not prone to user error.

avih commented 3 months ago

I didn't say it was a user error. The user adding semicolon can be intentional. Regardless if intentional or not, it would result in inconsistency. But that was an anecdote.

The main point is that before we choose a comrpmise, we should first agree on the ideal behavior, because I don't think we've done that yet.

My take is that any empty component in path should be considered acciodental and not intended to be interpreted as CWD, because everything on Windows ignores empty PATH components, so busybox should not treat them as intentional. That means all of them. Not only a final empty component, which doesn't follow a nother empty component.

How close we can get to that ideal is a different matter, but we should first agree on the ideal behavior.

ale5000-git commented 3 months ago

In my opinion it can't be accidental, the casual Windows user do not edit the PATH, the one that does it they at least know something about programming.

I don't think the user is likely to end up with an empty component in the path without knowing.

avih commented 3 months ago

Anyway, I think most things have been said, and it's fine to disagree.

I'm going to apply this hopefully ideal patch in my build.

The intrussive part is (the find_executable diff a bit smaller than my previous suggestion):

diff --git a/libbb/executable.c b/libbb/executable.c
index 606bec986..3d0b21915 100644
--- a/libbb/executable.c
+++ b/libbb/executable.c
@@ -50,6 +50,12 @@ char* FAST_FUNC find_executable(const char *filename, char **PATHp)
        int ex;

        n = strchr(p, PATH_SEP);
+#if ENABLE_PLATFORM_MINGW32
+       if (n ? p == n : !*p) {
+           p = n ? n + 1 : n;
+           continue;  /* skip empty path component */
+       }
+# endif
        if (n) *n = '\0';
        p = concat_path_file(
            p[0] ? p : ".", /* handle "::" case */
diff --git a/libbb/lineedit.c b/libbb/lineedit.c
index a8a9ee154..7ea8541b7 100644
--- a/libbb/lineedit.c
+++ b/libbb/lineedit.c
@@ -1057,6 +1057,10 @@ static NOINLINE unsigned complete_cmd_dir_file(const char *command, int type)
            paths[i] = (char *)".";
        }

+# if ENABLE_PLATFORM_MINGW32
+       if (!*paths[i])
+           continue;  /* skip empty path components */
+# endif
        lpath = *paths[i] ? paths[i] : ".";
        dir = opendir(lpath);
        if (!dir)
diff --git a/shell/ash.c b/shell/ash.c
index 6ff7eb59d..a243297c6 100644
--- a/shell/ash.c
+++ b/shell/ash.c
@@ -9195,6 +9195,9 @@ static void shellexec(char *prog, char **argv, const char *path, int idx)
    char **envp;
    int exerrno;
    int applet_no = -1; /* used only by FEATURE_SH_STANDALONE */
+#if ENABLE_PLATFORM_MINGW32
+   TRY_REMOVE_EMPTY_WIN_PATH_COMPONENTS(path);
+#endif

    envp = listvars(VEXPORT, VUNSET, /*strlist:*/ NULL, /*end:*/ NULL);
 #if ENABLE_FEATURE_SH_STANDALONE && ENABLE_PLATFORM_MINGW32 && defined(_UCRT)
@@ -15114,6 +15117,9 @@ find_command(char *name, struct cmdentry *entry, int act, const char *path)
    int updatetbl;
    struct builtincmd *bcmd;
    int len;
+#if ENABLE_PLATFORM_MINGW32
+   TRY_REMOVE_EMPTY_WIN_PATH_COMPONENTS(path);
+#endif

 #if !ENABLE_PLATFORM_MINGW32
    /* If name contains a slash, don't use PATH or hash table */

and the non-intrussive part is implementing TRY_REMOVE_EMPTY_WIN_PATH_COMPONENTS at ash.c, like so:

#if ENABLE_PLATFORM_MINGW32
/* dst should be big enough to hold a copy of src (including \0) */
static char *remove_empty_win_path_components(char *dst, const char *src)
{
    char *dst0 = dst;
    int c;

    while ((c = *src++)) {
        if (c != ';' || (dst != dst0 && dst[-1] != ';'))
            *dst++ = c;  /* skip leading/consecutive semicolon */
    }

    if (dst != dst0 && dst[-1] == ';')
        --dst;  /* trailing semicolon */

    *dst = 0;
    return dst0;
}

#define TRY_REMOVE_EMPTY_WIN_PATH_COMPONENTS(var) \
    char w32path_strict_[4096]; \
    if (strlen(var) < 4096) \
        var = remove_empty_win_path_components(w32path_strict_, var);
#endif

Assuming this indeed results in the no-compromizes ideal behavior of always ignoring empty components, I think it's worth it for me to maintain it in my build if it's not taken upstream.

avih commented 3 months ago

Also, I believe this part in your suggested patch skips the last path component regardless if it's empty or not (though I didn't test it):

diff --git a/libbb/executable.c b/libbb/executable.c
index 606bec986..99b77fef9 100644
--- a/libbb/executable.c
+++ b/libbb/executable.c
@@ -51,6 +51,9 @@ char* FAST_FUNC find_executable(const char *filename, char **PATHp)

        n = strchr(p, PATH_SEP);
        if (n) *n = '\0';
+#if ENABLE_PLATFORM_MINGW32
+       else return NULL;   /* skip single trailing path separator */
+#endif
        p = concat_path_file(
            p[0] ? p : ".", /* handle "::" case */
            filename

I think it should be:

else if (!*p) return NULL;   /* skip single trailing path separator */
rmyorston commented 3 months ago

I think this suggestion can result in inconsisent behavior

It does. I've worked around this by picking up your idea of altering PATH in the applet main() rather than just the shell. Since this results in a rather larger patch I've put it on the path_search branch.

But do you also agree that this is not the ideal behavior, and that the ideal behavior is to ignore all empty components

No, I don't.

cmd.exe, powershell, CreateProcess, and SearchPath all ignore empty path components

PowerShell certainly goes out of its way to avoid the usual meaning of empty path components. The others conceal the underlying behaviour by searching the current directory independently of what's in PATH.

SearchPath() does use the POSIX interpretation of empty leading semicolons and pairs of intermediate semicolons, as suggested by the Wine function mentioned above:

#include <stdio.h>
#include <windows.h>

int main(int argc, char **argv)
{
    char buffer[1024];

    SetSearchPathMode(BASE_SEARCH_PATH_ENABLE_SAFE_SEARCHMODE);
    if (SearchPath(argv[1], "file", ".txt", 1024, buffer, NULL))
        printf("%s\n", buffer);
    else
        printf("not found\n");
    return 0;
}
~ $ mkdir dir1 dir2
~ $ touch dir2/file.txt
~ $ ./tester "dir1;;dir2"
C:\Users\rmy\dir2\file.txt
~ $ touch file.txt
~ $ ./tester "dir1;;dir2"
C:\Users\rmy\file.txt
~ $ ./tester ";dir1;dir2"
C:\Users\rmy\file.txt
~ $

I think it should be:

else if (!p) return NULL; / skip single trailing path separator */

Indeed it should. Thanks for spotting that. (Though the change in find_executable() is no longer present in my latest patch.)

avih commented 3 months ago

Let's see:

cmd.exe: ignores all empty components (searches CWD before PATH).

powershell: ignores all empty components (doesn't search CWD unless there's a dot-dir component in PATH).

CreateProcess: ignores all empty components (if lpApplicationName is not NULL then PATH is ignored, if it is null then CWD is searched before anything else).

SearchPath by default ignores all empty components because it searches CWD before anything else.

I presume you're correct that SearchPath interprets non-trailing empty components as CWD when mode is BASE_SEARCH_PATH_ENABLE_SAFE_SEARCHMODE.

However, any number of trailing empty components are still ignored, because it still searches CWD at the end regardless of any number of trailing empty components (including zero).

Additionally, Cursory github search for BASE_SEARCH_PATH_ENABLE_SAFE_SEARCHMODE yields 936 results, which is an extremely small number. I think it's fair to say that either SearchPath is very rarely used, or it's used but except rare cases, it's unsafe mode - which ignores all empty components.

So basically, of the utils and APIs we mentioned so far, only SearchPath interprets empty component as CWD, and only in safe mode - which is seemingly extremely rare. Regardless, it still ignores all trailing empty components.

Considering the above, I'd say that any empty component in PATH on Windows is accidental and not intended to represent CWD, because the only case where something cares (SearchPath in safe mode) is extremely rare, and surely no installer or user PATH setup had this in mind when they (accidentally) added an empty component.

Together with the POSIX statement that any conforming application (user) should not use empty component to represent CWD in PATH, and instead use dot-dir, I think it's much safer to ignore all empty components than to consider them intentional.

I.e. the expectancy of surprising behavior and bad results is much higher IMO when considering empty components as CWD, than when ignoring all empty components.

It's fine if you disagree, but even if you do, you should still also ignore all trailing empty components (and not only a single one), because according to the analysis above they're always ignored by the utils/APIs we mentioned.

altering PATH in the applet main() rather than just the shell

Right. I presume it works. FWIW, when I tried something similar after I suggested it, it didn't work for me (I deleted all empty components, but tab completion still worked for CWD). Not sure why and I already deleted the code, I guess it was something silly, but it's worth testing at least tab completion in cases where this new code leaves PATH without any empty components.

Is my concern about nofork applets modifying the env valid?

avih commented 3 months ago

Right. I presume it works.

I tested it briefly, and it does seem to work, including tab completion.

Personally I'd apply this additional patch in my build to remove all empty components:

diff --git a/libbb/appletlib.c b/libbb/appletlib.c
index 90acf05f6..8bbe9c077 100644
--- a/libbb/appletlib.c
+++ b/libbb/appletlib.c
@@ -1435,9 +1435,10 @@ int main(int argc UNUSED_PARAM, char **argv)
        /* remove single trailing separator from PATH */
        for (char **envp = environ; envp && *envp; envp++) {
            if (is_prefixed_with_case(*envp, "PATH=")) {
-               char *end = last_char_is(*envp, ';');
-               if (end && end[-1] != ';')
-                   *end = '\0';
+               char *p0 = *envp + 5, *s = p0, *d = p0;
+               do if (*s != ';' || (d != p0 && s[1] && s[1] != ';'))
+                   *d++ = *s;
+               while (*s++);
                break;
            }
        }

It's sub-optimal, because that would modify PATH also for windows apps which are executed from bb, but if I assume that any empty ones are accidental, and that scripts also don't add accidental empty ones, then it's fine, and it's less intrussive than my original take.

rmyorston commented 3 months ago

It's sub-optimal, because that would modify PATH also for windows apps

Quite so. That was a consideration.

Is my concern about nofork applets modifying the env valid?

Upstream thought of that: nofork applets get their own copy of the environment.

avih commented 3 months ago

Quite so. That was a consideration.

But your code does that as well... (if there's a single trailing semicolon)

Upstream thought of that: nofork applets get their own copy of the environment.

Nice. Also glad I thought of that ;)

avih commented 3 months ago

Off topic I think, but an interesting observation: if PATH is empty, then it's considered an empty path component, which is interprered as CWD search (also on bash).

Not sure if that's a correct intrepretation though, because POSIX wording mentions explicitly the existance of a semicolon to interpret an empty component:

A zero-length prefix is a legacy feature that indicates the current working directory. It appears as two adjacent characters ( "::" ), as an initial preceding the rest of the list, or as a trailing following the rest of the list.

Disabling PATH search can have valid use cases, but doing PATH= does not achieve that goal in practice...

Actually, this appears later at the end of the paragraph:

If PATH is unset or is set to null, the path search is implementation-defined.

So I guess CWD is allowed, but I think a good implementation should not search anywhere in such case.

ale5000-git commented 3 months ago

You are right, this may be a bug.

On bash online this:

which more && echo ok || echo err
cd /usr/bin
PATH='.' which more && echo ok || echo err
PATH=':' which more && echo ok || echo err
PATH='' which more && echo ok || echo err

Result in this:

/usr/bin/more
ok
./more
ok
./more
ok
err

So the completely empty PATH effectively disable cwd search.

avih commented 3 months ago

this may be a bug.

It's not a bug. Read my comment again.

PATH='' which more && echo ok || echo err

This tests how which handles nul PATH, which seems to not search anywhere, but bash does search CWD:

cd /usr/bin
PATH=
echo 123 | more

works, because it does find more (and echo is a builtin so no issue that it works).

Also, I don't know if that's actually bash, because $SHELL is /bin/sh, which is a symlink to dash.

ale5000-git commented 3 months ago

It seems a bit awkward that which behave differently from the shell but it seems like you said.

I'm not sure about the website but maybe the $SHELL env is wrong because ps -p $$ show that it is bash.

avih commented 3 months ago

Looking at how SearchPath() is implemented in Wine, it calls RtlDosSearchPath_U(). This function interprets a leading semicolon and intermediate pairs of semicolons in the path in just the same way as Unix. However, it ignores a trailing semicolon.

Are you sure? As far as I can tell it doesn't ignore any empty components at all. The sole indication for any empty component (including trailing) is that needed == 0, and this case is not special except that it doesn't use backslash to combine the (empty) path prefix with the filename.

(note that the *ptr != 0 condition does not prevent testing this path prefix. it only stops the count of needed - which is the length of the prefix we're about to test now).

Regardless, I don't think we should take whatever it does as more than an anecdote, because this is not Windows, and we don't know in which contexts this function is used.

EDIT: actually you're right, because it never enters the outer loop when *path is 0, but to be honest, I don't think it's intentional, or else I'd imagine it would have a comment explaining it. Which means it doesn't do any search at all when PATH is nul... (which is OK).

avih commented 3 months ago

Just a heads up: my original take of working with a modified copy of path in shellexec and find_command might have unintended consequences, as it might change the result of pdatetbl = (path == pathval()); in find_command, and I don't understand its consequences, though it does seems to work.

Despite its shortcomings, I think the path edit in appletlib main is overall better as it's much more localized and should be easier to maintain.

It's sub-optimal, because that would modify PATH also for windows apps

Quite so. That was a consideration.

FWIW, I don't think it needs to be addressed (e.g. restored before executing apps) because other than being a potential hornets nest[*], at least with the current patch, it should make no difference anyway, because at least according to the analysis above we didn't find anything where any number of trailing semicolons make a difference (except in busybox itself) - not even safe SearchPath, because it searches CWD at the end anyway.

Also, as I noted, my ~/.profile does remove empty components in PATH for a long ttime now, and I don't think I've seen issues that can be attributed to that.

[*] E.g. the user wanted to remove all empty components in PATH, and their script found none and concluded it's fine, but when they run an app suddenly an empty component is added.

rmyorston commented 3 months ago

Thanks for your feedback. I've applied the patch from the path_search branch, unchanged apart from a reference to this issue in the commit message.

Prereleases are available (PRE-5372 or above).

rmyorston commented 2 months ago

This issue has been fixed in the latest release, FRP-5398.