rmyorston / busybox-w32

WIN32 native port of BusyBox.
https://frippery.org/busybox
Other
697 stars 126 forks source link

[Bug] `cd` can change into non-existing directories ending with dot(s), space(s) #478

Open neodyne opened 4 days ago

neodyne commented 4 days ago

version: BusyBox v1.37.0-FRP-5467-g9376eebd8 busybox64u.exe sh -l

~ $ ls ...
ls: can't open '...': No such file or directory
~ $ cd ...
~/... $ echo $PWD
C:/Users/abc/...
~ $ ls Downloads......
ls: can't open 'Downloads......': No such file or directory
~ $ cd Downloads......
~/Downloads...... $ echo $PWD
C:/Users/abc/Downloads......
~ $ ls C:/Windows....
ls: can't open 'C:/Windows....': No such file or directory
~ $ cd C:/Windows....
C:/Windows.... $ echo $PWD
C:/Windows....

(some?) applets behave like being run from the directory without the trailing dots:

~/... $ touch test.txt  # creates ~/test.txt

further changing into subdirs, existing or not, fails:

~/... $ cd ...
ls: can't open '...': No such file or directory
~/... $ cd Downloads # here ~/Downloads actually exists ( %USERPROFILE%\Downloads )
sh: cd: can't cd to Downloads: No such file or directory

EDIT: same applies to paths with trailing spaces like cd '~/dir '

avih commented 4 days ago

EDIT: same applies to paths with trailing spaces like cd '~/dir '

Can you clarify what "same" means?

Same as all the above but e.g. (quoted) '~/abc ' instead of (unquoted) ... ?

Or same only as the "further changing into subdirs..." part?

Or something else?

neodyne commented 4 days ago
~ $ cd 'Downloads    '
~/Downloads     $ echo "x${PWD}x"
xC:/Users/abc/Downloads    x

same as all above, including applet behavior and subdirs.


doesn't happen with other characters afaict ([a-zA-Z0-9] + tested a few other ascii chars). For completion's sake, happens also with a trailing string of both spaces and dots:

~ $ cd 'Downloads ....  ...   ..'
~/Downloads ....  ...   .. $ echo $PWD
C:/Users/abc/Downloads ....  ...   ..
neodyne commented 4 days ago

Searching further, this seems like intended Win32 API behavior, to "strip trailing spaces and dots from the final component". However, consider in windows command prompt:

Microsoft Windows [Version 10.0.19045.5131]

C:\Users>cd "abc ... ... .. "
C:\Users\abc>

C:\Users\abc>echo %CD%
C:\Users\abc

C:\Users\abc>cd Downloads
C:\Users\abc\Downloads>

So even that cd in busybox-w32 sh succeeds may be expected behavior (implemented in winapi after all), $PWD should probably end up with a path with trimmed spaces/dots, so that changing into subdirs works.

ale5000-git commented 4 days ago

But if a script expect to have 2 separate folders "abc" and "abc.", then it will mess up things. I think it is safer to make commands (cd / mkdir / mv for folder) fails with these chars at the end.

rmyorston commented 4 days ago

Yes, it's been broken forever. It also affects files, not just directories. So:

~ $ touch empty
~ $ ls -l empty.
-rw-rw-r--    1 rmy      rmy              0 Nov 18 16:52 empty.
~ $ ls -l empty..
-rw-rw-r--    1 rmy      rmy              0 Nov 18 16:52 empty..
~ $ ls -l empty...
-rw-rw-r--    1 rmy      rmy              0 Nov 18 16:52 empty...

The Windows API really does ignore all trailing periods and spaces.

But if a script expect to have 2 separate folders "abc" and "abc.", then it will mess up things.

The script has no right to expect any such thing on Windows. You can't have separate folders called abc and abc. on Windows.

As they say:

Do not end a file or directory name with a space or a period. Although the underlying file system may support such names, the Windows shell and user interface does not.

neodyne commented 4 days ago

@rmyorston could code responsible for $PWD strip trailing dots/spaces, so that operations requiring the current working directory would work after the first cd, the same way it works in windows command prompt?


to explain the difference:

in cmd.exe:

C:\Users\abc>cd "Downloads ... ... .. ."
C:\Users\abc\Downloads>cd testdir
C:\Users\abc\Downloads\testdir>

in busybox sh:

~$ cd "Downloads ... ... .. ."
 # here `PWD=~/Downloads` would be nice, so the next `cd` doesn't fail
~/Downloads ... ... .. . $ cd testdir
sh: cd: can't cd to testdir: No such file or directory
rmyorston commented 4 days ago

@neodyne Sure, I'll give it some thought.

ale5000-git commented 4 days ago

The script has no right to expect any such thing on Windows. You can't have separate folders called abc and abc. on Windows.

As they say:

Do not end a file or directory name with a space or a period. Although the underlying file system may support such names, the Windows shell and user interface does not.

The script might not be specifically for Windows but a general universal POSIX script; so it may override a file from a folder not intended if the author don't know, so fails is safer (in my opinion).

avih commented 4 days ago

I also think it should fail if trying to use a file or path name which windows would strip of trailing junk.

But trying to reject some inputs without first checking whether windows is OK with them can be a slippery slope of bad things.

This is a complex subject.

In some cases bb can detect that after the fact, like after a successful "cd", it can read the CWD from windows itself, and updates PWD accordingly (and display it correctly at the prompt etc).

This may have precedent, for instance cd x and cd x/ both result in the same PWD without the trailing /, but I don't know at which stage the slash is removed.

But in other cases I don't think bb can know, like in cat 'foo ', so it probably have to reject it in advance, and if it doesn't reject it in advance then it has no idea which actual file name was written read (but also no idea if it wants to).

neodyne commented 4 days ago

@avih @ale5000-git not that I disagree, I'd like to see to see it fail on nonexistent paths as well, I was just trying to be realistic, considering busybox-w32 is built on top of win32api.

seems that UNC extended paths ( \\?\C:\ ) don't do path normalization[*] (so dot/space trimming as well), can only use "\" as path separator and don't resolve "." and ".." components; this may be useful but creates additional challenges. As was mentioned above, it would still not allow coexisting "abc" and "abc." directories/files.

for reference, see e.g. https://github.com/python/cpython/issues/84419 , apparently os.stat() in python succeeds in a situation similar to this as well.

[*] An extended path only skips normalization in an open or create context.

rmyorston commented 3 days ago

Since the shell already makes changes to the path in the cd builtin it isn't too much of a stretch to have it strip the trailing dots and spaces too. With this change cd should behave more like cmd.exe.

There are new prerelease binaries with this change (PRE-5532 or above).

neodyne commented 3 days ago

@rmyorston small quirk, in some cases the trimmed path now ends with a slash:

~ $ cd .........
~/ $ echo $PWD
C:/Users/abc/    <-- trailing slash
ale5000-git commented 3 days ago

@avih @ale5000-git not that I disagree, I'd like to see to see it fail on nonexistent paths as well, I was just trying to be realistic, considering busybox-w32 is built on top of win32api.

I'm not sure if busybox just blindly call the Win API but in cases it check if the file/dir path is valid before calling the Win API it could simply return invalid path if the last char is space or dot, so it will works globally with a simple change.

neodyne commented 3 days ago

Also, in busybox sh {pre-,}release, single trailing dot in any path component succeeds...

~ $ cd 'Downloads/testdir     /testdir2' # fails, ok
~ $ cd 'Downloads/testdir /testdir2' # fails, ok
~ $ cd Downloads/testdir....../testdir2  # fails, ok
~ $ cd Downloads/testdir./testdir2 # succeeds, PWD not overwritten; this should probably fail too
~/Downloads/testdir./testdir2 $

(All of the above succeeds in cmd.exe, with an overwritten CWD, but hey, that's Windows for ya.)

neodyne commented 3 days ago

@ale5000-git my understanding is that it checks if the file/dir path exists using winapi, so it returns that the path is valid. Some winapi functions can take extended UNC paths, which skip path normalization, but not all.

ale5000-git commented 3 days ago

If I'm not wrong the function call stat of busybox, so can't stat be changed to say that these pathes do not exist?

For example: stat ...

rmyorston commented 3 days ago

in some cases the trimmed path now ends with a slash

Fair point. This further patch should handle that case:

diff --git a/win32/mingw.c b/win32/mingw.c
index 6842dba48..aab09b8e9 100644
--- a/win32/mingw.c
+++ b/win32/mingw.c
@@ -2189,6 +2189,9 @@ void FAST_FUNC strip_dot_space(char *p)
    while (end > start && (end[-1] == '.' || end[-1] == ' ')) {
        *--end = '\0';
    }
+
+   if (--end != start && (*end == '/' || *end == '\\'))
+       *end = '\0';
 }

 size_t FAST_FUNC remove_cr(char *p, size_t len)

I'm inclined not to bother with the trailing dot on intermediate directories. That would require more code. Plus, it sorts itself out when the intermediate directory becomes the last:

~ $ cd Downloads/testdir./testdir2
~/Downloads/testdir./testdir2 $ cd ..
~/Downloads/testdir $

so can't stat be changed to say that these pathes do not exist?

Sure, but there are calls to scores, if not hundreds, of C runtime and WIN32 functions. It's impractical to intercept them all. Intercepting only a few is sure to lead to inconsistencies.

ale5000-git commented 3 days ago

I thought they first call a function inside busybox codebase and only after the C runtime / WIN32 functions so I'm wrong.

ale5000-git commented 3 days ago

There is also the inconsistency that this works: stat ... but this no: ls ...