tomerfiliba / plumbum

Plumbum: Shell Combinators
https://plumbum.readthedocs.io
MIT License
2.82k stars 184 forks source link

Globbing applies unexpected pattern parsing, noticeable with numbers and 'ranges' within brackets in pathnames #481

Open AndydeCleyre opened 5 years ago

AndydeCleyre commented 5 years ago

Arch Linux Python 3.7.4 Plumbum 1.6.7

mkdir '[0]'
cat <<EOF > '[0]'/test.py
from plumbum import local
print(local.cwd)
print(local.cwd.list())
print(local.cwd // '*')
EOF
cp -r '[0]' '[0-1]'
cp -r '[0]' '[1-0]'
cd '[0]'; python test.py
/home/andy/Code/plumbum/[0]
[<LocalPath /home/andy/Code/plumbum/[0]/test.py>]
[]
cd ../'[0-1]'; python test.py
/home/andy/Code/plumbum/[0-1]
[<LocalPath /home/andy/Code/plumbum/[0-1]/test.py>]
[]
cd ../'[1-0]'; python test.py
/home/andy/Code/plumbum/[1-0]
[<LocalPath /home/andy/Code/plumbum/[1-0]/test.py>]
Traceback (most recent call last):
  File "test.py", line 4, in <module>
    print(local.cwd // '*')
  File "/usr/lib/python3.7/site-packages/plumbum/path/base.py", line 44, in __floordiv__
    return self.glob(expr)
  File "/usr/lib/python3.7/site-packages/plumbum/path/local.py", line 170, in glob
    return self._glob(pattern, fn)
  File "/usr/lib/python3.7/site-packages/plumbum/path/base.py", line 390, in _glob
    return fn(pattern)
  File "/usr/lib/python3.7/site-packages/plumbum/path/local.py", line 169, in <lambda>
    fn = lambda pat: [LocalPath(m) for m in glob.glob(str(self / pat))]
  File "/usr/lib/python3.7/glob.py", line 20, in glob
    return list(iglob(pathname, recursive=recursive))
  File "/usr/lib/python3.7/glob.py", line 71, in _iglob
    for dirname in dirs:
  File "/usr/lib/python3.7/glob.py", line 72, in _iglob
    for name in glob_in_dir(dirname, basename, dironly):
  File "/usr/lib/python3.7/glob.py", line 83, in _glob1
    return fnmatch.filter(names, pattern)
  File "/usr/lib/python3.7/fnmatch.py", line 52, in filter
    match = _compile_pattern(pat)
  File "/usr/lib/python3.7/fnmatch.py", line 46, in _compile_pattern
    return re.compile(res).match
  File "/usr/lib/python3.7/re.py", line 234, in compile
    return _compile(pattern, flags)
  File "/usr/lib/python3.7/re.py", line 286, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/usr/lib/python3.7/sre_compile.py", line 764, in compile
    p = sre_parse.parse(p, flags)
  File "/usr/lib/python3.7/sre_parse.py", line 930, in parse
    p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, 0)
  File "/usr/lib/python3.7/sre_parse.py", line 426, in _parse_sub
    not nested and not items))
  File "/usr/lib/python3.7/sre_parse.py", line 816, in _parse
    p = _parse_sub(source, state, sub_verbose, nested + 1)
  File "/usr/lib/python3.7/sre_parse.py", line 426, in _parse_sub
    not nested and not items))
  File "/usr/lib/python3.7/sre_parse.py", line 580, in _parse
    raise source.error(msg, len(this) + 1 + len(that))
re.error: bad character range 1-0 at position 5

While only that last one raises an exception, none of those globs match properly.

senevoldsen commented 3 years ago

This occurs on Windows too. Folder D:\TXT_FILES[test] with files a.txt and b.txt running

folder = local.path(r"D:\TXT_FILES[test]")
folder // "*.txt"
# Results in: []

Doing the same but with the [test] part in the name it works.

wtanksleyjr commented 3 years ago

I just experienced this - essentially, the problem is that the documents imply that the // operator globs a wildcard on the right-hand-side, but actually it appears to glob the whole appended thing, which is destructive of the well-formed filenames required to form local.path, it can actually make a local.folder which presents is_dir() = True, with files that match the glob, appear to contain nothing (because of course once the square brackets are globbed they don't match themselves).

Here's a test run. What I'm doing here is setting up a case where the directory will actually switch to a neighboring folder when globbing is applied, giving a completely preposterous result.

$ mkdir 'this[test]'
$ touch this\[test\]/a
$ mkdir thise
$ touch thise/b
$ fg
[3]    785992 continued  python3.8
>>> p = local.path('this[test]/')
>>> p.is_dir()
True
>>> (p/'a').is_file()
True
>>> [s.is_file() for s in p//('a', 'b')]
[True]
>>> {str(s):s.is_file() for s in p//('a', 'b')}
{'/mnt/hdd-8TB/sharing/downloads/thise/b': True}
>>>
wtanksleyjr commented 3 years ago

I tried to make a workaround using a 'with cwd' context-manager, it exploded worse than expected. This might be my own mistake, or a different bug.

>>> with local.cwd(p) as path:
...  print('.' // ('a', 'b'))
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/home/william/.local/lib/python3.8/site-packages/plumbum/path/local.py", line 385, in __call__
    newdir = self.chdir(newdir)
  File "/home/william/.local/lib/python3.8/site-packages/plumbum/path/local.py", line 370, in chdir
    os.chdir(str(newdir))
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/hdd-8TB/sharing/downloads/this\\[test\\]'
>>>
wtanksleyjr commented 3 years ago

OK, I found a kind of workaround, you have to use a different library to do the globbing, or just use regexp; but an expression like p // '*.mp3' can be replaced by [f for f in p if f.suffix == 'mp3']. It's a bit wordy, but not as bad as things might be. Obviously we could also use p.walk for more complex globs.

But this whole thing makes // a lot less generally useful.

henryiii commented 3 years ago

So the bug is that a // b causes the glob to apply to a? That might be fixable.

wtanksleyjr commented 3 years ago

Exactly, a // b applies the glob to the concatenation of a/b instead of scanning for glob matches to b strictly under the path a. And you can see the implementation problem, although in different forms, in the _glob functions in both remote and local paths -- both of them apply the globbing after concatenating the glob to the full pathname, without any escapes to protect the path.

Also, I found the documentation for this, and it is indeed a mismatch with the documentation when compared to the example I gave just above.

https://plumbum.readthedocs.io/en/latest/api/path.html#api-path "Returns a (possibly empty) list of paths that matched the glob-pattern under this path."

In my example, I construct a pair of paths designed to subvert that, making an unsuspecting use of globbing look outside of the specified path. This doesn't seem possible to defend against, aside from programmer paranoia or just never using globbing.

The .walk function and the __iter__ constructs are safe from this. A temporary fix would be to reimplement glob() or floordiv in terms of those. Odds are we can do better, but I don't know the real constraints.

-Wm

On Sat, Jul 10, 2021 at 8:53 AM Henry Schreiner @.***> wrote:

So the bug is that a // b causes the glob to apply to a? That might be fixable.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/tomerfiliba/plumbum/issues/481#issuecomment-877659583, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJ7H6IRVYHPR2FYGLQEV2DTXBUJJANCNFSM4JGSVSTQ .

AndydeCleyre commented 2 years ago

I just started looking for globbing in the existing tests, and it's not directly related to this issue but it looks like this assertion will always succeed if the previous one does.