jaraco / zipp

MIT License

`glob('**')` returns all files and directories #102

Open jaraco opened 1 year ago

jaraco commented 1 year ago

A further bug is that glob('**') in zipp returns all files and directories, while pathlib.Path.glob('**') returns only directories.

Originally posted by @nh2 in https://github.com/jaraco/zipp/issues/98#issuecomment-1627805807

jaraco commented 1 year ago

That's interesting, because glob.glob('**') returns files and directories, and the docs for pathlib.glob are unclear on the meaning of **. Probably zipp.Path should align with pathlib conventions, but only where they're by design and not themselves an undefined

nh2 commented 1 year ago

the docs for pathlib.glob are unclear on the meaning of **

You are right, and it gets even more confusing:

I think there should be a Python upstream issue to document all of that, including the differences between glob and pathlib.

Then zipp could implement what's documented for pathlib.

Repro

Python 3.10.11

mkdir -p dir/subdir ; touch toplevel.txt dir/indir.txt

creates:

toplevel.txt
dir/
  indir.txt
  subdir/

>>> import glob, pathlib

>>> glob.glob('**')
['dir', 'toplevel.txt']

>>> glob.glob('**', recursive=True)
['dir', 'dir/subdir', 'dir/indir.txt', 'toplevel.txt']

>>> list(pathlib.Path('.').glob('**'))
[PosixPath('.'), PosixPath('dir'), PosixPath('dir/subdir')]

jaraco commented 7 months ago

@nh2, since a lot of work has been done upstream to clarify the behavior, intent, and exceptions, would you be willing to review the state of the art for CPython (pathlib) and make a recommendation for what behaviors would be best for zipp.Path/zipfile.Path?

nh2 commented 7 months ago

I would like to, but I expect that this month I'll be completely out of time.

jaraco commented 7 months ago

Sounds good. I'll assign it to you but also flag it up for others to consider. Take your time and come back to it when you're up for it. If someone else wants to help, please comment here first.