jaraco / zipp


`glob('*.txt')` is much slower than it should be on zip files with many recursive files #101

Closed: jaraco closed this issue 1 year ago

jaraco commented 1 year ago

`glob('*.txt')` is much slower than it should be on zip files with many recursive files.

This is because it uses `._descendants()` (which lists all recursive descendants) even though there is no need: this glob only has to read the top-level `.iterdir()`:

https://github.com/jaraco/zipp/blob/ee6d7117023cd90be7647f894f43aa4fef16c13d/zipp/__init__.py#L373

For large zip files, this can easily make a 1000x speed difference, which is surprising when you're writing code that aims to work on both `pathlib.Path` and `zipfile.Path`.
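To illustrate the idea, here is a minimal sketch of matching a non-recursive pattern against only the direct children of the archive root. It uses the standard library's `zipfile.Path` rather than zipp itself, and `build_archive` and `top_level_glob` are hypothetical helpers made up for this example, not part of either library. It shows the intended result, not the timing difference or the actual fix.

```python
import fnmatch
import io
import zipfile


def build_archive(nested=1000):
    """Create an in-memory zip with a few top-level files and many nested ones."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("a.txt", "top-level")
        zf.writestr("b.txt", "top-level")
        zf.writestr("c.bin", "top-level")
        for i in range(nested):
            zf.writestr(f"deep/dir{i}/file{i}.txt", "nested")
    buf.seek(0)
    return zipfile.ZipFile(buf)


def top_level_glob(root, pattern):
    """Hypothetical helper: match a pattern against direct children only."""
    return [
        child
        for child in root.iterdir()
        if fnmatch.fnmatchcase(child.name, pattern)
    ]


root = zipfile.Path(build_archive())
matches = sorted(p.name for p in top_level_glob(root, "*.txt"))
print(matches)
```

The nested `deep/.../*.txt` entries are never candidates here, because only the entries returned by `iterdir()` are compared against the pattern.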

Originally posted by @nh2 in https://github.com/jaraco/zipp/issues/98#issuecomment-1627798869