`glob('*.txt')` is much slower than it should be on zip files that contain many files in nested directories.
This is because it uses `._descendants()`, which lists every recursive descendant, even though a non-recursive pattern like this should only need to read the top-level `.iterdir()`.
For large zip files, this can easily make a 1000x speed difference, which is surprising when you're writing code that aims to work on both `pathlib.Path` and `zipfile.Path`.
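To illustrate the point, here is a minimal sketch of a shallow match for a single-segment pattern such as `*.txt`. It is not zipp's actual implementation. It filters the children returned by a single `iterdir()` call on the root with `fnmatch.fnmatchcase`, instead of recursing into every subdirectory. The helper name `shallow_glob` is hypothetical, and the sketch assumes the pattern contains no `/` and no `**`.

```python
import fnmatch
import io
import zipfile


def shallow_glob(root: zipfile.Path, pattern: str):
    """Hypothetical helper: match a single-segment pattern against direct children only.

    Assumes the pattern has no path separator and no '**', so one top-level
    iterdir() listing is enough and no subdirectory needs to be recursed into.
    """
    if "/" in pattern or "**" in pattern:
        raise ValueError("shallow_glob only handles single-segment patterns")
    return [
        child
        for child in root.iterdir()
        if fnmatch.fnmatchcase(child.name, pattern)
    ]


# Build a small in-memory archive: two top-level .txt files, one non-matching
# file, and many nested .txt files that a recursive walk would have to visit.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", "a")
    zf.writestr("b.txt", "b")
    zf.writestr("notes.md", "n")
    for i in range(100):
        zf.writestr(f"deep/dir{i}/file{i}.txt", "x")

root = zipfile.Path(zipfile.ZipFile(buf))
print(sorted(p.name for p in shallow_glob(root, "*.txt")))
```

Only one `iterdir()` call is made on the root, so the nested `deep/dir*/` directories are never listed recursively. A real fix would also have to keep recursive patterns working and would belong inside `glob()` itself.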
The `glob()` implementation in question: https://github.com/jaraco/zipp/blob/ee6d7117023cd90be7647f894f43aa4fef16c13d/zipp/__init__.py#L373

Originally posted by @nh2 in https://github.com/jaraco/zipp/issues/98#issuecomment-1627798869