mediawiki-client-tools / mediawiki-dump-generator

Python 3 tools for downloading and preserving wikis
https://github.com/mediawiki-client-tools/mediawiki-scraper
GNU General Public License v3.0

Speed up file scanning #116

Closed: yzqzss closed this 1 year ago

yzqzss commented 1 year ago

Use a set instead of a list for filename membership checks. This speeds up scanning when images/ contains a large number of files (>10000).
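A minimal sketch of the idea, not the actual code changed in this PR. The function name `missing_files` and the sample file names are hypothetical, and the only assumption is that the scan checks whether each wanted filename already exists in images/:

```python
import os
import tempfile


def missing_files(wanted, images_dir):
    """Return the names in `wanted` that are not yet present in `images_dir`.

    Hypothetical helper for illustration; not taken from the project.
    """
    # Build the set once. Each later `in` check is an average O(1) hash
    # lookup, whereas `in` on a list scans the list element by element.
    existing = set(os.listdir(images_dir))
    return [name for name in wanted if name not in existing]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as images_dir:
        # Pretend two files were already downloaded.
        for name in ("a.png", "b.png"):
            open(os.path.join(images_dir, name), "w").close()
        print(missing_files(["a.png", "b.png", "c.png"], images_dir))
```

With a list, checking n wanted names against n existing files costs on the order of n² comparisons; with a set it stays roughly linear in n.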


Benchmark (one million files in the images/ directory):

Set: one million files/s
List: 40 files/s
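The benchmark script itself is not included here. A rough way to observe the same effect on a smaller scale might look like the following; the helper names and the size `n` are assumptions chosen so the list case finishes quickly, and absolute timings will vary by machine:

```python
import time


def count_hits(lookups, container):
    """Count how many names in `lookups` are found in `container`."""
    return sum(1 for name in lookups if name in container)


def timed(lookups, container):
    """Return (hits, elapsed_seconds) for one pass of membership checks."""
    start = time.perf_counter()
    hits = count_hits(lookups, container)
    return hits, time.perf_counter() - start


if __name__ == "__main__":
    n = 5_000  # hypothetical size, far smaller than the one million files above
    names = [f"file_{i}.png" for i in range(n)]
    list_hits, list_time = timed(names, names)       # linear scan per lookup
    set_hits, set_time = timed(names, set(names))    # hash lookup per check
    print(f"list: {list_hits} hits in {list_time:.4f}s")
    print(f"set:  {set_hits} hits in {set_time:.4f}s")
```

Because the list cost grows quadratically with the number of files, the gap widens sharply as n approaches the sizes reported above.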
