Idea: SQLite3 database or fast key-value database like berkeleydb named .guano.py.cache with index of (filename, filesize, timestamp, hash).
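Hash should be fast; cryptographic strength isn't needed. crc32 and xxHash are non-cryptographic; md5 and sha1 are cryptographic digests that would also work but are typically slower.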
If we determine that the file hasn't changed, load metadata from cache.
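A minimal sketch of the lookup, assuming SQLite3 and the hash-free (filename, filesize, timestamp) key. The table name `metadata_cache`, the helpers `open_cache` and `cached_metadata`, and the `parse` callback (whatever reads the metadata from the file and returns it as text) are all hypothetical names for illustration, not existing guano.py API.

```python
import os
import sqlite3

CACHE_NAME = ".guano.py.cache"  # file name taken from the idea above


def open_cache(path=CACHE_NAME):
    """Open (or create) the cache database. Table layout is an assumption."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS metadata_cache ("
        " filename TEXT PRIMARY KEY,"
        " filesize INTEGER NOT NULL,"
        " timestamp INTEGER NOT NULL,"
        " metadata TEXT NOT NULL)"
    )
    return db


def cached_metadata(db, filename, parse):
    """Return cached metadata if size and mtime are unchanged, else re-parse.

    `parse` is a hypothetical callable: parse(filename) -> str.
    """
    st = os.stat(filename)
    key = os.path.abspath(filename)
    row = db.execute(
        "SELECT filesize, timestamp, metadata FROM metadata_cache"
        " WHERE filename = ?",
        (key,),
    ).fetchone()
    if row is not None and row[0] == st.st_size and row[1] == st.st_mtime_ns:
        return row[2]  # cache hit: file looks unchanged
    metadata = parse(filename)  # cache miss: read the file itself
    with db:  # commit the insert as one transaction
        db.execute(
            "INSERT OR REPLACE INTO metadata_cache VALUES (?, ?, ?, ?)",
            (key, st.st_size, st.st_mtime_ns, metadata),
        )
    return metadata
```

A hash column could be added to the same table later if size and mtime turn out not to be enough.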
Would this be significantly faster given that we'd need to do full file reads to compute hash? Is (filename, filesize, timestamp) sufficient without a hash?
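To make the trade-off concrete, here is a sketch of the two checks side by side. `stat_signature` and `crc32_of_file` are hypothetical helper names. The first only asks the filesystem for metadata; the second has to read every byte, so it probably costs at least as much I/O as the parse it is meant to skip. Only a measurement on real files would settle it.

```python
import os
import zlib


def stat_signature(filename):
    """Cheap check: (filename, filesize, timestamp) from os.stat, no file read."""
    st = os.stat(filename)
    return (os.path.abspath(filename), st.st_size, st.st_mtime_ns)


def crc32_of_file(filename, chunk_size=1 << 20):
    """Expensive check: CRC-32 over the whole file, read in chunks."""
    crc = 0
    with open(filename, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            crc = zlib.crc32(chunk, crc)  # continue the running checksum
    return crc & 0xFFFFFFFF
```

The stat-only signature misses a file rewritten with identical size and a preserved mtime; whether that case matters in practice is the open question.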