riggsd / guano-py

Python reference implementation of the GUANO bat acoustics metadata specification
http://guano-md.org
MIT License

Look Into Filesystem-based Metadata Caching #9

Closed riggsd closed 2 years ago

riggsd commented 7 years ago

Idea: a SQLite3 database, or a fast key-value store such as Berkeley DB, named .guano.py.cache, holding an index of (filename, filesize, timestamp, hash).

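Hash should be fast rather than secure, since it only needs to detect changed files: a non-cryptographic function like crc32 or xxHash. (md5 and sha1 are cryptographic hashes, but are fast enough to be candidates too.)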
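A minimal sketch of the hashing step, assuming crc32 from Python's standard `zlib` module is the chosen function. The helper name `file_crc32` and the 1 MiB chunk size are illustrative choices, not part of guano-py. The file is read in chunks so that large recordings are never loaded into memory whole.

```python
import zlib


def file_crc32(path, chunk_size=1 << 20):
    """Return the CRC-32 of a file's contents as an unsigned 32-bit int.

    Hypothetical helper: reads the file in fixed-size chunks and feeds
    each chunk into a running checksum.
    """
    crc = 0
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns b"" at EOF
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF
```

Note that this still reads every byte of the file, so its cost is bounded below by the disk read itself.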

If we determine that the file hasn't changed, load metadata from cache.
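One way the lookup could work is sketched below, using the standard `sqlite3` module and checking only (filename, filesize, timestamp), with no hash. Everything here is an assumption for illustration: the table layout, the function names `open_cache` and `load_metadata`, and the `parse` callable, which stands in for whatever actually reads GUANO metadata from a file and is expected to return a JSON-serializable dict. It relies on `os.stat().st_mtime_ns`, so it needs Python 3.3 or later.

```python
import json
import os
import sqlite3

CACHE_NAME = ".guano.py.cache"  # cache file name from the proposal


def open_cache(directory):
    """Open (or create) the per-directory cache database."""
    conn = sqlite3.connect(os.path.join(directory, CACHE_NAME))
    conn.execute(
        "CREATE TABLE IF NOT EXISTS guano_cache ("
        " filename TEXT PRIMARY KEY,"
        " filesize INTEGER NOT NULL,"
        " mtime_ns INTEGER NOT NULL,"
        " metadata TEXT NOT NULL)"
    )
    return conn


def load_metadata(conn, path, parse):
    """Return metadata for `path`, calling `parse` only on a cache miss.

    `parse` is a hypothetical callable taking a path and returning a
    JSON-serializable dict of metadata.
    """
    st = os.stat(path)
    name = os.path.basename(path)
    row = conn.execute(
        "SELECT filesize, mtime_ns, metadata FROM guano_cache"
        " WHERE filename = ?",
        (name,),
    ).fetchone()
    # Cache hit: size and modification time both match the stored entry
    if row is not None and row[0] == st.st_size and row[1] == st.st_mtime_ns:
        return json.loads(row[2])
    # Cache miss or stale entry: parse the file and store the result
    metadata = parse(path)
    conn.execute(
        "INSERT OR REPLACE INTO guano_cache VALUES (?, ?, ?, ?)",
        (name, st.st_size, st.st_mtime_ns, json.dumps(metadata)),
    )
    conn.commit()
    return metadata
```

A hit costs one `stat` call and one indexed query, with no read of the audio file itself. A hash column could be added to the table later if size and timestamp turn out to be insufficient.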

Would this be significantly faster, given that we'd need to do a full file read to compute the hash? Is (filename, filesize, timestamp) sufficient without a hash?