qgustavor / douki

Douki is an audio-fingerprinting-based subtitle synchronization tool
ISC License

Roadmap #1

Open qgustavor opened 2 years ago

qgustavor commented 2 years ago
qgustavor commented 1 year ago

About finding a better fingerprint storage format, I did some tests:

Here's my proposal:

  1. Start from the current format of [...[tcode, hcode]]
  2. Run .flat(), so [tcode, hcode, tcode, hcode...]
  3. Encode as a Uint32Array
  4. Optionally prefix metadata encoded as MessagePack
  5. Prefix metadata size (which can be zero) encoded as Uint32
  6. Compress using LZMA
  7. Prefix with DOUKI

Decoding follows these steps:

  1. Check if data starts with DOUKI, reject if not, then drop those bytes
  2. Decompress using LZMA
  3. Read metadata size as Uint32
  4. Read and decode metadata using MessagePack if it exists
  5. Read tcode and hcode values stored as Uint32
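
The decoding steps above could look something like the following sketch. As before, this is an assumption-laden illustration: `decodeFingerprints`, `decodeMeta` and `decompress` are hypothetical names, the LZMA and MessagePack steps are injected as functions, and little-endian Uint32 values are assumed.

```javascript
// Hypothetical helper: reverses the proposed container format.
// `decompress` stands in for an LZMA decompressor and `decodeMeta` for a
// MessagePack decoder; both receive Uint8Array values.
function decodeFingerprints (data, { decodeMeta, decompress }) {
  // Step 1: check the "DOUKI" prefix, reject if it is missing
  const magic = new TextEncoder().encode('DOUKI')
  const hasMagic = data.length >= magic.length &&
    magic.every((byte, i) => data[i] === byte)
  if (!hasMagic) throw new Error('Not a Douki fingerprint file')

  // Step 2: drop the prefix and decompress the rest
  const body = decompress(data.subarray(magic.length))
  const view = new DataView(body.buffer, body.byteOffset, body.byteLength)

  // Steps 3-4: metadata size, then the metadata itself if there is any
  const metaSize = view.getUint32(0, true) // little-endian is an assumption
  const meta = metaSize > 0 ? decodeMeta(body.subarray(4, 4 + metaSize)) : null

  // Step 5: the remaining bytes are alternating tcode and hcode values
  const pairs = []
  for (let offset = 4 + metaSize; offset + 8 <= body.byteLength; offset += 8) {
    pairs.push([view.getUint32(offset, true), view.getUint32(offset + 4, true)])
  }
  return { meta, pairs }
}
```

The function returns the fingerprints in the current `[...[tcode, hcode]]` shape, so existing code that consumes that format would not need to change.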

Why care about adding metadata? Because it allows versioning, it allows changing the fingerprinter parameters in case someone finds better values than the current ones in the future, and it allows adding info about the file used to generate the fingerprints.