tablelandnetwork / weeknotes

A place for weekly updates from the @tablelandnetwork team

[NOT-171] Weeknotes individual update: December 18, 2023 #148


dtbuchholz commented 10 months ago

Fingerprinting database backups with homomorphic hashes

by Avichal Pandey

Previously, we talked about the set semantics of homomorphic hashes.

Database backups are a core part of Basin's data pipeline. We are exploring techniques for fingerprinting these backups to ensure data integrity and tamper resistance throughout the pipeline. This post demonstrates an application that uses set-based hashes to fingerprint database backups.

For this post, we will assume the backup files, such as Parquet exports, are collected in a single directory.

Screenshot 2023-12-15 at 17.51.23.png

We want a fingerprint for this directory. The fingerprint should change whenever a backup file is added, deleted, or modified. If nothing in the directory changes, the fingerprint should stay the same. This lets us keep track of state changes.

To generate the fingerprint, we will use the following method.

  1. Read each file and generate a 32-byte hash digest with a performant one-way hash function like BLAKE3.
  2. Initialize the HashSet (a homomorphic hash with set semantics) with a default value. This is the fingerprint of the directory when no files are present.
  3. Insert the file hashes into the HashSet. Persist the HashSet for later use, and update it whenever you add, modify, or delete files.
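
The three steps above can be sketched with the Rust standard library alone. This is a toy illustration under stated assumptions, not the implementation used in the pipeline: `digest` is a hypothetical stand-in for BLAKE3 built on `DefaultHasher` (not cryptographically secure), and `SetHash` is a hypothetical additive set hash whose state is the lane-wise wrapping sum of the inserted digests. A production setup would use a real 32-byte hash and a vetted homomorphic hash construction.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// A 32-byte digest viewed as four 64-bit lanes.
type Digest = [u64; 4];

/// Step 1 (stand-in): derive a 32-byte digest from file contents.
/// ASSUMPTION: DefaultHasher replaces BLAKE3 only to keep the sketch
/// dependency-free. It is NOT cryptographically secure.
fn digest(bytes: &[u8]) -> Digest {
    let mut out = [0u64; 4];
    for (lane, slot) in out.iter_mut().enumerate() {
        let mut h = DefaultHasher::new();
        h.write_u64(lane as u64); // domain-separate the four lanes
        h.write(bytes);
        *slot = h.finish();
    }
    out
}

/// Hypothetical toy set hash: the state is the lane-wise wrapping sum of
/// all inserted digests, so the order of insertion does not matter.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct SetHash([u64; 4]);

impl SetHash {
    /// Step 2: the default value, i.e. the fingerprint of an empty directory.
    fn new() -> Self {
        SetHash([0; 4])
    }

    /// Step 3: fold one file digest into the fingerprint.
    fn insert(&mut self, d: &Digest) {
        for (s, x) in self.0.iter_mut().zip(d.iter()) {
            *s = s.wrapping_add(*x);
        }
    }

    /// Undo an insert, e.g. when a file is deleted or before re-inserting
    /// the digest of a modified file.
    fn remove(&mut self, d: &Digest) {
        for (s, x) in self.0.iter_mut().zip(d.iter()) {
            *s = s.wrapping_sub(*x);
        }
    }
}
```

With this sketch, adding a file is a single `insert`, deleting a file is a single `remove`, and modifying a file is a `remove` of the old digest followed by an `insert` of the new one, so the persisted fingerprint can be updated without re-reading the other files.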

In the future, if you want to verify that the backups are not corrupt, you can hash the files again, insert those hashes into a fresh HashSet, and check whether the resulting fingerprint matches the one you persisted. We used Parquet files in a directory as the example here, but the technique applies to any collection of files.

Here is an example in Rust. It iterates through all the files in a directory and calculates the directory's fingerprint.

Screenshot 2023-12-15 at 18.01.19.png
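
For readers without access to the screenshot, a minimal std-only sketch of a directory fingerprint and its later verification might look like the following. It is not the code from the screenshot. The function names `digest`, `fingerprint_dir`, and `verify_dir` are hypothetical, `DefaultHasher` stands in for BLAKE3 (and is not cryptographically secure), and the lane-wise wrapping sum is a toy stand-in for a real homomorphic set hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::fs;
use std::hash::Hasher;
use std::io;
use std::path::Path;

/// Hypothetical stand-in for a 32-byte BLAKE3 digest (four u64 lanes).
/// ASSUMPTION: DefaultHasher is used only to avoid external crates.
fn digest(bytes: &[u8]) -> [u64; 4] {
    let mut out = [0u64; 4];
    for (lane, slot) in out.iter_mut().enumerate() {
        let mut h = DefaultHasher::new();
        h.write_u64(lane as u64); // domain-separate the four lanes
        h.write(bytes);
        *slot = h.finish();
    }
    out
}

/// Fingerprint every regular file directly inside `dir` (non-recursive).
/// Only file contents are hashed, and digests are combined with a
/// commutative sum, so directory iteration order does not matter.
fn fingerprint_dir(dir: &Path) -> io::Result<[u64; 4]> {
    let mut state = [0u64; 4]; // fingerprint of an empty directory
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if !path.is_file() {
            continue; // skip subdirectories and other non-file entries
        }
        let d = digest(&fs::read(&path)?);
        for (s, x) in state.iter_mut().zip(d.iter()) {
            *s = s.wrapping_add(*x);
        }
    }
    Ok(state)
}

/// Re-hash the directory and compare against a previously stored fingerprint.
fn verify_dir(dir: &Path, expected: &[u64; 4]) -> io::Result<bool> {
    Ok(fingerprint_dir(dir)? == *expected)
}
```

Because this sketch hashes only file contents, renaming a file leaves the fingerprint unchanged. If names should count as state, one option is to feed the file name into the per-file digest as well.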
