oxen-io / oxen-storage-server

Storage server for Oxen Service Nodes
MIT License
30 stars 50 forks source link

Storage test to use hash instead of whole data #122

Open sachaaaaa opened 5 years ago

msgmaxim commented 5 years ago

Hash is not enough to prove a node holds the whole message. Has to be a range into the data (index to a single byte?).

sachaaaaa commented 5 years ago

Yes, as discussed previously. The title was just to remind us.

ghost commented 1 year ago

Hash is not enough to prove a node holds the whole message. Has to be a range into the data (index to a single byte?).

A salted hash is enough to prove a node holds the whole message.

  1. the verifier generates a random nonce, sends the nonce to challenge the testee
  2. the testee appends the nonce in the end of the whole message, calculates the hash and sends it back to the verifier
  3. the verifier compares the hash received from the testee, calculates the same salted hash locally, determines whether the testee knows exactly the same data as the verifier.
ghost commented 1 year ago

We might want to survey / reuse / modify distributed database like Apache Cassandra / Scylla for how they maintain data integrity across different nodes.

Amazon DynamoDB and Apache Cassandra use it during the data replication process. These No-SQL distributed databases use Merkle trees to control discrepancies.

https://www.simplilearn.com/tutorials/blockchain-tutorial/merkle-tree-in-blockchain

https://distributeddatastore.blogspot.com/2013/07/cassandra-using-merkle-trees-to-detect.html

See also https://dbdb.io/ https://15721.courses.cs.cmu.edu/spring2023/project2.html

The subtle difference is, most distributed database assume replicate storage nodes are honest, (they might fail but they don't cheat), while in Oxen network we need to assume replicate storage nodes might cheat.

To prevent cheating, we could consider modifing "Merkle Tree" into a "Salted Merkle Tree", where the verifier specifies a token range and generates a random nonce to challenge the testee, the testee needs to append the nonce in all messages (or data partitions) in the specific token range, calculates the merkle tree and returns it to the verifier.

There are some tricky potential race conditions hard to handle though.