paritytech / trie

Base-16 Modified Patricia Merkle Tree (aka Trie)
Apache License 2.0
259 stars 68 forks source link

Root hash mismatch when compared to other libraries. #182

Open azteca1998 opened 1 year ago

azteca1998 commented 1 year ago

Hello,

We're developing a patricia merkle tree and would like to compare our implementation with yours (both for testing and benchmarking). However we've found that the computed hash from your library doesn't match the ones computed from other libraries (including ours).

Hashing mismatch example. ```rs //! # Basic example. //! //! Target hash extracted from here: //! https://github.com/ethereum/tests/blob/develop/TrieTests/trietest.json#L97-L104 //! //! ## Dependencies //! //! ```toml //! [dependencies] //! cita_trie = "4.0.0" //! hasher = "0.1.4" //! hex-literal = "0.3.4" //! memory-db = "0.31.0" //! reference-trie = "0.26.0" //! sha3 = "0.10.6" //! trie-db = "0.24.0" //! //! [dependencies.patricia-merkle-tree] //! git = "https://github.com/lambdaclass/merkle_patricia_tree" //! ``` use cita_trie::{MemoryDB, PatriciaTrie, Trie}; use hasher::HasherKeccak; use hex_literal::hex; use patricia_merkle_tree::PatriciaMerkleTree; use sha3::Keccak256; use std::sync::Arc; fn main() { const DATA: &[(&[u8], &[u8])] = &[(b"abc", b"123"), (b"abcd", b"abcd"), (b"abc", b"abc")]; const HASH: [u8; 32] = hex!("7a320748f780ad9ad5b0837302075ce0eeba6c26e3d8562c67ccc0f1b273298a"); println!("Expected : {HASH:02x?}"); println!("Our hash: {:02x?}", run_ours(DATA.iter().copied())); println!("Cita hash: {:02x?}", run_cita(DATA.iter().copied())); println!("Parity hash: {:02x?}", run_parity(DATA.iter().copied())); } fn run_ours<'a>(data: impl Iterator) -> [u8; 32] { let mut trie = PatriciaMerkleTree::<_, _, Keccak256>::new(); data.for_each(|(p, v)| { trie.insert(p, v); }); trie.compute_hash().as_slice().try_into().unwrap() } fn run_cita<'a>(data: impl Iterator) -> [u8; 32] { let mem_db = Arc::new(MemoryDB::new(true)); let hasher = Arc::new(HasherKeccak::new()); let mut trie = PatriciaTrie::new(mem_db, hasher); data.for_each(|(p, v)| { trie.insert(p.to_vec(), v.to_vec()).unwrap(); }); trie.root().unwrap().try_into().unwrap() } fn run_parity<'a>(data: impl Iterator) -> [u8; 32] { use memory_db::{HashKey, MemoryDB}; use reference_trie::ExtensionLayout; use trie_db::{NodeCodec, TrieDBMutBuilder, TrieHash, TrieLayout, TrieMut}; let mut mem_db = MemoryDB::<_, HashKey<_>, _>::new(::Codec::empty_node()); let mut root = >::default(); let mut trie = TrieDBMutBuilder::::new(&mut mem_db, &mut root).build(); data.for_each(|(p, v)| { trie.insert(p, v).unwrap(); }); trie.commit(); *trie.root() } ```

After some investigation we've found that although the hashing algorithm is the same (given the same inputs will produce the same outputs), the inputs to the hashing function are different. As per the ethereum wiki on patricia merkle tries we've used RLP encoding on our implementation, and also found references to it in yours.

When trying to decode (using RLP) the data passed to your hash function we've found that it can't be RLP (or at least not the RLP on the ethereum wiki) since it didn't make sense. For example, attempting to decode an RLP list with a few exabytes of data.

Do you use a specific (non-ethereum RLP) encoding that fits your needs? Can it be changed to use ethereum's RLP?

Thanks.

cheme commented 1 year ago

Do you use a specific (non-ethereum RLP) encoding that fits your needs? Can it be changed to use ethereum's RLP?

yes codec is define by the https://github.com/paritytech/trie/blob/master/trie-db/src/node_codec.rs NodeCodec use by the trie layout.

The ethereum one (the rlp codec) is not in this crate but in the old parity-ethereum repository, and later in the open-ethereum repository, and aligned to an old version of the crate.