Closed by jpraynaud 2 months ago
The larger Merkle roots correspond to block ranges that contain only one transaction. In that case, the root of the Merkle tree is the value of its unique leaf (a MKTreeNode). Since converting a CardanoTransaction to a MKTreeNode simply uses the hash of the transaction, the resulting Merkle root is equal to the hash of its unique transaction.
When there is more than one leaf in a tree, a merge is applied to compute the root. The merge uses a Blake hash, which returns a fixed-size digest that happens to be smaller than a transaction hash, explaining what we see.
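A minimal sketch of the length discrepancy described above (the function names, the Blake2b-256 variant, and the hex encoding are assumptions for illustration, not the actual Mithril implementation):

```python
import hashlib

def leaf_node(tx_hash_hex: str) -> bytes:
    # The conversion simply reuses the transaction hash: a 64-char hex
    # string stored as raw bytes (64 bytes).
    return tx_hash_hex.encode()

def merge(left: bytes, right: bytes) -> bytes:
    # Merging yields a fixed 32-byte Blake digest, smaller than the
    # 64-byte leaf above (Blake2b-256 assumed here).
    return hashlib.blake2b(left + right, digest_size=32).digest()

tx = "ab" * 32  # a dummy 64-char transaction hash

single_leaf_root = leaf_node(tx)  # tree with one leaf: root == leaf
merged_root = merge(leaf_node(tx), leaf_node(tx))

print(len(single_leaf_root.hex()))  # 128 hex chars once stored
print(len(merged_root.hex()))       # 64 hex chars
```

Hex-encoded for the database, a single-leaf root is twice as long as a merged one, which is what the `length(merkle_root) > 64` queries below detect.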
As of today on mainnet we have ~680k block range roots computed from ~90M transactions.
$ select count(*) from block_range_root;
679909
$ select count(*) from cardano_tx;
90461766
Of those block range roots, only ~13k are affected, i.e. 1.95% of the total.
$ select count(*) from block_range_root where length(merkle_root) > 64;
13292
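The 1.95% figure follows directly from the two counts above:

```python
# Share of affected block range roots (figures from the queries above).
affected = 13_292
total = 679_909
print(f"{affected / total:.2%}")  # → 1.95%
```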
The highest recorded start block number is 10,304,925, while the latest occurrence has a start block number of 491,077. This means that all occurrences are at the start of the chain, so it is extremely unlikely to occur again in future blocks.
$ select max(start) from block_range_root;
10304925
$ select max(start) from block_range_root where length(merkle_root) > 64;
491077
When converting a CardanoTransaction to a MKTreeNode, we could apply to the transaction hash the same Blake hash used when merging nodes. But this would mean doing a hash computation at least twice (once per transaction, once per merge) only to get a consistent Merkle root length in the database.
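The considered normalization could look like the following sketch (hypothetical names; hashing in `leaf_node` is the extra per-transaction computation the paragraph refers to):

```python
import hashlib

def blake(data: bytes) -> bytes:
    # Stand-in for the Blake hash used when merging nodes
    # (Blake2b-256 assumed; the exact variant is a detail here).
    return hashlib.blake2b(data, digest_size=32).digest()

def leaf_node(tx_hash_hex: str) -> bytes:
    # Considered fix: hash the transaction hash once more so every
    # node, leaf or merged, is a fixed 32-byte value.
    return blake(tx_hash_hex.encode())

def merge(left: bytes, right: bytes) -> bytes:
    return blake(left + right)

tx = "ab" * 32  # a dummy 64-char transaction hash

root_one_tx = leaf_node(tx)                        # 1 hash for a single-leaf range
root_two_tx = merge(leaf_node(tx), leaf_node(tx))  # 3 hashes instead of 1
print(len(root_one_tx), len(root_two_tx))          # both 32 bytes
```

This makes all roots the same length but, as noted, at least doubles the number of hash computations per transaction.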
Side note: we chose to use a vector of bytes for the MKTreeNode. This is what allows variable length, and thus this case to happen. If we later find that enforcing a fixed size is something we want, we should modify the MKTreeNode to use a fixed-size array instead.
Since fixing this would at least double the number of hash computations for little gain, and the occurrences are confined to the start of the chain, we chose to do nothing and leave the code as is.
Why
We have noticed that some of the Merkle roots created are longer than others, when they should all be the same size.
This happens in the e2e test:
What
Investigate and fix the source of the longer Merkle roots
How
Create a fix to compute Merkle roots of fixed size
Create a migration to recompute all the block range roots