paritytech / parity-bridges-common

Collection of Useful Bridge Building Tools 🏗️
GNU General Public License v3.0

Storage Proof Verification Profiling/ Use compact proof #665

Closed tomusdrw closed 1 year ago

tomusdrw commented 3 years ago

Currently the StorageProofChecker simply creates a (partially complete) trie from the storage proof elements, and reads are performed using the read_trie_value function. The proofs often contain multiple storage entries - each message has its own storage entry based on its nonce.

We should check whether it would be better to perform proof verification and reads separately, i.e.:

  1. We first verify the proof using sp_trie::verify_trie_proof function
  2. We then query the values from a simple in-memory kvdb hashmap or get an iterator.

This will also ensure that all entries in the proof are actually correct (not only the ones we read).
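The two-step flow above can be sketched with a toy model in plain Rust (std only). This is not the sp_trie API - the hash over all entries stands in for a real trie root, and `verify_all` stands in for `sp_trie::verify_trie_proof`; all names here are illustrative:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// Toy "root": a hash over all (key, value) pairs, standing in for a real
// trie root. Only the control flow is meant to match the proposal.
fn toy_root(entries: &[(Vec<u8>, Vec<u8>)]) -> u64 {
    let mut hasher = DefaultHasher::new();
    for (k, v) in entries {
        k.hash(&mut hasher);
        v.hash(&mut hasher);
    }
    hasher.finish()
}

// Step 1: verify every entry in the "proof" against the known root,
// in one shot - not just the entries we happen to read later.
fn verify_all(root: u64, entries: &[(Vec<u8>, Vec<u8>)]) -> Result<(), ()> {
    if toy_root(entries) == root {
        Ok(())
    } else {
        Err(())
    }
}

fn main() {
    let entries: Vec<(Vec<u8>, Vec<u8>)> = (0..3)
        .map(|i| (format!("key{i}").into_bytes(), format!("value{i}").into_bytes()))
        .collect();
    let root = toy_root(&entries);

    // Step 1: one-shot verification of the whole proof.
    verify_all(root, &entries).expect("proof must verify");

    // Step 2: serve subsequent reads from a plain in-memory map,
    // without touching any trie machinery again.
    let kv: HashMap<Vec<u8>, Vec<u8>> = entries.into_iter().collect();
    assert_eq!(kv.get(b"key1".as_slice()), Some(&b"value1".to_vec()));
}
```

The point of the split is that verification cost is paid once over the whole proof, while reads become cheap hashmap lookups.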

EmmanuellNorbertTulbure commented 1 year ago

Dependency on https://github.com/paritytech/parity-bridges-common/issues/1950

serban300 commented 1 year ago

Trying this. I couldn't read the values from the proof yet. What I have so far:

use sp_core::{hash::H256, Blake2Hasher};
use sp_trie::{
    verify_trie_proof, StorageProof, Trie, TrieDBBuilder, TrieDBMutBuilder, TrieMut,
};

type LayoutV1 = sp_trie::LayoutV1<Blake2Hasher>;
type MemoryDB = sp_trie::MemoryDB<Blake2Hasher>;
type RawProof = Vec<Vec<u8>>;

const NUM_KEYS: u32 = 100;
const KEY_1: &[u8; 4] = b"key1";
const VAL_1: &[u8; 6] = b"value1";

fn generate_trie_proof() -> (H256, RawProof) {
    let mut db = MemoryDB::default();
    let mut root = sp_trie::empty_trie_root::<LayoutV1>();

    {
        let mut trie = TrieDBMutBuilder::<LayoutV1>::new(&mut db, &mut root).build();
        for i in 0..NUM_KEYS {
            let key = format!("key{}", i).into_bytes();
            let val = format!("value{}", i).into_bytes();
            trie.insert(&key, &val).unwrap();
        }
        trie.commit();
    };

    let proof =
        sp_trie::generate_trie_proof::<LayoutV1, _, _, _>(&db, root, &[KEY_1])
            .unwrap();

    (root, proof)
}

fn main() {
    let (root, proof) = generate_trie_proof();

    let db: MemoryDB = StorageProof::new(proof.clone()).into_memory_db();
    let trie = TrieDBBuilder::<LayoutV1>::new(&db, &root).build();
    let mut keys = vec![];
    let val = trie.get(KEY_1);
    if let Ok(key_iter) = trie.key_iter() {
        keys = key_iter
            .filter_map(|maybe_key| {
                maybe_key
                    .ok()
                    .map(|key| (key.clone(), trie.get(&key).unwrap_or(None)))
            })
            .collect();
    }

    let res = verify_trie_proof::<LayoutV1, _, _, Vec<u8>>(
        &root,
        &proof,
        &[(KEY_1, Some(VAL_1.to_vec()))],
    );
    assert!(res.is_ok())
}

verify_trie_proof works, but only if we know the key and the value beforehand. trie.get(KEY_1), for example, returns an InvalidStateRoot error.

serban300 commented 1 year ago

Yes, the proof generated by using sp_trie::generate_trie_proof doesn't contain the values. It only contains the keys. For example, I generated a proof for a trie db with only one entry (key0=value0), and it looks like this: [72, 107, 101, 121, 48, 0]. [107, 101, 121, 48] is key0 in ASCII. Not sure what 72 and 0 are for, but it definitely doesn't contain the value.

And sp_trie::verify_trie_proof can't be called on a proof generated using prove_read. It returns an ExtraneousValue error.

So, if I understand correctly, with this approach we would need:

Still researching

svyatonik commented 1 year ago

@serban300 I think Tomek wanted to compare the performance of the current approach, where the storage proof includes all intermediate trie nodes and leaf values, VS an approach where the storage proof is just intermediate nodes and we deliver messages separately. So e.g. now the FromBridgedChainMessagesProof is a struct with 5 fields, including storage_proof and nonces_start..=nonces_end, and we are reading values from the proof. An alternative option could be a structure with one additional field: messages, which would hold all encoded messages. Then we don't need leaf nodes in the storage_proof field. So the goal of this issue is to compare the performance of the two options and check whether the second option performs better than the first.
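The two shapes being compared can be sketched as structs. The field names below are an approximation of FromBridgedChainMessagesProof, not the exact definitions from the crate:

```rust
// Placeholder types; the real crate uses its own Hash/LaneId/nonce types.
type Hash = [u8; 32];
type RawStorageProof = Vec<Vec<u8>>;
type LaneId = [u8; 4];
type MessageNonce = u64;

// Current scheme: leaf values (the messages) live inside `storage_proof`
// and are read out of the partial trie by nonce-derived key.
#[allow(dead_code)]
struct ProofWithValues {
    bridged_header_hash: Hash,
    storage_proof: RawStorageProof, // intermediate nodes + leaf values
    lane: LaneId,
    nonces_start: MessageNonce,
    nonces_end: MessageNonce,
}

// Alternative scheme: messages are delivered explicitly, so the storage
// proof only needs the intermediate nodes (a compact proof).
struct ProofWithSeparateMessages {
    bridged_header_hash: Hash,
    storage_proof: RawStorageProof, // intermediate nodes only
    lane: LaneId,
    nonces_start: MessageNonce,
    nonces_end: MessageNonce,
    messages: Vec<Vec<u8>>, // encoded messages, verified against the proof
}

fn main() {
    let alt = ProofWithSeparateMessages {
        bridged_header_hash: [0u8; 32],
        storage_proof: vec![],
        lane: *b"lane",
        nonces_start: 1,
        nonces_end: 2,
        messages: vec![b"msg1".to_vec(), b"msg2".to_vec()],
    };
    // One delivered message per nonce in the claimed range.
    assert_eq!(alt.messages.len() as u64, alt.nonces_end - alt.nonces_start + 1);
}
```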

serban300 commented 1 year ago

@svyatonik thank you! I was about to ask you if you have more input here, since I wasn't sure I was on the right track.

Ok, I'll try to do some comparisons

serban300 commented 1 year ago

Compared the performance of the two approaches for 10k messages of 10KB each. Here are the results:

small/verify_trie_proof time:   [79.523 ms 79.564 ms 79.606 ms]
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
small/verify_read_proof time:   [131.21 ms 131.28 ms 131.35 ms]
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

The approach suggested here is indeed a bit more efficient (79 ms vs 131 ms). @svyatonik wdyt? Is the performance difference significant enough to justify implementing this?

Here is the code:

use core::time::Duration;
use criterion::{criterion_group, criterion_main, Criterion};
use rand::Rng;

use sp_core::{hash::H256, Blake2Hasher};
use sp_state_machine::{prove_read, InMemoryBackend};
use sp_trie::{read_trie_value, verify_trie_proof, StorageProof, TrieDBMutBuilder, TrieMut};

type LayoutV1 = sp_trie::LayoutV1<Blake2Hasher>;
type MemoryDB = sp_trie::MemoryDB<Blake2Hasher>;
type RawProof = Vec<Vec<u8>>;
type RawMsg = Vec<u8>;

const NUM_MSGS: u32 = 10000;
const MSG_LEN: usize = 10000;

fn generate_msg() -> Vec<u8> {
    let mut msg = vec![0; MSG_LEN];
    rand::thread_rng().fill(&mut msg[..]);
    msg
}

fn generate_msgs() -> (Vec<Vec<u8>>, Vec<RawMsg>) {
    let mut keys = vec![];
    let mut msgs = vec![];
    for i in 0..NUM_MSGS {
        let key = format!("key{}", i).into_bytes();
        keys.push(key);
        msgs.push(generate_msg());
    }
    (keys, msgs)
}

fn generate_proofs() -> (H256, RawProof, H256, RawProof, Vec<Vec<u8>>, Vec<RawMsg>) {
    let (keys, msgs) = generate_msgs();

    let mut read_values = vec![];
    for (key, msg) in keys.iter().zip(msgs.clone()) {
        read_values.push((None, vec![(key.clone(), Some(msg))]));
    }
    let backend =
        InMemoryBackend::<Blake2Hasher>::from((read_values, sp_runtime::StateVersion::default()));
    let read_root = backend.root().clone();
    let read_proof = prove_read(backend, &keys)
        .unwrap()
        .into_nodes()
        .into_iter()
        .collect();

    let mut trie_root = sp_trie::empty_trie_root::<LayoutV1>();
    let mut db = MemoryDB::default();
    {
        let mut trie = TrieDBMutBuilder::<LayoutV1>::new(&mut db, &mut trie_root).build();
        for (key, msg) in keys.iter().zip(msgs.clone()) {
            trie.insert(key, &msg).unwrap();
        }
        trie.commit();
    };
    let trie_proof =
        sp_trie::generate_trie_proof::<LayoutV1, _, _, _>(&db, trie_root, &keys).unwrap();

    (trie_root, trie_proof, read_root, read_proof, keys, msgs)
}

fn proof_benchmark(c: &mut Criterion) {
    let input = generate_proofs();

    let mut group = c.benchmark_group("small");

    group.bench_with_input(
        "verify_trie_proof",
        &input,
        |b, (root, proof, _, _, keys, msgs)| {
            b.iter(|| {
                let items: Vec<_> = keys
                    .iter()
                    .zip(msgs)
                    .map(|(key, msg)| (key, Some(msg.clone())))
                    .collect();
                let res = verify_trie_proof::<LayoutV1, _, _, Vec<u8>>(&root, &proof, &items);
                assert!(res.is_ok())
            })
        },
    );

    group.bench_with_input(
        "verify_read_proof",
        &input,
        |b, (_, _, root, proof, keys, _)| {
            b.iter(|| {
                let proof = StorageProof::new(proof.clone());
                let db = proof.into_memory_db();

                for key in keys {
                    read_trie_value::<LayoutV1, _>(&db, &root, key, None, None)
                        .expect("should work");
                }
            })
        },
    );

    group.finish();
}

criterion_group! {
  name = benches;
  config = Criterion::default().measurement_time(Duration::from_secs(100));
  targets = proof_benchmark
}
criterion_main!(benches);

svyatonik commented 1 year ago

Cool, thank you! Looks promising. But before considering migration, it would probably be better to have tests using:

1. Real-world constants. E.g. the maximal number of messages we are going to deliver within a single transaction is limited to 4_096 (at production bridge hubs) and their total size is also limited (so 10k * 10kb won't work either). If my computations are correct, the max total size is something like 2.6Mb. Also, most probably we'll have smaller transactions when the bridge operates normally - e.g. if the whole roundtrip (delivery + confirmation) is 1m, senders must generate 4_096 messages in 1m. Don't know how realistic that is. So it would be good to have some benchmarks for numbers that are closer to our regular transactions. Or maybe we may use our benchmarks engine and build a model weight(number-of-messages, message-size) for both schemes? Or maybe let's limit size to the size of our actual messages (sent by the pallet-bridge-assets-transfer)? BTW - using our benchmarks engine you may also measure the performance of any code (like yours above), not only runtime calls.
2. This would be a breaking change for chains and for relayers, so it'll need an audit. It could also be a good test of how we can handle breaking changes when we'll be deploying on RBH<>WBH.

TLDR: let's do measurements on something that looks similar to our actual transactions and see whether it's worth implementing this now or later. If the gain is still large enough, then let's do it.

bkontur commented 1 year ago

Could this also affect our storage stuff? I am not sure - do we store proofs anywhere? I mean, do we need any storage migration for this? Or is this just about reading?

svyatonik commented 1 year ago

Could this also affect our storage stuff? I am not sure - do we store proofs anywhere? I mean, do we need any storage migration for this? Or is this just about reading?

No, we don't store proofs. But they're parts of call arguments. So we'll need to sync upgrades and relay restarts - a good test for our upgrade plan :)

serban300 commented 1 year ago

real world constants. E.g. maximal number of messages we are going to deliver within single transaction is limited to 4_096 (at production bridge hubs) and their total size is also limited (so 10k * 10kb won't work too). If my computations are correct max total size is something like 2.6Mb. Also most probably we'll have smaller transactions when bridge operates normally - e.g. if the whole roundtrip (delivery + confirmation) is 1m, senders must generate 4_096 messages in 1m. Don't know how realistic is that. So would be good to have some benchmarks for numbers that are closer to our regular transactions. Or maybe we may use our benchmarks engine and build model weight(number-of-messages, message-size) for both schemes?

Makes sense. Looking into how we can use our benchmarks engine and build this model for both schemes.

Or maybe let's limit size to size of our actual messages (sent by the pallet-bridge-assets-transfer)?

I don't know if we have a limit per message in pallet-bridge-assets-transfer. I know there is a limit on the number of assets. Anyway, given the way we craft the message, I suppose it should be really small - probably a couple hundred bytes. It shouldn't be over 2KB. So I ran the tests again with 4096 messages of 2KB each and the results are the following:

small/verify_trie_proof time:   [7.4423 ms 7.4448 ms 7.4479 ms]
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe
small/verify_read_proof time:   [15.385 ms 15.406 ms 15.439 ms]
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe
svyatonik commented 1 year ago

Cool, still looks good. No need to do our benchmarks then - imo let's use this approach

bkontur commented 1 year ago

Do we want to go with this new stuff to BHK/P or later we will do upgrade to this?

serban300 commented 1 year ago

Do we want to go with this new stuff to BHK/P or later we will do upgrade to this?

Personally I wouldn't block on this. We could include it in BHK/P or not, depending on when it's ready.

bkontur commented 1 year ago

Do we want to go with this new stuff to BHK/P or later we will do upgrade to this?

Personally I wouldn't block on this. We could include it in BHK/P or not, depending on when it's ready.

yes, no blocking, if it is ready let's go with it, if not we upgrade

serban300 commented 1 year ago

Sorry, coming back with a disappointing update. I changed the verify_trie_proof test to better simulate the final implementation: made it use a hash db and perform another read for each key after verifying the proof. Before, it was using a vec of keys and a vec of messages, which made the job easier.

Here is the new code:

    group.bench_with_input(
        "verify_trie_proof",
        &input,
        |b, (root, proof, _, _, db, keys, msgs)| {
            b.iter(|| {
                let trie = TrieDBBuilder::<LayoutV1>::new(db, &root).build();
                let items: Vec<_> = keys
                    .iter()
                    .map(|key| (key, trie.get(key).unwrap()))
                    .collect();
                let res = verify_trie_proof::<LayoutV1, _, _, Vec<u8>>(&root, &proof, &items);
                assert!(res.is_ok());

                for key in keys {
                    assert!(trie.get(key).is_ok());
                }
            })
        },
    );

And the results:

small/verify_trie_proof time:   [18.495 ms 18.504 ms 18.514 ms]
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) high mild
  5 (5.00%) high severe
small/verify_read_proof time:   [16.386 ms 16.395 ms 16.404 ms]
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe

Now this approach leads to worse results. So @svyatonik, wdyt? Should we keep the old approach?

svyatonik commented 1 year ago

But we don't necessarily need to read messages (and other leaves) using the trie, right? verify_trie_proof guarantees that the proof is correct with respect to the given leaves, right? Then we can just drop the trie and trust our Vec<_>. Or am I missing something and verify_trie_proof gives no such guarantee?

serban300 commented 1 year ago

We can read the messages from the vec after verification, true, but it wouldn't be generic. It would work only for the messages. Other fields would have to be read by key.

And the performance difference in this case would be:

small/verify_trie_proof time:   [13.639 ms 13.648 ms 13.657 ms]
Found 10 outliers among 100 measurements (10.00%)
  3 (3.00%) low mild
  2 (2.00%) high mild
  5 (5.00%) high severe
small/verify_read_proof time:   [16.230 ms 16.258 ms 16.292 ms]
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) high mild
  4 (4.00%) high severe
svyatonik commented 1 year ago

We don't need it to be super generic - we only deliver: (1) proof of messages, (2) proof of outbound lane state, (3) proof of inbound lane state. The latter two may be represented as Option<...>. So imo it is fine.
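The non-generic payload described here can be sketched as a small struct. The names below are illustrative, not taken from the codebase:

```rust
// Hypothetical delivery payload: messages are delivered explicitly, and the
// two lane-state entries are optional. After a single verify_trie_proof-style
// check of all (key, value) pairs, these values can be trusted directly,
// with no per-key trie reads.
struct MessagesDeliveryPayload {
    messages: Vec<Vec<u8>>,
    outbound_lane_state: Option<Vec<u8>>,
    inbound_lane_state: Option<Vec<u8>>,
}

// Number of (key, value) items the proof must cover for this payload.
fn expected_items(p: &MessagesDeliveryPayload) -> usize {
    p.messages.len()
        + p.outbound_lane_state.iter().count()
        + p.inbound_lane_state.iter().count()
}

fn main() {
    let payload = MessagesDeliveryPayload {
        messages: vec![b"m1".to_vec(), b"m2".to_vec()],
        outbound_lane_state: Some(b"state".to_vec()),
        inbound_lane_state: None,
    };
    // Two messages plus one present lane state: three proved items.
    assert_eq!(expected_items(&payload), 3);
}
```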

I think this code:

                let items: Vec<_> = keys
                    .iter()
                    .zip(msgs)
                    .map(|(key, msg)| (key, Some(msg.clone())))
                    .collect();
                let res = verify_trie_proof::<LayoutV1, _, _, Vec<u8>>(&root, &proof, &items);
                assert!(res.is_ok())

is close to what we actually have. What we currently do is read values from the trie and collect them into a Vec<_>. If we use the new scheme, we already have the vec, so we only need to do the verification. And this code gives these results, which look ok to me, right?

Please excuse me if I'm missing something - I haven't looked at code details yet.

svyatonik commented 1 year ago

(ahh and also we deliver proofs of parachain heads, but they're also Vec<u8>, so nothing generic is required)

serban300 commented 1 year ago

Ok, I thought it had to be more generic. I'll try it like this then.