solana-labs / solana

Web-Scale Blockchain for fast, secure, scalable, decentralized apps and marketplaces.
https://solanalabs.com
Apache License 2.0
13.17k stars 4.29k forks source link

Shreds could be constructed more efficiently for verify #16403

Open sakridge opened 3 years ago

sakridge commented 3 years ago

Problem

Entry struct is serialized as a data blob and then split into shreds on SHRED_DATA_SIZE ~1200-byte boundaries. This means a validator would need all shreds containing an Entry i.e. a full FEC set before it can deserialize into Entry and thus be able to start verifying transaction signatures and/or PoH and be able to start notifying that a given transaction is in a block.

Proposed Solution

Partition [Entry] into shreds such that each Shred is individually deserializable. This will allow the validator to start verify faster and also fire the included commitment level faster.

utsl42 commented 3 years ago

I have an idea, based on the "slotted page" concept used in databases.

Something like this:

  1. Instead of serializing a whole Vec<Transaction>, serialize each separately, appending to buffer, and collecting a vector of offsets as we go.
  2. We add to each shred the subset of offsets that start within that shred.
  3. We append an Entry, minus the Vec<Transaction> at the end.

We can then use the offset slices from each shred to find the start of each serialized object without having to deshred the entire buffer first. We should therefore be able to randomly access, deserialize, and verify signatures for most (the ones that don't span into a shred we don't have yet) received Transaction's as soon as we have the shred(s) they're packed in, because we can use the offsets to locate them. At deshred time, we can recreate from the shreds the vector of offsets from step 1, which is equivalent to Vec<Transaction>, so we could then recreate the original Entry.