streamingfast / substreams

Powerful Blockchain streaming data engine, based on StreamingFast Firehose technology.
Apache License 2.0

optimize store module keys and values #542

Open abhimanyusinghgaur opened 2 months ago

abhimanyusinghgaur commented 2 months ago

As per the doc here: [screenshot, 2024-09-26 1:12:36 PM]

  1. All numeric data types are currently serialized as strings before being stored, which wastes storage space. They could simply be serialized as bytes for storage (little endian or big endian, either works), which would be more efficient in both storage and compute. If debugging is needed, those bytes can always be converted to strings for readability. But perf-wise, byte serialization would be better.
  2. The same applies to keys. Keys are also string serialized, when they could simply be raw bytes. When using an Ethereum address as a store key, it currently has to be converted to hex, which takes 40 bytes instead of the 20 bytes of the raw form (2x cost). Base64 can reduce this, but it still takes 28 bytes. So work that could be done with 20-byte keys is not allowed ATM. When I tried using the raw address bytes as keys via an unsafe string conversion, my module started behaving strangely. It kept reading data: the substreams GUI showed around 1.2GB read for a block range of 100 blocks, and I had to kill it because it wasn't producing any output. For the same block range with hex-encoded address keys, it finished the job with only ~60MB of data read. So there appears to be some internal limitation around using raw bytes, which ideally shouldn't exist.
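To make the size arithmetic in both points concrete, here is a minimal sketch in plain Rust (standard library only). The helper names `hex_key`, `base64_len`, `int_as_string`, and `int_as_bytes` are hypothetical and are not part of the Substreams API; the sketch only compares encoded lengths and does not touch a store.

```rust
// Hypothetical helpers for illustration only; none of these are Substreams APIs.

/// Lowercase hex encoding: two ASCII characters per input byte.
fn hex_key(raw: &[u8]) -> String {
    raw.iter().map(|b| format!("{:02x}", b)).collect()
}

/// Length of standard padded base64 output for `n` input bytes.
fn base64_len(n: usize) -> usize {
    (n + 2) / 3 * 4
}

/// Decimal-string serialization of an integer, as bytes.
fn int_as_string(v: u64) -> Vec<u8> {
    v.to_string().into_bytes()
}

/// Fixed-width big-endian serialization of an integer.
fn int_as_bytes(v: u64) -> [u8; 8] {
    v.to_be_bytes()
}

fn main() {
    // A placeholder 20-byte value standing in for an Ethereum address.
    let address = [0xabu8; 20];
    println!("raw key:    {} bytes", address.len());
    println!("hex key:    {} bytes", hex_key(&address).len());
    println!("base64 key: {} bytes", base64_len(address.len()));

    // Largest u64, to show the worst case for decimal strings.
    let value = u64::MAX;
    println!("string value: {} bytes", int_as_string(value).len());
    println!("binary value: {} bytes", int_as_bytes(value).len());
}
```

Note that the fixed-width form only wins for large values: a small integer such as `1` is a single byte as a decimal string but still 8 bytes as a `u64`. If byte serialization were adopted, big endian has the side benefit that unsigned values sort numerically under bytewise comparison.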

This is related to: https://discord.com/channels/666749063386890256/982135810742697984/1271115693093556295