streamingfast / substreams

Powerful Blockchain streaming data engine, based on StreamingFast Firehose technology.
Apache License 2.0
160 stars 45 forks source link

Implement a `set_sum` updatePolicy for stores #466

Closed abourget closed 3 months ago

abourget commented 4 months ago

PropellerHeads need this.

Definition of done done:


By having a single byte prefix in an add store, we could know if someone has set a key, and then the merge operation could either SET of add with the previous segment's keys.

Example flow:

1 key1 add(5)
2 key1 add(5)
3 key1 add(5)

SNAPSHOT:
key1 a15

10 key1 add(5)
11 key1 set(50)
12 key1 add(5)
13 key1 set(25)
14 key1 add(5)

SNAPSHOT:
key1 s30

20 key1 add(5)

SNAPSHOT:
key1 a5  (there was no `set` in here, so we'll add `5` to the previous segment's `30`'s final output.)

This would help certain folks trying to use some set (when the liquidity of something is published in absolute terms), and track with greater precision all the adds and removals that are tracked in relative terms (amounts of burns, mints, etc..)

This feature would not be so complex to build, it would add an updatePolicy here and there.. but all those functions are very simple.

colindickson commented 4 months ago

@abourget how would you feel about an "in-place" migration for all existing add stores, where we check the first byte for "a" or "s", and if it is not there, treat thee existing snapshot as an "a" and save the next snapshot accordingly?

colindickson commented 4 months ago

@abourget or do you think it would be best to simply have a new store update policy called AddOrSet ?

abourget commented 4 months ago

what about SetSum ? add is a bit unfortunate.. set: and sum: prefixes because they will be visible in StoreDeltas arriving in some modules' inputs.. so a quick separator and a readable term would be good

jubeless commented 4 months ago

The set+sum store cannot be parallelize since it is not an associativity operation.

We know that substreams requires associativity of operations to work, since it can "group" the work into shards, i.e. (a+b)+(c+d) must be equal to (a+b+c) + (d) given a set of operations on a store that allow "adds" and "sets": add(2) add(5) set(0) add(6)

The first grouping gives 13 = (+2, +5) + (set0, +6) = +7 +6 = 13

But this grouping gives 6 (+2 +5 set0) + (+6) = 0 + 6 = 6

abourget commented 4 months ago

(+2, +5) + (set0, +6) -> 6, not 13, because each time we hit a set, we erase whatever was accumulated from the left

colindickson commented 3 months ago

example substreams posted here: https://github.com/streamingfast/substreams-examples