streamingfast / substreams

Powerful Blockchain streaming data engine, based on StreamingFast Firehose technology.
Apache License 2.0

General guidelines for when to use a Map vs a Store module #65

Closed by maoueh 1 year ago

maoueh commented 1 year ago

We need to better document when users should use a map and when they should use a store. We need to give some general guidelines as well as clear use cases, such as examples of when to choose one over the other. Finally, we also need to clearly outline anti-patterns. Let's try to come up with an outline.

Maps are used for data extraction, filtering, and transformation. They should be used when you need direct extraction and the extracted data is not re-used as state later in the pipeline. For example, extracting model data from events or function inputs is a good use case. People should also favor 1 map over N mappers that each extract a single event. It's better to perform as much extraction as possible in the top-level mapper and pass that data around for later consumption.
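A minimal sketch of the "1 map over N mappers" idea, written in plain Rust with only the standard library. The types `Block`, `Event`, and `Extracted` and the function `map_extract` are hypothetical stand-ins, not the Substreams or Firehose API; the point is only that one pass over the block's events produces every model of interest, and downstream modules consume that single output instead of each re-scanning the block.

```rust
/// Hypothetical, simplified decoded event (NOT a Substreams/Firehose type).
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Transfer { from: String, to: String, amount: u64 },
    Approval { owner: String, spender: String, amount: u64 },
    Other,
}

/// Hypothetical, simplified block holding already-decoded events.
#[derive(Debug, Clone)]
pub struct Block {
    pub number: u64,
    pub events: Vec<Event>,
}

/// Output of the single top-level map: every model extracted from the block.
#[derive(Debug, Default, PartialEq)]
pub struct Extracted {
    pub block_number: u64,
    pub transfers: Vec<(String, String, u64)>,
    pub approvals: Vec<(String, String, u64)>,
}

/// One map walks the events once and extracts all models of interest,
/// instead of N maps that each re-scan the block for a single event kind.
pub fn map_extract(block: &Block) -> Extracted {
    let mut out = Extracted {
        block_number: block.number,
        ..Default::default()
    };
    for event in &block.events {
        match event {
            Event::Transfer { from, to, amount } => {
                out.transfers.push((from.clone(), to.clone(), *amount));
            }
            Event::Approval { owner, spender, amount } => {
                out.approvals.push((owner.clone(), spender.clone(), *amount));
            }
            // Events we do not care about are filtered out here.
            Event::Other => {}
        }
    }
    out
}
```

In this sketch, a module that only needs transfers would read `Extracted::transfers` rather than declaring its own block-scanning map.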

Stores should be used to aggregate values and to hold state that exists "across" blocks. They are not a free-form storage location. An unbounded store is usually a bad idea (not sure of the exact wording; we don't want to scare people either).
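A minimal conceptual sketch of what "aggregation across blocks" means, assuming a hypothetical in-memory `AddStore` that only models an additive update policy; it is not the Substreams store API. Per-block deltas from a map are folded into state that survives from one block to the next.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory model of an additive store: state that persists
/// across blocks and is updated with deltas. Conceptual only.
#[derive(Debug, Default)]
pub struct AddStore {
    values: HashMap<String, i64>,
}

impl AddStore {
    /// Adds `delta` to the value at `key`, starting from 0 if absent.
    pub fn add(&mut self, key: &str, delta: i64) {
        *self.values.entry(key.to_string()).or_insert(0) += delta;
    }

    /// Reads the aggregated value for `key`, if any.
    pub fn get(&self, key: &str) -> Option<i64> {
        self.values.get(key).copied()
    }
}

/// Folds per-block (from, to, amount) transfers into net balances per account.
/// Keys grow with the number of accounts, not with the number of transfers.
pub fn aggregate_balances(blocks: &[Vec<(String, String, u64)>]) -> AddStore {
    let mut store = AddStore::default();
    for transfers in blocks {
        for (from, to, amount) in transfers {
            store.add(from, -(*amount as i64));
            store.add(to, *amount as i64);
        }
    }
    store
}
```

The keying choice is where "bounded" comes in under these assumptions: keying by account keeps the store proportional to the set of accounts, whereas keying every entry by something like a transaction hash would grow without limit.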

One anti-pattern is, for example, extracting Entities from Events in a mapper and passing them directly to a store that simply saves the model in a StoreProto without the data ever being read back. This is useless: remove the store and consume the entities directly.
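A minimal sketch contrasting the anti-pattern with the direct approach, using hypothetical names (`TransferEntity`, `map_entities`, `store_entities_unused`, `consume`) and a plain `HashMap` standing in for a store; none of these are Substreams APIs. The store write costs work and storage, but since nothing reads it back, the consumer gets exactly the same data straight from the map output.

```rust
use std::collections::HashMap;

/// Hypothetical entity extracted from an event (illustrative only).
#[derive(Debug, Clone, PartialEq)]
pub struct TransferEntity {
    pub id: String,
    pub amount: u64,
}

/// Map: turns raw (id, amount) event tuples into entities.
pub fn map_entities(events: &[(String, u64)]) -> Vec<TransferEntity> {
    events
        .iter()
        .map(|(id, amount)| TransferEntity { id: id.clone(), amount: *amount })
        .collect()
}

/// Anti-pattern: copy every entity into a key/value "store" that no later
/// module ever reads. The write adds cost but no value.
pub fn store_entities_unused(
    entities: &[TransferEntity],
    store: &mut HashMap<String, TransferEntity>,
) {
    for e in entities {
        store.insert(e.id.clone(), e.clone());
    }
}

/// Preferred: the consumer (e.g. a sink) takes the map output directly.
pub fn consume(entities: &[TransferEntity]) -> u64 {
    entities.iter().map(|e| e.amount).sum()
}
```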

maoueh commented 1 year ago

@Eduard-Voiculescu or @billettc will bootstrap you on the content of this.

maoueh commented 1 year ago

Some content that we discussed today

seanmooretechwriter commented 1 year ago

Question on this part:

"Maps are used for data extraction, filtering, and transformation. They should be used when you need direct extraction and the extracted data is not re-used as state later in the pipeline. For example, extracting model data from events or function inputs is a good use case. People should also favor 1 map over N mappers that each extract a single event. It's better to perform as much extraction as possible in the top-level mapper and pass that data around for later consumption."

Would it be good to explain the reasoning behind this? It was the first question that came to mind for me: why is it "better to perform as much extraction as possible from top-level mapper"?

It might be too much to include in the docs. I'm still curious though.

maoueh commented 1 year ago

"For example, extracting model data from events or function inputs is a good use case. People should also favor 1 map over N mappers that each extract a single event. It's better to perform as much extraction as possible in the top-level mapper and pass that data around for later consumption."

I wrote that, but I now feel it belongs in a section about performance considerations for Substreams. It is not really related to choosing a map versus a store.

If we remove that for now, do you have further questions?

seanmooretechwriter commented 1 year ago

The content was pretty good from my perspective. I'm just curious why it's best to restrict extraction to a single map module. It's not pressing, though; I'm mostly curious, for a better, deeper understanding in general.

abourget commented 1 year ago

Simplicity, both of the backend and the consumer experience. Unless there's a compelling use case, simpler is better.

seanmooretechwriter commented 1 year ago

I've reviewed the existing documentation for modules.

I believe the best place to add the new documentation for maps and stores will be on this page: https://substreams.streamingfast.io/concept-and-fundamentals/modules

I believe we should focus on this sentence to begin forming the outline of the new content: "give some general guidelines as well as clear use cases like example of when to choose one over the other."

If that sounds like a good path forward let me know and I'll create a new branch to work in to add this new content.

If not, please let me know where this would be better added.

Also, is there additional input available from @Eduard-Voiculescu or @billettc for this?

seanmooretechwriter commented 1 year ago

I've created the initial draft content for the new section on the existing modules page. The PR is available for review here: https://github.com/streamingfast/substreams/pull/101