maoueh closed this issue 1 year ago.
@Eduard-Voiculescu or @billettc will bootstrap you on the content of this.
Some content that we discussed today
Question on this part:
"Maps are used for data extraction, filtering and transformation. They should be used when direct extraction is needed without re-using the results later in the pipeline. For example, extracting model data from events or from a function's inputs is a good use case. People should also favor 1 map over N mappers each extracting a single event. It's better to perform as much extraction as possible from the top-level mapper and pass that data around for later consumption."
Would it be good to explain the why behind this? It was the first question that came to mind for me. Why is it "better to perform as much extraction as possible from the top-level mapper"?
It might be too much to include in the docs. I'm still curious though.
"For example, extracting model data from events or from a function's inputs is a good use case. People should also favor 1 map over N mappers each extracting a single event. It's better to perform as much extraction as possible from the top-level mapper and pass that data around for later consumption."
So I wrote that, but I now feel it belongs more in a section about the performance considerations of a Substreams. It's not really related to when you should choose a map vs a store.
If we remove that for now, do you have further questions?
The content was pretty good from my perspective. I'm just curious why it's best to restrict extraction to a single map module. It's not pressing, though; I guess it's just curiosity and wanting a better, deeper understanding in general.
Simplicity, both of the backend and the consumer experience. Unless there's a compelling use case, simpler is better.
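To illustrate the "1 map over N mappers" guideline, here is a minimal sketch in plain Rust. It assumes a simplified model where a map is just a function from a block to an output: `Block`, `Event`, `ExtractedEvents`, `map_events` and `total_transferred` are all hypothetical names, not the substreams crate API (real modules take and return protobuf messages). The single map walks the block's events once and emits everything downstream modules need, instead of one map per event type each re-scanning the same block.

```rust
// Hypothetical, simplified types: NOT the substreams crate API.
// In a real Substreams module, inputs and outputs are protobuf messages.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Transfer { from: String, to: String, amount: u64 },
    Approval { owner: String, spender: String, amount: u64 },
}

pub struct Block {
    pub number: u64,
    pub events: Vec<Event>,
}

/// Output of the single top-level map: everything downstream modules need,
/// extracted in one pass over the block.
#[derive(Debug, Default, PartialEq)]
pub struct ExtractedEvents {
    pub block_number: u64,
    pub transfers: Vec<Event>,
    pub approvals: Vec<Event>,
}

/// One map that extracts every event type of interest, instead of one
/// map per event type, each iterating over the same block again.
pub fn map_events(block: &Block) -> ExtractedEvents {
    let mut out = ExtractedEvents {
        block_number: block.number,
        ..Default::default()
    };
    for event in &block.events {
        match event {
            Event::Transfer { .. } => out.transfers.push(event.clone()),
            Event::Approval { .. } => out.approvals.push(event.clone()),
        }
    }
    out
}

/// A downstream consumer reads the already-extracted data;
/// it never touches the raw block.
pub fn total_transferred(extracted: &ExtractedEvents) -> u64 {
    extracted
        .transfers
        .iter()
        .map(|e| match e {
            Event::Transfer { amount, .. } => *amount,
            _ => 0,
        })
        .sum()
}
```

The block is decoded once and every consumer shares the same output, which keeps the module graph (and what a consumer has to wire up) simpler.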
I've reviewed the existing documentation for modules.
I believe the best place to add the new documentation for maps and stores will be on this page: https://substreams.streamingfast.io/concept-and-fundamentals/modules
I believe we should focus on this sentence to begin forming the outline of the new content: "give some general guidelines as well as clear use cases like examples of when to choose one over the other."
If that sounds like a good path forward, let me know and I'll create a new branch to add this new content.
If not, please let me know where else this content would fit better.
Also, is there additional input available from @Eduard-Voiculescu or @billettc for this?
I've created the initial draft content for the new section on the existing modules page. The PR is available for review here: https://github.com/streamingfast/substreams/pull/101
We need to better document when a user should use a map and when to use a store. We need to give some general guidelines, as well as clear use cases such as examples of when to choose one over the other. Finally, we also need to clearly outline anti-patterns. Let's try to come up with an outline.

Maps are used for data extraction, filtering and transformation. They should be used when direct extraction is needed without re-using the results later in the pipeline. For example, extracting model data from events or from a function's inputs is a good use case. People should also favor 1 map over N mappers each extracting a single event. It's better to perform as much extraction as possible from the top-level mapper and pass that data around for later consumption.

Stores should be used to aggregate values and to keep state that exists "across" blocks. They are not a free-form storage location. An unbounded store is usually a bad idea (not sure of the exact wording; we don't want to scare people either).

One anti-pattern is, for example, extracting entities from events in a mapper and passing them directly to a store that saves the model in a StoreProto without ever reading the data back. This is totally useless: remove the store and consume the entities directly.
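A minimal sketch in plain Rust of the store guidance and the anti-pattern, assuming a simplified model where a store is a key/value map that survives from block to block. `Transfer`, `TotalsStore`, `store_totals` and `store_every_entity` are hypothetical names, not the substreams crate API; the `HashMap` sink in the last function only stands in for writing entities into a StoreProto. The first handler aggregates a running total per account, so the store grows with the number of distinct accounts. The second copies every entity under a unique key and nothing reads it back, so it grows with every block.

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-ins: NOT the substreams crate API.
/// Per-block output of a map module: transfers already extracted from events.
pub struct Transfer {
    pub to: String,
    pub amount: i64,
}

/// State that must survive from one block to the next.
/// It aggregates a running total per account ("add" semantics), so it grows
/// with the number of distinct accounts, not with the number of blocks.
#[derive(Default)]
pub struct TotalsStore {
    totals: HashMap<String, i64>,
}

impl TotalsStore {
    pub fn add(&mut self, key: &str, delta: i64) {
        *self.totals.entry(key.to_string()).or_insert(0) += delta;
    }

    pub fn get(&self, key: &str) -> Option<i64> {
        self.totals.get(key).copied()
    }

    pub fn len(&self) -> usize {
        self.totals.len()
    }
}

/// Good use of a store: called once per block with that block's map output,
/// folding it into cross-block state that later modules read.
pub fn store_totals(transfers: &[Transfer], store: &mut TotalsStore) {
    for t in transfers {
        store.add(&t.to, t.amount);
    }
}

/// Anti-pattern (for contrast): copy every extracted entity into a store under
/// a unique key, with nothing ever reading it back. The store only grows, and
/// downstream modules could have consumed the map output directly.
pub fn store_every_entity(block_number: u64, transfers: &[Transfer], sink: &mut HashMap<String, i64>) {
    for (i, t) in transfers.iter().enumerate() {
        sink.insert(format!("{}:{}", block_number, i), t.amount);
    }
}
```

If no module ever reads what `store_every_entity` writes, the fix is the one suggested above: drop the store and have consumers take the map's output as their input.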