Omarabdul3ziz opened 8 months ago
I think I understand the overall purpose of this proposal and I agree with the issues you stated above. I just have some clarification questions:

- Is this a complete rewrite of the indexer -> processor pipeline, or does it build on top of it?
- Is the `grid-db` where all mutations are going to happen, from both the chain events and the information collected from the nodes?
- In the second graph, shouldn't the processor be between the `syncer` and the `mutation-client`?
- Is the `mutation-client` just a SQL client with write access to the `grid-db`?
- Is this a complete rewrite of the indexer -> processor pipeline, or does it build on top of it? I mean, we can still reuse the indexer part since it just indexes raw events, and we can run our custom processors on top of that event log to rebuild the `grid-db`.

Yes, for the indexer part we can keep the usage as-is, since we are using a ready-made Subsquid image for ingesting the chain events. The processor part, however, will need to be rewritten to cover the new tables, possibly in Go; the current code base is written in TypeScript. A rough sketch of what that could look like follows.
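For illustration only, here is a minimal sketch of one handler in such a Go rewrite. `NodeCreatedEvent`, the `node` table, and its columns are invented assumptions, not the actual schema:

```go
// Sketch of a Go processor handling one kind of chain event.
// Event and table names are illustrative assumptions, not the real schema.
package processor

import (
	"context"
	"database/sql"
)

// NodeCreatedEvent is a hypothetical decoded chain event.
type NodeCreatedEvent struct {
	NodeID uint32
	TwinID uint32
	FarmID uint32
}

// handleNodeCreated upserts the event into a normalized grid-db table.
func handleNodeCreated(ctx context.Context, db *sql.DB, ev NodeCreatedEvent) error {
	_, err := db.ExecContext(ctx,
		`INSERT INTO node (node_id, twin_id, farm_id)
		 VALUES ($1, $2, $3)
		 ON CONFLICT (node_id) DO UPDATE
		 SET twin_id = EXCLUDED.twin_id, farm_id = EXCLUDED.farm_id`,
		ev.NodeID, ev.TwinID, ev.FarmID)
	return err
}
```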
- I understand that the `grid-db` is where all mutations are going to happen, from both the chain events and the information collected from the nodes. Is that correct?

Yes, it will be a central database for all the data.
- In the second graph, shouldn't the processor be between the `syncer` and the `mutation-client`?

I was thinking of storing the raw ingested chain data in the same database, so the processor would read from and write to the same database. I am not fully aware of how we should build the processor yet: should we read the raw chain data from the database, or can we use the subsquid-gateway API instead? I will look into this part. The database-polling option could look something like the sketch below.
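A rough sketch of the database-polling option, assuming a hypothetical `raw_event` table with `block`, `name`, and `args` columns; the cursor lets the processor resume instead of re-reading the chain:

```go
// Sketch of the processor reading raw ingested events from the same
// database the syncer writes to. The raw_event table is an assumption.
package processor

import (
	"context"
	"database/sql"
)

type RawEvent struct {
	Block uint64
	Name  string
	Args  []byte // JSON-encoded event arguments
}

// nextBatch reads raw events after the last processed block, so the
// processor can resume from a stored cursor.
func nextBatch(ctx context.Context, db *sql.DB, lastBlock uint64) ([]RawEvent, error) {
	rows, err := db.QueryContext(ctx,
		`SELECT block, name, args FROM raw_event
		 WHERE block > $1 ORDER BY block LIMIT 500`, lastBlock)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	var batch []RawEvent
	for rows.Next() {
		var ev RawEvent
		if err := rows.Scan(&ev.Block, &ev.Name, &ev.Args); err != nil {
			return nil, err
		}
		batch = append(batch, ev)
	}
	return batch, rows.Err()
}
```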
- The `mutation-client` is just a SQL client with write access to the `grid-db`, correct?

Yes.
I think the indexer-db (which contains the raw data) should be separate from the "view" db. The indexed data can be huge, so keeping them together could slow down and affect the db. Separation would also make it easier to reset the view/grid-db and start over without losing the indexed data, which is expensive to rebuild from the chain. For example, something like the sketch below.
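A sketch of that separation; the DSNs, database names, and choice of driver are placeholders:

```go
// Sketch of keeping the raw indexer-db separate from the grid-db "view",
// so the view can be dropped and rebuilt without touching the expensive
// raw data. Connection strings are placeholders.
package storage

import (
	"database/sql"

	_ "github.com/lib/pq" // postgres driver, as an example
)

func openDatabases() (indexerDB, gridDB *sql.DB, err error) {
	// Raw chain data: append-only, expensive to rebuild from the chain.
	indexerDB, err = sql.Open("postgres", "postgres://reader@host/indexer-db")
	if err != nil {
		return nil, nil, err
	}
	// Derived "view" data: cheap to reset and rebuild from indexer-db.
	gridDB, err = sql.Open("postgres", "postgres://writer@host/grid-db")
	if err != nil {
		return nil, nil, err
	}
	return indexerDB, gridDB, nil
}
```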
## Overview

To summarize how our data/API currently works, we handle two types of data in our APIs:

- chain data: ingested from chain events (e.g. farm/twin/contract/node records)
- grid data: information collected directly from the nodes (e.g. the DMI fields)
### How is this data handled?

`gridproxy/node-indexer` is a service that runs periodically, based on a configured interval, asking the nodes for the needed data and storing it directly in a db so it is available afterwards for the API. A minimal sketch of such a loop is shown below.
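In this sketch, `fetchNodeData` and `storeNodeData` are stand-ins for the real node calls and db writes:

```go
// Sketch of the node-indexer's periodic loop.
package indexer

import (
	"context"
	"time"
)

// NodeData is a placeholder for whatever the node returns (uptime, DMI, ...).
type NodeData struct{ Uptime uint64 }

func fetchNodeData(ctx context.Context, nodeID uint32) (NodeData, error) {
	// placeholder: in reality a call to the node itself
	return NodeData{}, nil
}

func storeNodeData(ctx context.Context, nodeID uint32, d NodeData) error {
	// placeholder: write the fetched data into the db for the API to serve
	return nil
}

// Run polls every node once per interval until the context is cancelled.
func Run(ctx context.Context, interval time.Duration, nodeIDs []uint32) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			for _, id := range nodeIDs {
				data, err := fetchNodeData(ctx, id)
				if err != nil {
					continue // skip unreachable nodes until the next round
				}
				_ = storeNodeData(ctx, id, data)
			}
		}
	}
}
```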
Current flow:

`graphql` repository:

- `graphql/chain-indexer`: listens to chain events and dumps them into `indexer-db`.
- `graphql/indexer-gateway`: GraphQL API for the data stored in `indexer-db`.
- `graphql/processor`: watches the indexer API/db and stores meaningful information like farm/twin/contract/node in `processor-db`, a kind of normalized database.
- `graphql/processor-gateway`: GraphQL API that serves data from `processor-db` with basic filtering on the db tables.

`gridproxy` repository:

- `gridproxy/processor`: a set of triggers placed on the normalized tables in `processor-db`, aggregating them into a denormalized cache table (see the sketch after this list).
- `gridproxy/node-indexer`: periodically calls the nodes and stores the fetched data in `processor-db`.
- `gridproxy/server`: REST API that serves data from `processor-db`.
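To make the trigger-based approach concrete, here is a sketch of what one such trigger could look like. All table, column, and function names are invented for illustration; this is not the actual gridproxy code:

```go
// Sketch of the trigger-based gridproxy/processor: a PostgreSQL trigger on a
// normalized table keeps a flat cache row in sync.
package triggers

import (
	"context"
	"database/sql"
)

var stmts = []string{
	`CREATE OR REPLACE FUNCTION refresh_node_cache() RETURNS trigger AS $$
	 BEGIN
	     INSERT INTO node_cache (node_id, twin_id, farm_id)
	     VALUES (NEW.node_id, NEW.twin_id, NEW.farm_id)
	     ON CONFLICT (node_id) DO UPDATE
	         SET twin_id = EXCLUDED.twin_id, farm_id = EXCLUDED.farm_id;
	     RETURN NEW;
	 END;
	 $$ LANGUAGE plpgsql`,
	`DROP TRIGGER IF EXISTS node_cache_refresh ON node`,
	`CREATE TRIGGER node_cache_refresh
	     AFTER INSERT OR UPDATE ON node
	     FOR EACH ROW EXECUTE FUNCTION refresh_node_cache()`,
}

// Install registers the function and trigger on the normalized node table.
func Install(ctx context.Context, db *sql.DB) error {
	for _, s := range stmts {
		if _, err := db.ExecContext(ctx, s); err != nil {
			return err
		}
	}
	return nil
}
```

The proposal below aims to replace exactly this kind of plain-SQL processing logic with a single processor.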
## Motivation for the proposal

### GraphQL API problem

`graphql/processor-api` is missing all the grid data that the proxy collects from the nodes, because it only serves the tables built by the processor.

### REST API problem

The `/nodes` endpoint has become quite large. After adding the DMI fields, I was suggesting we add vertical filtering to return only some fields of the response instead of all of them; this would look like `/nodes?select=node_id,twin_id`. This is where GraphQL can help fix the over-fetching/under-fetching problem, by letting the request define exactly what is needed (the two styles are contrasted in the sketch below).
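A sketch contrasting the two styles; the URLs, field names, and GraphQL schema here are invented for illustration:

```go
// Vertical filtering: the proposed REST `select` parameter versus a GraphQL
// query that requests only the needed fields.
package main

import (
	"bytes"
	"encoding/json"
	"net/http"
)

func main() {
	// REST: the server must implement ?select=... itself.
	http.Get("https://gridproxy.example.com/nodes?select=node_id,twin_id")

	// GraphQL: the client states exactly which fields it needs, avoiding
	// over-fetching without any custom filtering parameter on the server.
	query := map[string]string{
		"query": `{ nodes { nodeID twinID } }`,
	}
	body, _ := json.Marshal(query)
	http.Post("https://graphql.example.com/graphql", "application/json",
		bytes.NewReader(body))
}
```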
### Multiple processing logic

The processing logic is currently split across two places: the `graphql/processor` service (TypeScript) and the SQL triggers in `gridproxy/processor`, which makes it harder to maintain and keep consistent.
## Proposal

By integrating the functionalities of both projects, we can achieve the following:

- Consolidate all chain and grid data into a single database/cluster. This database should implement role-based permissions, allowing different levels of access for querying and mutating data. With this, it will be easier to mock clients for testing purposes; role-based permissions will also enable read-only access to production data, which is regularly needed for debugging and testing changes.
- Implement a unified processor responsible for handling raw chain events and processing them into the final tables. This eliminates the need for writing plain SQL triggers, simplifying the processing logic.
- Introduce a centralized database query client to serve as the primary interface for the various operations: a mutation client for the syncer/processor, a query client for the APIs, and a mock client for testing (see the interface sketch after this list). This will ensure uniformity across the different interfaces.
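A sketch of what that client's surface could look like in Go; the method names and types are assumptions for illustration:

```go
// Sketch of a centralized grid-db client: one interface with read and write
// halves, so the syncer, processor, APIs, and test mocks all share one
// code path.
package client

import "context"

// Reader is the read-only view handed to the REST/GraphQL gateways
// (and to read-only production access for debugging).
type Reader interface {
	GetNode(ctx context.Context, nodeID uint32) (Node, error)
	ListFarms(ctx context.Context, filter FarmFilter) ([]Farm, error)
}

// Mutator is the write side used by the syncer and the processor.
type Mutator interface {
	UpsertNode(ctx context.Context, n Node) error
	UpsertFarm(ctx context.Context, f Farm) error
}

// Client combines both; a mock implementing the same interface can be
// swapped in for tests.
type Client interface {
	Reader
	Mutator
}

// Minimal placeholder types so the sketch is self-contained.
type (
	Node       struct{ NodeID, TwinID, FarmID uint32 }
	Farm       struct{ FarmID uint32; Name string }
	FarmFilter struct{ Name string }
)
```

One concrete implementation backed by the grid-db, plus a mock implementing the same interface, would keep tests independent of a live database.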
## New structure

### Plan
Considering the heavy usage of proxy/graphql by the clients, we need to implement this as a separate package that can gradually replace the old API.

Here are the steps: