
proposal: merge gridproxy and graphql into a new Grid-API #843

Omarabdul3ziz commented 8 months ago

Overview

To summarize how our data/API flow currently works: we handle two types of data in our APIs:

  1. chain data: this data is related to transactions and states on the chain. It includes contracts, billings, reserved capacity, farms, twins, IP addresses, etc.
  2. grid data: this data is related to the node itself, such as its capacity, uptime reports, public configuration, hardware information like GPU/DMI, and some performance test results.

How is this data handled?

Current flow:

graphql repository:

gridproxy repository:

Motivation for the proposal

  1. GraphQL API problem

    • the graphql/processor API is missing all the grid data coming from the indexer in the proxy, because it only serves the tables built by the processor.
    • It also doesn't have a good filtering system: there is no inter-table filtering, which is needed in most cases, for example filtering farms based on their nodes' fields or vice versa.
  2. REST API problem

    • Due to the repetitive need to add more fields to the REST response, especially on the /nodes endpoint, it has become quite large. After adding the DMI fields, I suggested adding vertical filtering to return only some fields of the response instead of all of them; this would look like /nodes?select=node_id,twin_id (see the sketch after this list). This is where GraphQL can help fix the over-fetching/under-fetching problem, by letting the request define what is actually needed.
  3. Duplicated processing logic

    • Merging the processing logic is important since there are currently two places that process the data: the processor in graphql and the triggers in the proxy. This sometimes makes things unclear while debugging an issue; it would be easier if the logic lived in one place.
    • Also, it would be easier to write and debug this logic in Go than as plain SQL trigger functions.
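
For illustration, here is a minimal sketch of what that vertical filtering could look like on the REST side. The handler, model, and field names are hypothetical, not the actual gridproxy code:

```go
package main

import (
	"encoding/json"
	"net/http"
	"strings"
)

// Node is a trimmed-down stand-in for the real /nodes response model.
type Node struct {
	NodeID int    `json:"node_id"`
	TwinID int    `json:"twin_id"`
	FarmID int    `json:"farm_id"`
	Status string `json:"status"`
}

// nodesHandler serves /nodes?select=node_id,twin_id by marshalling the
// full struct and keeping only the requested fields, so clients avoid
// over-fetching the (large) node object.
func nodesHandler(w http.ResponseWriter, r *http.Request) {
	nodes := []Node{{NodeID: 1, TwinID: 10, FarmID: 3, Status: "up"}} // stub data

	selected := r.URL.Query().Get("select")
	if selected == "" {
		json.NewEncoder(w).Encode(nodes) // no vertical filter: full objects
		return
	}

	fields := strings.Split(selected, ",")
	out := make([]map[string]any, 0, len(nodes))
	for _, n := range nodes {
		// round-trip through JSON to get a field-name -> value map
		full := map[string]any{}
		raw, _ := json.Marshal(n)
		json.Unmarshal(raw, &full)

		row := map[string]any{}
		for _, f := range fields {
			f = strings.TrimSpace(f)
			if v, ok := full[f]; ok {
				row[f] = v
			}
		}
		out = append(out, row)
	}
	json.NewEncoder(w).Encode(out)
}

func main() {
	http.HandleFunc("/nodes", nodesHandler)
	http.ListenAndServe(":8080", nil)
}
```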

Proposal

by integrating the functionalities of both projects, we can achieve the following:

New structure

```mermaid
graph LR
    subgraph mutation
    n((node))
    c((chain))

    s([syncer])
    p([processor])
    i([indexer])

    dbm{{db-mutate-client}}

    n --> i --> dbm
    c --> s --> dbm
    dbm --> p --> dbm
    end

    subgraph query
    gc[go-client]
    rest[rest-api]
    gql[graphql-api]

    dbq{{db-query-client}}

    dbq --> gc
    dbq --> rest
    dbq --> gql
    end

    db[(grid-db)]
    dbm --> db --> dbq
```
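
Read in Go terms, the split between the two subgraphs could look roughly like the interfaces below; all names here are illustrative, not an existing API:

```go
package gridapi

import "context"

// Node is a placeholder for the unified node model stored in grid-db.
type Node struct {
	NodeID  int
	TwinID  int
	FarmID  int
	Uptime  int64
	Country string
}

// NodeFilter sketches the inter-table filtering the proposal asks for,
// e.g. filtering nodes by fields of their farm.
type NodeFilter struct {
	FarmName *string
	Country  *string
}

// DBMutateClient is the write side used by the syncer, indexer, and
// processor (the "mutation" subgraph in the diagram).
type DBMutateClient interface {
	UpsertNode(ctx context.Context, n Node) error
	DeleteNode(ctx context.Context, nodeID int) error
}

// DBQueryClient is the read side shared by the go-client, the REST API,
// and the GraphQL API (the "query" subgraph).
type DBQueryClient interface {
	Node(ctx context.Context, nodeID int) (Node, error)
	Nodes(ctx context.Context, filter NodeFilter) ([]Node, error)
}
```

Keeping the two clients separate makes it explicit that only the mutation pipeline writes to grid-db, while the three query frontends stay read-only.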

Plan:

Considering the heavy usage of the proxy and graphql APIs by clients, we need to implement this as a separate package that can gradually replace the old APIs.

here are the steps:

muhamadazmy commented 8 months ago

I think I understand the overall purpose of this proposal and I agree with the issues you stated above. I just have some "clarification" questions:

Omarabdul3ziz commented 8 months ago
  • Is this a complete rewrite of the indexer -> processor pipeline, or does it build on top? I mean, we can still reuse the indexer part since it just indexes raw events, and run our custom processors on top of that event log to rebuild the grid-db.

Yes, for the indexer part we can keep the usage as is, since we are using a ready-made subsquid image for ingesting the chain events. But the processor part will need to be rewritten to cover the new tables, maybe in Go; the current code base is written in TypeScript.
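
As a rough sketch of what such a Go processor could look like, assuming a raw `events` table filled by the indexer (the event names, columns, and target tables are all assumptions, not the actual schema):

```go
package processor

import (
	"context"
	"database/sql"
	"encoding/json"
)

// RawEvent mirrors one row of the raw events table filled by the indexer;
// the exact columns depend on the subsquid schema.
type RawEvent struct {
	Name string          // e.g. "TfgridModule.NodeStored"
	Args json.RawMessage // event payload as emitted by the chain
}

// Process replays raw events and rebuilds the grid-db tables, replacing
// the TypeScript processor and the SQL triggers with one place in Go.
func Process(ctx context.Context, raw, grid *sql.DB) error {
	rows, err := raw.QueryContext(ctx,
		`SELECT name, args FROM events ORDER BY block_height`)
	if err != nil {
		return err
	}
	defer rows.Close()

	for rows.Next() {
		var ev RawEvent
		if err := rows.Scan(&ev.Name, &ev.Args); err != nil {
			return err
		}
		switch ev.Name {
		case "TfgridModule.NodeStored":
			// decode ev.Args and upsert into grid-db's node table
			if _, err := grid.ExecContext(ctx,
				`INSERT INTO node (payload) VALUES ($1)`, []byte(ev.Args)); err != nil {
				return err
			}
			// ... other event kinds map to their own tables
		}
	}
	return rows.Err()
}
```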

  • I understand that the grid-db is where all mutations are going to happen, from both the chain events and the information collected from the nodes. Is that correct?

yes, it will be a central db for all data

  • In the second graph, shouldn't the processor be between the syncer and the mutation-client?

I was thinking of storing the raw ingested chain data in the same database, so the processor would read from and write to the same database. I am not fully aware of how we should build the processor: should we actually read the raw chain data from the database, or can we use the subsquid-gateway API instead? I will look into this part.

  • The mutation-client is just a sql client with write access to the grid-db, correct?

yes

muhamadazmy commented 8 months ago

I think the indexer-db (which contains the raw data) should be separate from the "view" db. Since the indexed data can be huge, wouldn't it slow down and affect the db? Separating them would also make it easier to reset the view/grid-db and start over without losing the indexed data, which is expensive to rebuild from the chain.
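
If the databases are split like this, the wiring on the Go side could be as simple as two separate connections, read-only for the raw indexer-db and read-write for the grid-db (the DSNs and driver choice here are placeholders):

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq" // postgres driver; any driver works here
)

func main() {
	// indexer-db: raw chain events, expensive to rebuild, never reset.
	indexerDB, err := sql.Open("postgres",
		"postgres://reader@indexer-db/events?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer indexerDB.Close()

	// grid-db: the derived "view" database; safe to drop and replay
	// from indexer-db at any time.
	gridDB, err := sql.Open("postgres",
		"postgres://writer@grid-db/grid?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer gridDB.Close()

	// e.g. processor.Process(ctx, indexerDB, gridDB)
}
```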