AhmedHanafy725 opened 2 years ago
I think there is a problem with how we are using graphql. Graphql is not a database, and it's not even a service. It's a standard for building APIs, similar to REST, which lets you query your service in a standard way. Let's say, an SQL for APIs.
The idea is that you define a type schema on your server, then the client can choose which objects to query and which fields to return. The server can then collect this data from different sources (database, cache, or even other APIs).
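For illustration, here is a minimal sketch (in Go, using the github.com/graphql-go/graphql library; the type and field names are made up, not our actual schema) of a schema where one field comes from the database result and another is injected from a cache, and the client picks which fields it wants back:

```go
package main

import (
	"encoding/json"
	"fmt"

	"github.com/graphql-go/graphql"
)

func main() {
	// A hypothetical Node type whose fields are resolved from different sources.
	nodeType := graphql.NewObject(graphql.ObjectConfig{
		Name: "Node",
		Fields: graphql.Fields{
			// Resolved from whatever the "nodes" resolver returned (the database result).
			"nodeID": &graphql.Field{Type: graphql.Int},
			// Resolved from a cache instead of the database.
			"freeMRU": &graphql.Field{
				Type: graphql.Int,
				Resolve: func(p graphql.ResolveParams) (interface{}, error) {
					return 2048, nil // stand-in for a cache lookup
				},
			},
		},
	})

	query := graphql.NewObject(graphql.ObjectConfig{
		Name: "Query",
		Fields: graphql.Fields{
			"nodes": &graphql.Field{
				Type: graphql.NewList(nodeType),
				Resolve: func(p graphql.ResolveParams) (interface{}, error) {
					// Stand-in for a database query.
					return []map[string]interface{}{{"nodeID": 1}}, nil
				},
			},
		},
	})

	schema, _ := graphql.NewSchema(graphql.SchemaConfig{Query: query})

	// The client decides which fields it wants back.
	result := graphql.Do(graphql.Params{
		Schema:        schema,
		RequestString: `{ nodes { nodeID freeMRU } }`,
	})
	out, _ := json.Marshal(result)
	fmt.Println(string(out))
}
```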
The current graphql setup is a development IDE that exposes the database schema and maps queries directly to the database. This is actually nice and we can use it, but we can't leave it exposed to the world like this because it can be abused.
Hence what I suggest is to modify gridproxy to build a new subset of the graphql API (using one of the Go graphql libraries) that simply proxies calls to our (to-be-hidden) graphql and merges in the other values from the cache.
The idea is to always force limits (e.g. return a maximum of 100 entries) and support an offset so it can still be used with pagination, but also introduce new fields that are injected on the object, like the free resources (from cache), and so on. Note that I am not sure how we can then implement querying on free capacity on top of that, so some experimentation is needed.
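As a rough sketch of the limit enforcement (the names are illustrative, not gridproxy's actual API), the proxy would clamp whatever page size the client asks for before forwarding the query:

```go
package main

import "fmt"

const maxLimit = 100

// clampPagination enforces a hard cap on page size so a single request
// cannot pull the whole dataset, while still allowing offset-based paging.
func clampPagination(limit, offset int) (int, int) {
	if limit <= 0 || limit > maxLimit {
		limit = maxLimit
	}
	if offset < 0 {
		offset = 0
	}
	return limit, offset
}

func main() {
	l, o := clampPagination(5000, -10)
	fmt.Println(l, o) // 100 0
}
```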
If finding nodes with specific free capacity won't work with this design, then there remain 2 other options:
After many calls we agreed on the following:
The explorer does not need gridproxy to list nodes or farms. Also, the node filter filters on the total node capacity, hence node listing can work directly on graphql. This will allow us to implement proper pagination.
- On accessing the nodes page, you retrieve only the first page; clicking next should return the next batch of nodes, and so on (see the sketch after this list).
- Setting filters updates the query (done entirely on the server side) and re-renders the first page.
- Same for farmers
- Counters can be updated every few minutes, and only when the home page is viewed
- When a node is selected (details page), the gridproxy can be used to view the total capacity and free capacity.
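For example (the endpoint URL and field names are assumptions; the actual schema may differ), the explorer could fetch a single page of nodes directly from graphql like this:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// fetchNodesPage requests one page of nodes from a graphql endpoint using
// limit/offset pagination. Endpoint and field names are hypothetical.
func fetchNodesPage(endpoint string, limit, offset int) (*http.Response, error) {
	body, err := json.Marshal(map[string]interface{}{
		"query": `query ($limit: Int!, $offset: Int!) {
			nodes(limit: $limit, offset: $offset) { nodeID farmID }
		}`,
		"variables": map[string]int{"limit": limit, "offset": offset},
	})
	if err != nil {
		return nil, err
	}
	return http.Post(endpoint, "application/json", bytes.NewReader(body))
}

func main() {
	resp, err := fetchNodesPage("https://example.com/graphql", 20, 0) // first page
	if err != nil {
		fmt.Println(err)
		return
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status)
}
```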
But for now we still need the all-nodes request for the statistics page, to know how much capacity (total CPU, memory, ...) we have, until gridproxy can aggregate this data, no?
This is the "final" good state that we want to build. For now we can still fetch all the nodes until the work on gridproxy is complete.
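Until then, that aggregation stays on the client. A minimal sketch of what the statistics page would do with the fetched nodes (field names are illustrative, not the actual graphql schema):

```go
package main

import "fmt"

// Node capacity as reported per node; field names are illustrative.
type Node struct {
	CRU int64 // cores
	MRU int64 // memory in bytes
	SRU int64 // SSD in bytes
	HRU int64 // HDD in bytes
}

// totalCapacity sums the total resources over all fetched nodes, which is the
// client-side aggregation the statistics page needs for now.
func totalCapacity(nodes []Node) (cru, mru, sru, hru int64) {
	for _, n := range nodes {
		cru += n.CRU
		mru += n.MRU
		sru += n.SRU
		hru += n.HRU
	}
	return
}

func main() {
	fmt.Println(totalCapacity([]Node{{CRU: 8, MRU: 32 << 30}, {CRU: 16, MRU: 64 << 30}}))
}
```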
Problem
Fetching all nodes and farms from graphql periodically in one request and then filtering them in each client puts a big load on graphql, and this won't scale. This happens because graphql does not support (or maybe cannot support) all the kinds of queries needed by the clients and the explorer. Also, graphql does not limit the queries done by users, so large queries can be made in one single request.
Suggestion
Suggestion 1
Suggestion 2
Cons
Suggestion 3
same as suggestion 2 but