Closed mihaisc closed 2 years ago
Additional services we can add:
Everything described here is subject to change if we have a better idea.
The solution will have 5 components. All will be hosted in AWS, except for the status page (Netlify).
This component will monitor all the defined endpoints and store the status in the DB. At the moment I'm thinking of a table with a simple structure:
id
timestamp
component (provider, aquarius, subgraph, etc.)
chainId
chainName (not sure if this one is needed)
details: a JSON object with various info related to the component
status: `UP`, `WARNING`, `ERROR`, `DOWN`
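As a rough illustration of the table above, the row could be typed as follows. This is only a sketch: the timestamp format, the nullable `chainId`, and the `worstStatus` helper are assumptions, as is treating `UP` < `WARNING` < `ERROR` < `DOWN` as increasing severity.

```typescript
// Hypothetical row shape; field names follow the list above.
type Status = "UP" | "WARNING" | "ERROR" | "DOWN";

interface StatusRow {
  id: number;
  timestamp: string; // ISO 8601 string (assumption, the format is not fixed above)
  component: string; // e.g. "provider", "aquarius", "subgraph"
  chainId: number | null; // null for components that are not chain specific (assumption)
  chainName?: string; // optional, since it may turn out not to be needed
  details: Record<string, unknown>; // free-form JSON
  status: Status;
}

// Assumed severity order, used to fold several check results into one status.
const SEVERITY: Record<Status, number> = { UP: 0, WARNING: 1, ERROR: 2, DOWN: 3 };

// Hypothetical helper: returns the most severe status, or "UP" for an empty list.
function worstStatus(statuses: Status[]): Status {
  return statuses.reduce<Status>(
    (worst, s) => (SEVERITY[s] > SEVERITY[worst] ? s : worst),
    "UP",
  );
}
```

A component that runs several checks could then store `worstStatus([...])` in `status` and put the individual findings in `details`.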
Env vars:
time: in minutes, represents the cron job timer
supported_chains: list of supported chains. The name is the one we use in the URLs, e.g. v4.subgraph.mainnet.oceanprotocol.com
[
{id: 1, name: "mainnet" , rpcUrl: "..." }
]
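A minimal sketch of how these two variables might be read, assuming `supported_chains` holds strict JSON (quoted keys, unlike the shorthand above). `parseConfig` and `serviceUrl` are hypothetical names, and the URL pattern is taken from the example hostname.

```typescript
interface ChainConfig {
  id: number;
  name: string;
  rpcUrl: string;
}

interface MonitorConfig {
  timeMinutes: number;
  supportedChains: ChainConfig[];
}

// Parse both variables from a plain key/value map such as process.env.
function parseConfig(env: Record<string, string | undefined>): MonitorConfig {
  const timeMinutes = Number(env["time"]);
  if (!Number.isFinite(timeMinutes) || timeMinutes <= 0) {
    throw new Error("time must be a positive number of minutes");
  }
  const parsed: unknown = JSON.parse(env["supported_chains"] ?? "[]");
  if (!Array.isArray(parsed)) {
    throw new Error("supported_chains must be a JSON array");
  }
  const supportedChains = parsed.map((c: unknown): ChainConfig => {
    const o = (c ?? {}) as Partial<ChainConfig>;
    if (typeof o.id !== "number" || typeof o.name !== "string" || typeof o.rpcUrl !== "string") {
      throw new Error("each chain needs id, name and rpcUrl");
    }
    return { id: o.id, name: o.name, rpcUrl: o.rpcUrl };
  });
  return { timeMinutes, supportedChains };
}

// Build a per-chain URL following v4.<service>.<chain name>.oceanprotocol.com.
function serviceUrl(service: "provider" | "subgraph", chain: ChainConfig): string {
  return `https://v4.${service}.${chain.name}.oceanprotocol.com`;
}
```

Validating the list once at startup means a malformed entry fails loudly instead of silently skipping a chain.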
Components:
a. Aquarius - https://v4.aquarius.oceanprotocol.com/
Checks:
get version (HTTP GET on the main URL) and compare it with the latest release. If they mismatch, the status is `WARNING`; if you can't reach it, `DOWN`
get chain list (https://v4.aquarius.oceanprotocol.com/api/aquarius/chains/list) and check if all supported chains are on this list (if not, `WARNING`)
for each chain, check if last_block is up to date (check the Aquarius docs) (if not, `WARNING`)
do a random query to check if assets are actually indexed, be creative
store all this data
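The Aquarius checks above could be folded into a single status roughly like this. It is a sketch under assumptions: the observation shape, the `evaluateAquarius` name, and the `maxLagBlocks` threshold are all hypothetical, and the HTTP calls that fill in the observation are left out.

```typescript
type Status = "UP" | "WARNING" | "ERROR" | "DOWN";

// Everything the monitor observed about Aquarius in one run (hypothetical shape).
interface AquariusObservation {
  reachable: boolean;
  version: string | null; // from the GET on the main URL
  latestRelease: string | null; // latest released version, looked up separately
  listedChainIds: number[]; // chain ids derived from the chains/list response
  lastBlock: Record<number, number>; // last block indexed by Aquarius, per chain id
  headBlock: Record<number, number>; // current block read from each chain's rpcUrl
}

interface CheckResult {
  status: Status;
  details: string[];
}

function evaluateAquarius(
  obs: AquariusObservation,
  supportedChainIds: number[],
  maxLagBlocks: number, // assumed threshold; what counts as "up to date" is not specified above
): CheckResult {
  if (!obs.reachable) {
    return { status: "DOWN", details: ["Aquarius is unreachable"] };
  }
  const details: string[] = [];
  if (obs.version !== obs.latestRelease) {
    details.push(`version ${obs.version} differs from latest release ${obs.latestRelease}`);
  }
  for (const id of supportedChainIds) {
    if (obs.listedChainIds.indexOf(id) === -1) {
      details.push(`chain ${id} is missing from the chain list`);
      continue;
    }
    const lag = (obs.headBlock[id] ?? 0) - (obs.lastBlock[id] ?? 0);
    if (lag > maxLagBlocks) {
      details.push(`chain ${id} is ${lag} blocks behind`);
    }
  }
  return { status: details.length > 0 ? "WARNING" : "UP", details };
}
```

Keeping the evaluation separate from the network calls makes each rule easy to test, and the `details` array maps directly onto the `details` column.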
b. Provider - https://v4.provider.{supported_chains[i].name}.oceanprotocol.com/
Checks:
`WARNING`, if you can't reach it then `DOWN`
`ERROR`. No need to store the results.
c. Subgraph - https://v4.subgraph.{supported_chains[i].name}.oceanprotocol.com/
Checks:
`WARNING`, if you can't reach it then `DOWN`
`WARNING`
`ERROR`
d. Faucets
Checks:
`WARNING`
e. Operator engine
@alexcos20 can you help here?
f. Market
Checks :
Env var :
Basically, every {time} minutes, check the latest entries in the database for each component and send an email to {email_address} if any of them are not `UP`. It should be a single email with all the error/warning messages. There should be some mechanism to record the previously sent email so we don't spam.
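One possible anti-spam mechanism, sketched under assumptions: fingerprint the current set of problems and skip the email when it matches the fingerprint stored after the last send. `buildDigest` and the types are hypothetical, and actually sending the mail and persisting the fingerprint are left to the caller.

```typescript
type Status = "UP" | "WARNING" | "ERROR" | "DOWN";

interface LatestStatus {
  component: string;
  chainId: number | null;
  status: Status;
  message: string;
}

interface Digest {
  fingerprint: string; // persist after sending, compare on the next run
  body: string; // the single email body covering every problem
}

// Returns the email to send, or null when there is nothing new to report.
function buildDigest(latest: LatestStatus[], previousFingerprint: string | null): Digest | null {
  const problems = latest.filter((s) => s.status !== "UP");
  if (problems.length === 0) return null; // everything is UP

  const fingerprint = problems
    .map((p) => `${p.component}:${p.chainId ?? "-"}:${p.status}`)
    .sort()
    .join("|");
  if (fingerprint === previousFingerprint) return null; // same problems already mailed

  const body = problems
    .map((p) => {
      const where = p.chainId === null ? "" : ` (chain ${p.chainId})`;
      return `[${p.status}] ${p.component}${where}: ${p.message}`;
    })
    .join("\n");
  return { fingerprint, body };
}
```

The caller would clear the stored fingerprint once everything is `UP` again, so that a recurrence of the same problem triggers a fresh email.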
For now I'm thinking of a single endpoint that returns a list of all the components with their latest status (basically all the columns in the table).
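The core of that endpoint is picking the newest row per component. A minimal in-memory sketch, assuming rows are keyed by component plus chain and carry a numeric timestamp (in practice the database query would likely do this; `latestPerComponent` is a hypothetical name):

```typescript
interface StatusRow {
  id: number;
  timestamp: number; // epoch milliseconds (assumption)
  component: string;
  chainId: number | null;
  status: string;
  details: Record<string, unknown>;
}

// Keep only the most recent row for each (component, chainId) pair.
function latestPerComponent(rows: StatusRow[]): StatusRow[] {
  const latest: Record<string, StatusRow> = {};
  for (const row of rows) {
    const key = `${row.component}:${row.chainId ?? "-"}`;
    const current = latest[key];
    if (current === undefined || row.timestamp > current.timestamp) {
      latest[key] = row;
    }
  }
  return Object.keys(latest).map((key) => latest[key]!);
}
```

The endpoint handler would then return this array as JSON.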
A simple page where we display what we get from the endpoint. We create a React component for each component (Aquarius, Provider, etc.) and then just iterate through the response and display it. No need for fancy live updates or anything. If the status is `UP` we just show name + status; if it's anything else, we display the extra details as well.
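The display rule can be sketched independently of React as a small view-model function: always a title, and details only when the status is not `UP`. The `ComponentStatus` and `ComponentView` shapes and the function names are assumptions; a React component would simply render the result.

```typescript
interface ComponentStatus {
  component: string;
  status: "UP" | "WARNING" | "ERROR" | "DOWN";
  details: Record<string, unknown>;
}

interface ComponentView {
  title: string; // always shown: name + status
  details: string | null; // shown only when the status is not UP
}

function toView(entry: ComponentStatus): ComponentView {
  const title = `${entry.component}: ${entry.status}`;
  const details = entry.status === "UP" ? null : JSON.stringify(entry.details, null, 2);
  return { title, details };
}

// Iterate through the endpoint response, one view per component.
function toViews(response: ComponentStatus[]): ComponentView[] {
  return response.map(toView);
}
```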
Couple of thoughts on this:
Regarding the status page, that could actually be a page within the main site (or even the market) to save us hosting and maintaining something else.
We will not be using Vercel serverless functions; like I said, we will host it in our infra. What is the advantage of using AWS Lambda vs. some simple containers with Node.js? The status page will be independent; I don't want to mix it into other projects. Adding another site in Netlify will not increase the price, and that page will not need much maintenance.
> What is the advantage of using AWS Lambda vs. some simple containers with Node.js?
Just the typical benefits of running serverless apps: it's usually cheaper and easier to manage. This is pretty small and self-contained, so it shouldn't make too much difference. We can do it as a small Express app.
> The status page will be independent; I don't want to mix it into other projects. Adding another site in Netlify will not increase the price, and that page will not need much maintenance.
Ok sure, happy to proceed with that.
A page with all our services listed and with their status
status.oceanprotocol.com
It will be a simple list with the name of the component, the URL and the status (Up/Down). The status is fetched live, with no need for historical data, so no other backend is involved. On click, an entry can expand and show more details about that service (or use another interaction that is also mobile friendly, so no tooltips).
The order is obviously irrelevant, it's just how they popped into my head.
Services:
Aquarius: on expand, show what GET returns: `{"plugin":"elasticsearch","software":"Aquarius","version":"3.1.1"}`
Provider: a list of all providers. For each, on expand, show what GET returns: `{"chainId":1, .... :"Provider","version":"0.4.18"}`
Subgraph: a list of all subgraphs. For each, on expand, do the _meta query and show the current block
Market: no detailed view, just up or down; we can do a GET or something
Faucet: a list of all faucets. Like Market, no detailed view
Operator Engine: prod and dev engine status
Ocean port: https://port.oceanprotocol.com/, like Market no details, just up or down
Bonus: define a job to send notifications to a configurable list of email addresses if services are down. This changes the architecture of the app a bit, but nothing too big.
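For the Subgraph entry in the list above, the `_meta` query and the extraction of the current block could look roughly like this. The helper names are hypothetical. The request body would be POSTed as JSON to the subgraph's GraphQL endpoint, whose path is not given here.

```typescript
// The Graph exposes indexing metadata through the _meta field.
const META_QUERY = "{ _meta { block { number } } }";

// JSON body for the GraphQL POST request.
function metaRequestBody(): string {
  return JSON.stringify({ query: META_QUERY });
}

interface MetaResponse {
  data?: { _meta?: { block?: { number?: unknown } } } | null;
}

// Pull the current block out of a parsed response; null if it is missing or malformed.
function currentBlock(response: unknown): number | null {
  const value = (response as MetaResponse | null | undefined)?.data?._meta?.block?.number;
  return typeof value === "number" ? value : null;
}
```

Returning `null` instead of throwing lets the page show the subgraph as down without special-casing error responses.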
After further research 3rd party solutions are not a good fit for us.