oceanprotocol / pm

Zenhub needs each issue associated with one repo. This repo is a workaround, to mark issues that span >1 repos.

Service status page #126

Closed mihaisc closed 2 years ago

mihaisc commented 2 years ago

A page listing all our services and their status, at status.oceanprotocol.com

It will be a simple list with the name of the component, the URL and the status (Up/Down). The status is fetched live; there is no need for historical data, so no other backend is involved. On click, an entry can expand and show more details about that service (or use some other interaction that is also mobile friendly, so no tooltips).

The order is obviously irrelevant, it's just how they popped into my head.

Services:

Bonus: define a job to send notifications to a configurable list of email addresses if services are down. This changes the architecture of the app a bit, but nothing too big.

After further research, 3rd party solutions are not a good fit for us.

LoznianuAnamaria commented 2 years ago

Additional services we can add:

mihaisc commented 2 years ago

Everything described here is subject to change if we have a better idea.

The solution will have 5 components, all hosted in AWS except for the status page (Netlify).

  1. Monitoring

This will monitor all the defined endpoints and write each status to the DB. At the moment I'm thinking of a table with a simple structure:

id
timestamp
component (provider, aquarius, subgraph, etc.)
chainId
chainName (not sure if this one is needed)
details: a JSON object with various info related to the component
status: `UP`, `WARNING`, `ERROR`, `DOWN`
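
As a rough sketch, the table structure above could be mirrored in TypeScript like this. The names `Status`, `StatusRow` and `isStatus` are hypothetical, and the column types (numeric `id`, millisecond `timestamp`, optional `chainName`) are assumptions, since the thread does not fix them.

```typescript
// Hypothetical type names; the thread only lists the column names.
type Status = "UP" | "WARNING" | "ERROR" | "DOWN";

interface StatusRow {
  id: number;
  timestamp: number; // assumed: Unix time in milliseconds
  component: string; // e.g. "provider", "aquarius", "subgraph"
  chainId: number;
  chainName?: string; // optional, since it may not be needed
  details: Record<string, unknown>; // free-form JSON about the component
  status: Status;
}

const STATUSES: readonly string[] = ["UP", "WARNING", "ERROR", "DOWN"];

// Runtime guard for values read back from the database.
function isStatus(value: unknown): value is Status {
  return typeof value === "string" && STATUSES.indexOf(value) !== -1;
}

// Illustrative row only; the values are made up.
const exampleRow: StatusRow = {
  id: 1,
  timestamp: 0,
  component: "provider",
  chainId: 1,
  details: { error: "timeout" },
  status: "DOWN",
};
```

Keeping `details` as an untyped JSON object matches the description above and leaves room for component-specific fields.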

Env var :

b. Provider - https://v4.provider.{supported_chains[i].name}.oceanprotocol.com/

Checks:

c. Subgraph - https://v4.subgraph.{supported_chains[i].name}.oceanprotocol.com/

Checks:

d. Faucets

Checks:

e. Operator engine

@alexcos20 can you help here?

f. Market

Checks :

  2. Notification

Env var :

Basically, every {time} minutes, check the latest entries in the database for each component and send an email to {email_address} if any of them are not UP. It should be a single email containing all the error/warning messages. There should be some mechanism to record the previously sent email so we don't spam.
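
A minimal sketch of that logic, assuming the "previous email" is remembered simply as the last message body: `planNotification` and `LatestEntry` are hypothetical names, and actually sending the mail and persisting the body are left to the caller.

```typescript
// Hypothetical shape of the latest database entry for one component.
interface LatestEntry {
  component: string;
  chainId: number;
  status: "UP" | "WARNING" | "ERROR" | "DOWN";
  details: Record<string, unknown>;
}

interface NotificationPlan {
  body: string | null; // current problem summary, or null if everything is UP
  send: boolean; // true only when there are problems and the summary changed
}

// Builds one message covering every entry that is not UP and compares it
// with the previously recorded body to avoid sending the same email twice.
function planNotification(
  latest: LatestEntry[],
  previousBody: string | null
): NotificationPlan {
  const lines = latest
    .filter((entry) => entry.status !== "UP")
    .map(
      (entry) =>
        `${entry.status} ${entry.component} (chain ${entry.chainId}): ` +
        JSON.stringify(entry.details)
    )
    .sort(); // fixed order, so identical states yield identical bodies
  if (lines.length === 0) {
    return { body: null, send: false };
  }
  const body = lines.join("\n");
  return { body, send: body !== previousBody };
}
```

The caller would store `body` after every run, including `null`, so that a service which recovers and later fails again triggers a new email.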

  3. API endpoints

For now I can think of only one endpoint, which returns a list of all the components with their latest status (basically all the columns in the table).
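
The core of such an endpoint could look roughly like the function below, which reduces the stored rows to the newest one per component. `latestPerComponent` and `Row` are hypothetical names, and grouping by component plus `chainId` is an assumption; in practice this could also be done with a database query.

```typescript
// Hypothetical row shape, following the columns listed for the table.
interface Row {
  id: number;
  timestamp: number;
  component: string;
  chainId: number;
  status: string;
  details: Record<string, unknown>;
}

// Keeps only the most recent row for each (component, chainId) pair.
function latestPerComponent(rows: Row[]): Row[] {
  const latest = new Map<string, Row>();
  for (const row of rows) {
    const key = `${row.component}:${row.chainId}`;
    const seen = latest.get(key);
    if (seen === undefined || row.timestamp > seen.timestamp) {
      latest.set(key, row);
    }
  }
  return Array.from(latest.values());
}
```

The handler would then just serialize the returned array as JSON.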

  4. Status page

A simple page where we display what we get from the endpoint. We create a React component for each component (Aquarius, Provider, etc.) and then just iterate through the response and display it. No need for fancy live updates or anything. Also, if the status is UP we just show name + status. If it's anything else, we display the extra details as well.
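
The display rule can be sketched independently of React as a plain function that a component could call. `describeComponent` and `ComponentStatus` are hypothetical names, and rendering the details as a JSON string is only a placeholder for whatever layout is chosen.

```typescript
// Hypothetical shape of one item in the endpoint response.
interface ComponentStatus {
  component: string;
  status: string;
  details: Record<string, unknown>;
}

// UP: show only name and status. Anything else: append the extra details.
function describeComponent(item: ComponentStatus): string {
  const summary = `${item.component}: ${item.status}`;
  if (item.status === "UP") {
    return summary;
  }
  return `${summary} ${JSON.stringify(item.details)}`;
}

// Iterating through the response, as described above.
function describeAll(items: ComponentStatus[]): string[] {
  return items.map(describeComponent);
}
```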

jamiehewitt15 commented 2 years ago

Couple of thoughts on this:

jamiehewitt15 commented 2 years ago

Regarding the status page, that could actually be a page within the main site (or even the market) to save us hosting and maintaining something else.

mihaisc commented 2 years ago

We will not be using Vercel serverless functions; as I said, we will host it in our infra. What is the advantage of using AWS Lambda vs some simple containers with Node.js? The status page will be independent; I don't want to mix it into other projects. Adding another site in Netlify will not increase the price, and that page will not need much maintenance.

jamiehewitt15 commented 2 years ago

> What is the advantage of using AWS Lambda vs some simple containers with Node.js?

Just the typical benefits of running serverless apps: it's usually cheaper and easier to manage. This is pretty small and self-contained, so it shouldn't make too much difference. We can do it as a small Express app.

> The status page will be independent; I don't want to mix it into other projects. Adding another site in Netlify will not increase the price, and that page will not need much maintenance.

Ok sure, happy to proceed with that.