threefoldtech / home

Starting point for the threefoldtech organization
https://threefold.io
Apache License 2.0

Healthchecker for the guardians #1535

Open xmonader opened 1 month ago

xmonader commented 1 month ago

A separate component to be run by the guardians to execute healthchecks / benchmarks on a VM, along with ZOS checks on the node

Spawner

Another component that's supposed to launch (deploy) these VMs with healthchecks on predefined farms, or even on randomly chosen farms (needs to happen on all nodes in the farm)

Healthchecker VM

Should execute benchmark tests for 1- CPU 2- Disk 3- Network

And expose the results on an endpoint, or provide a way to notify some other component
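A minimal sketch of what the healthchecker could run inside the VM, assuming Python is available there. The function names, the `node_id` field, and the result keys are hypothetical and not part of any agreed design. The network benchmark is left out because it needs a peer to measure against, which the issue has not specified yet.

```python
import hashlib
import os
import tempfile
import time


def cpu_benchmark(iterations: int = 20_000) -> float:
    """Return chained SHA-256 hashes per second (a rough CPU indicator)."""
    data = b"x" * 1024
    start = time.perf_counter()
    for _ in range(iterations):
        data = hashlib.sha256(data).digest()
    elapsed = time.perf_counter() - start
    return iterations / elapsed


def disk_benchmark(size_mb: int = 8) -> float:
    """Return sequential write throughput in MB/s for a temporary file."""
    chunk = os.urandom(1024 * 1024)
    with tempfile.NamedTemporaryFile() as f:
        start = time.perf_counter()
        for _ in range(size_mb):
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())  # make sure the data actually reached the disk
        elapsed = time.perf_counter() - start
    return size_mb / elapsed


def run_benchmarks(node_id: int) -> dict:
    """Run the CPU and disk benchmarks and bundle them into one result."""
    return {
        "node_id": node_id,  # hypothetical identifier of the node under test
        "timestamp": int(time.time()),
        "cpu_hashes_per_sec": cpu_benchmark(),
        "disk_write_mb_per_sec": disk_benchmark(),
    }
```

The returned dictionary is JSON-serializable, so it could either be served from an endpoint or posted to another component.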

Aggregator/Collector

Should be able to collect test results from the deployed VMs, either by pulling them or by receiving them via webhook
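Both collection modes can share the same ingestion path. The sketch below is an in-memory illustration only: the `Collector` class, its method names, and the `node_id` key are assumptions, and the HTTP layer for the webhook is left out.

```python
from typing import Callable, Dict, Iterable


class Collector:
    """Keeps the most recent result per node, fed by push or by pull."""

    def __init__(self) -> None:
        self.latest: Dict[int, dict] = {}

    def ingest(self, result: dict) -> None:
        """Webhook path: a VM hands over one result."""
        if "node_id" not in result:
            raise ValueError("result is missing 'node_id'")
        self.latest[result["node_id"]] = result

    def pull(self, fetchers: Iterable[Callable[[], dict]]) -> None:
        """Polling path: call each fetcher (e.g. an HTTP GET) and ingest it."""
        for fetch in fetchers:
            self.ingest(fetch())
```

With this shape, the pull mode still needs a list of fetchers (one per VM), while the webhook mode only needs `ingest` to be reachable.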

Syncing the results

The information from each guardian should be propagated to the other guardians, maybe using etcd or some other component. Not sure if introducing a Tendermint cluster would be useful, given these results are collected every 3 minutes

AbdelrahmanElawady commented 1 week ago

Some questions and notes regarding the structure of various components after discussion with @ashraffouda:

VMs

We think designing the VMs to push benchmark results to an aggregator, instead of exposing them to a polling aggregator, will be easier to handle: the aggregator won't need to maintain a list of IPs or some sort of service discovery, it just waits for requests carrying benchmark results.

Of course, it will need secure communication so that no other actor can send invalid benchmark results.
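One possible way to authenticate pushed results is to sign each payload with an HMAC over a key shared between the VM and the aggregator. This is only a sketch under that assumption: the issue does not say how keys would be distributed, and the `sign` and `verify` helpers are hypothetical names.

```python
import hashlib
import hmac
import json


def sign(payload: dict, key: bytes) -> str:
    """Return a hex HMAC-SHA256 over a canonical JSON encoding of payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def verify(payload: dict, signature: str, key: bytes) -> bool:
    """Check the signature in constant time; False means reject the result."""
    expected = sign(payload, key)
    return hmac.compare_digest(expected.encode(), signature.encode())
```

The VM would send the payload together with `sign(payload, key)`, and the aggregator would drop anything for which `verify` returns `False`. A signature alone does not stop replays of old results, so the aggregator would likely also want to check a timestamp inside the payload.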

DB

It's not clear yet which type of data we will keep, and therefore which database we will use. Will we store all benchmarks over a period of time as a time series, or just the latest result, or the last 3 results?
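If the "last 3 results" option were chosen, the storage could be as small as a bounded queue per node. The sketch below illustrates that one option in memory. It is not a decision on the database, and the `RecentResults` class is a hypothetical name.

```python
from collections import defaultdict, deque
from typing import List, Optional


class RecentResults:
    """Keeps only the newest `keep` results per node."""

    def __init__(self, keep: int = 3) -> None:
        # deque(maxlen=keep) silently drops the oldest entry when full
        self._by_node = defaultdict(lambda: deque(maxlen=keep))

    def add(self, node_id: int, result: dict) -> None:
        self._by_node[node_id].append(result)

    def history(self, node_id: int) -> List[dict]:
        """Return stored results for a node, oldest first."""
        return list(self._by_node.get(node_id, ()))

    def latest(self, node_id: int) -> Optional[dict]:
        items = self.history(node_id)
        return items[-1] if items else None
```

Keeping the full history as a time series would instead call for an append-only store, which is a different trade-off in size and query patterns.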

Spawner

Will it spawn a number of VMs and then exit, leaving the VMs running? Or will it spawn them, wait for the benchmarks to run once, and then delete the VMs?