nucypher / nucypher

Threshold Access Control (TACo) runtimes for the Threshold Network
GNU Affero General Public License v3.0

Revisiting Prometheus metrics - roadmap #3240

Closed: manumonti closed this issue 9 months ago

manumonti commented 1 year ago

Prometheus can be a good first approach to checking and tracking the status of TACo nodes on the Threshold Network.

A sufficient set of metrics would make it possible to track whether nodes comply with staking requirements and, hence, whether the associated stakers are eligible for staking rewards.

These metrics would also help monitor node status, making it possible to quickly discover malfunctions or inefficiencies.

The roadmap could look something like this:

  1. The prometheus-client package is no longer a dev dependency but an Ursula dependency: it is now included in the Pipfile, so it is installed along with Ursula.
  2. Running Prometheus is optional: users can decide whether or not they want Prometheus running. Also, we don't want a deprecated flag (--prometheus) in the nucypher ursula run command for release 7.1, so this flag is removed, and enabling Prometheus is instead controlled by an environment variable or a config file.
  3. Prune and rename current metrics: https://github.com/nucypher/nucypher/pull/3224 and https://github.com/nucypher/nucypher/pull/3232.
  4. Revisit current metrics and remove those that are not working or are no longer useful (e.g., the operator ETH balance).
  5. Add useful metrics: https://github.com/nucypher/nucypher/issues/3236
  6. Add a return value to the function that starts Prometheus: https://github.com/nucypher/nucypher/pull/3231
  7. Enable support for multiple metrics collection intervals: a long interval for metrics that require a connection to a web3 provider such as Infura, and a short interval for metrics that depend only on the node's own status, such as RAM and CPU usage.

  8. Running Prometheus alongside Ursula is mandatory, with no option to disable it. This is a necessary mechanism for calculating rewards and collecting network statistics.
  9. Add tests: https://github.com/nucypher/nucypher/issues/3218
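Item 2 (enabling Prometheus through an environment variable instead of a CLI flag) and item 6 (a return value for the start function) could look roughly like the sketch below. The variable names NUCYPHER_PROMETHEUS_ENABLED and NUCYPHER_PROMETHEUS_PORT, the default port, and both helper functions are hypothetical, chosen for illustration; the issue does not specify them, and the config-file lookup is omitted.

```python
import os

# Hypothetical names for illustration only; not taken from nucypher.
PROMETHEUS_ENABLED_VAR = "NUCYPHER_PROMETHEUS_ENABLED"
PROMETHEUS_PORT_VAR = "NUCYPHER_PROMETHEUS_PORT"
DEFAULT_PORT = 9101  # assumption: arbitrary default port

_TRUTHY = {"1", "true", "yes", "on"}


def prometheus_settings(env=None):
    """Return (enabled, port) read from the environment.

    Prometheus stays disabled unless the enable variable holds a truthy
    value, so no --prometheus CLI flag is needed.
    """
    env = os.environ if env is None else env
    enabled = env.get(PROMETHEUS_ENABLED_VAR, "").strip().lower() in _TRUTHY
    port = int(env.get(PROMETHEUS_PORT_VAR, DEFAULT_PORT))
    if not 0 < port < 65536:
        raise ValueError(f"invalid Prometheus port: {port}")
    return enabled, port


def maybe_start_prometheus(start_fn, env=None):
    """Call start_fn(port) only when enabled; return whether it was started.

    The boolean result lets the caller log or react to the outcome.
    """
    enabled, port = prometheus_settings(env)
    if not enabled:
        return False
    start_fn(port)
    return True
```

With prometheus-client installed, start_fn could be prometheus_client.start_http_server, which takes the port as its first argument. Passing the starter in as a parameter keeps the toggle logic testable without opening a socket.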
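Item 7 (multiple collection intervals) amounts to grouping collectors by interval and running each group only when it is due. The sketch below is a minimal, framework-free version under that assumption; CollectorGroup, MultiIntervalScheduler, and the 15 s / 300 s intervals are hypothetical and are not nucypher's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class CollectorGroup:
    """Metrics collectors that share one collection interval (in seconds)."""
    name: str
    interval: float
    collectors: List[Callable[[], None]] = field(default_factory=list)
    next_run: float = 0.0  # due on the first tick


class MultiIntervalScheduler:
    """Run each group of collectors on its own interval.

    The caller drives tick(now) from a timer loop, which keeps the
    scheduling logic deterministic and easy to test.
    """

    def __init__(self, groups):
        self.groups = list(groups)

    def tick(self, now: float) -> List[str]:
        """Run every group that is due at `now`; return the names that ran."""
        ran = []
        for group in self.groups:
            if now >= group.next_run:
                for collect in group.collectors:
                    collect()
                # Reschedule relative to `now` so a late tick does not
                # trigger a burst of catch-up runs.
                group.next_run = now + group.interval
                ran.append(group.name)
        return ran
```

For example, a "node" group (CPU, RAM) with a 15 s interval and a "web3" group (queries to a provider such as Infura) with a 300 s interval would both run at t=0, after which only the node group runs until t=300. Spacing out the web3 group limits provider requests while keeping local metrics fresh.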
derekpierre commented 9 months ago

The foundation for Prometheus metrics has now been completed.

Of course, additional metrics can always be added as desired. Closing this issue now.