masa-finance / masa-oracle

Masa Oracle: Decentralized Data Protocol 🌐
https://developers.masa.ai/docs/category/oracle-node
MIT License

[PRD] Real-Time Decentralized Network Monitoring Stack with Consul, Prometheus, and Grafana #292

Open theMultitude opened 1 month ago

theMultitude commented 1 month ago

Overview

Summary

This Product Requirements Document (PRD) proposes setting up and integrating Consul, Prometheus, and Grafana on AWS for real-time monitoring of the Masa Protocol, using Docker for deployment and Terraform to manage the AWS infrastructure.

Goal

The rapid creation of a resilient, extensible, real-time monitoring system.

Audience

Masa Protocol Team

[Image: CPG-Flow-2024-05-24-2300]

Background and Context

Problem Statement

At Masa we're looking to build an event-driven data architecture as a means to gather data from our nodes. This approach provides resilience, flexibility, and scalability. However, it comes with some challenges in the short term:

  1. Events are granular and need to be processed after ingestion into coherent datasets.
  2. Datasets need to be visualized and made available to relevant parties.
  3. This process needs low enough latency to enable quick responses to novel issues.

In essence, the proposed stack allows Masa to get access to critical protocol information while our more general event system is still maturing.
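To make the first challenge concrete, here is a minimal sketch of rolling granular events up into a per-node summary for one time window. The `NodeEvent` shape, the event kinds, and the `summarize` helper are all hypothetical; the actual Masa event schema is not specified in this PRD.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeEvent:
    # Hypothetical event shape; not the real Masa event schema.
    node_id: str
    kind: str         # e.g. "request_ok", "request_failed"
    timestamp: float  # Unix seconds


def summarize(events, window_start, window_seconds):
    """Count events per (node, kind) inside one half-open time window."""
    window_end = window_start + window_seconds
    counts = defaultdict(lambda: defaultdict(int))
    for event in events:
        if window_start <= event.timestamp < window_end:
            counts[event.node_id][event.kind] += 1
    # Convert to plain dicts so the result is easy to serialize or chart.
    return {node: dict(kinds) for node, kinds in counts.items()}


events = [
    NodeEvent("node-a", "request_ok", 100.0),
    NodeEvent("node-a", "request_failed", 101.5),
    NodeEvent("node-b", "request_ok", 102.0),
    NodeEvent("node-a", "request_ok", 161.0),  # outside the 60 s window
]
summary = summarize(events, window_start=100.0, window_seconds=60)
print(summary)
```

A metrics stack sidesteps most of this work in the short term, since Prometheus stores pre-aggregated time series rather than raw events.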

In-Scope

Features and Functionality

Deliverables

Out-of-Scope

Excluded Features

Testing and Validation

Testing Strategy

Validation Criteria

User Stories

Protocol Monitoring

Title: Utilize Prometheus for Node Monitoring
As an: Oracle Developer
I want: to integrate Prometheus to collect and store metrics from all services
So that: I can monitor system performance, identify issues in real-time, and ensure system reliability
Acceptance Criteria:
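Illustrative only, not an acceptance criterion: Prometheus collects metrics by scraping an HTTP endpoint that returns plain text in its exposition format. The sketch below renders a gauge in that format using only the standard library. The metric name `masa_node_peers` and its `node` label are made up for the example; a real node would more likely use an official Prometheus client library than hand-rolled formatting.

```python
def escape_label_value(value):
    """Escape backslash, double quote, and newline, as the text format requires."""
    return str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauge(name, help_text, samples):
    """Render one gauge in the Prometheus text exposition format.

    samples is a list of (labels_dict, value) pairs.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        label_str = ",".join(
            f'{key}="{escape_label_value(val)}"' for key, val in sorted(labels.items())
        )
        series = f"{name}{{{label_str}}}" if label_str else name
        lines.append(f"{series} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical metric: number of peers each node is connected to.
text = render_gauge(
    "masa_node_peers",
    "Number of connected peers.",
    [({"node": "node-a"}, 12), ({"node": "node-b"}, 7)],
)
print(text)
```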

Node Discovery

Title: Implement Consul for Node Discovery
As an: Oracle Developer
I want: to use Consul for dynamic node discovery and health checks
So that: services can automatically be discovered and relayed to Prometheus
Acceptance Criteria:
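Illustrative only, not an acceptance criterion: a node could announce itself by sending a service definition to its local Consul agent (`PUT /v1/agent/service/register`). The sketch below only builds that JSON body and does not send it. The service name, address, port, `prometheus` tag, and the choice of the metrics path as the health-check target are assumptions made for the example.

```python
import json


def consul_service_definition(service_id, name, address, port, metrics_path="/metrics"):
    """Build a service definition with an HTTP health check for Consul's agent API."""
    return {
        "ID": service_id,
        "Name": name,
        "Address": address,
        "Port": port,
        "Tags": ["prometheus"],  # assumed tag that Prometheus could filter on
        "Check": {
            # Assumption: the metrics endpoint doubles as the health-check target.
            "HTTP": f"http://{address}:{port}{metrics_path}",
            "Interval": "10s",
            "Timeout": "2s",
        },
    }


# Hypothetical node details.
definition = consul_service_definition("masa-oracle-1", "masa-oracle", "10.0.0.5", 8080)
body = json.dumps(definition, indent=2)
print(body)
```

On the Prometheus side, a scrape job using `consul_sd_configs` can then pick up whatever is registered in Consul, so new nodes would not need to be added to the scrape configuration by hand.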

Separation of Concerns

Title: Consolidated/Abstracted Node Analytics
As a: Data Lead
I want: to have analytics separated from general oracle function
So that: modification of oracle functionality does not break information services
Acceptance Criteria:

Further Notes

theMultitude commented 1 month ago

@Luka-Loncar @j2d3 @restevens402 @teslashibe @jdutchak @nolanjacobson Here is a first pass at a PRD. Still to be done with the team: final confirmation of the design (including scope) and the ticketing breakdown.

Comments welcome.

theMultitude commented 1 month ago

A Loom for some perspective on how this architecture would work in its end state.

And an example of Consul's key-value store structure: [image]

Nodes are discrete machines that can have multiple services. Both nodes and services can have health checks if we desire.
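The node/service relationship described above can be sketched as a small data model. Consul reports each check as `passing`, `warning`, or `critical`; the class names, the example service names, and the rule that a node's overall status is the worst of its own and its services' checks are assumptions for illustration, not something Consul computes this way.

```python
from dataclasses import dataclass, field

# Severity order for the three check states Consul reports.
SEVERITY = {"passing": 0, "warning": 1, "critical": 2}


@dataclass
class Check:
    name: str
    status: str  # "passing", "warning", or "critical"


@dataclass
class Service:
    name: str
    checks: list = field(default_factory=list)  # checks are optional


@dataclass
class Node:
    name: str
    checks: list = field(default_factory=list)    # node-level checks, optional
    services: list = field(default_factory=list)  # one node, many services


def node_status(node):
    """Return the worst status across the node's own checks and its services' checks.

    Assumption: a node with no checks at all is treated as "passing".
    """
    all_checks = list(node.checks)
    for service in node.services:
        all_checks.extend(service.checks)
    return max((c.status for c in all_checks), key=SEVERITY.__getitem__, default="passing")


# Hypothetical node running two services.
node = Node(
    name="node-a",
    checks=[Check("serf-health", "passing")],
    services=[
        Service("oracle", [Check("http", "passing")]),
        Service("scraper", [Check("http", "critical")]),
    ],
)
print(node_status(node))
```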