pokt-network / pocket

Official implementation of the Pocket Network Protocol v1
https://pokt.network
MIT License

[Document] Health Module #360

Closed by jessicadaugherty 1 year ago

jessicadaugherty commented 1 year ago

Objective

Better understand the current state of the health module branch for consideration in scoping/building additional monitoring.

Origin Document

Knowledge Transfer Requests

Goals

Deliverable


Creator: @jessicadaugherty

andrewnguyen22 commented 1 year ago

Health Module Light Documentation

File Breakdown

Metrics

This file contains the highest-level structure, which encapsulates the structures and functions of all the other files.

type HealthMetrics struct {
  BlockMetrics map[int64]BlockMetrics
}

type BlockMetrics struct {
  Timestamp time.Time
  Height int64

  ConsensusMetrics ConsensusMetrics
  DataSizeMetrics DataSizeMetrics
  LifecycleMetrics LifecycleMetrics
  StateMetrics StateMetrics
  TransactionMetrics TransactionMetrics
}

Organizing the health metrics block by block lets the consumer query the Health Metrics it needs simply by using the height as the key.

The block-by-block segmentation is also intentional: it avoids the server timeouts that occur with Tendermint's consensus endpoints on port 26657.
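As a rough illustration of the height-keyed lookup described above, here is a minimal sketch. The structs are trimmed-down copies of the ones shown earlier, and `BlockAt` is a hypothetical helper that is not taken from the health module branch.

```go
package main

import (
	"fmt"
	"time"
)

// Trimmed-down copies of the structs above; the nested metric types are omitted.
type BlockMetrics struct {
	Timestamp time.Time
	Height    int64
}

type HealthMetrics struct {
	BlockMetrics map[int64]BlockMetrics
}

// BlockAt is a hypothetical helper (not from the branch): it returns the
// metrics recorded for a height and whether that height was present.
func (hm *HealthMetrics) BlockAt(height int64) (BlockMetrics, bool) {
	bm, ok := hm.BlockMetrics[height]
	return bm, ok
}

func main() {
	hm := &HealthMetrics{BlockMetrics: map[int64]BlockMetrics{
		100: {Height: 100, Timestamp: time.Unix(1_700_000_000, 0)},
	}}
	if bm, ok := hm.BlockAt(100); ok {
		fmt.Println("found metrics for height", bm.Height)
	}
	if _, ok := hm.BlockAt(101); !ok {
		fmt.Println("no metrics recorded for height 101")
	}
}
```

Because each lookup touches a single height, a consumer only ever pulls one block's worth of data per request.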

Consensus

The Health Module exporter exposes Tendermint-specific, block-by-block consensus information in a single JSON endpoint.

type Round struct {
  Proposer tm.Address
  RoundNumber int64
  MaxStep types.RoundStepType
  ConsensusTiming ConsensusTiming
  PreVotes VoteMetrics
  PreCommits VoteMetrics
}

type ConsensusMetrics struct {
  Rounds map[int64]Round
}

type ConsensusTiming struct {
  ProposeTime int64
  PreVoteTime int64
  PreCommitTime int64
}

This information is very helpful during a chain halt and for day-to-day monitoring, since it allows alerts and automation to be built around Pocket Network's Tendermint BFT.
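One way an alert could be built on top of these structures is sketched below. It assumes that a height needing more than one consensus round is worth flagging; that threshold is an assumption, not something the branch prescribes. `Round` is reduced to a single field, and `HeightsWithExtraRounds` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"sort"
)

// Round is reduced to one field here; the real struct above also carries
// the proposer, step, timing, and vote metrics.
type Round struct {
	RoundNumber int64
}

type ConsensusMetrics struct {
	Rounds map[int64]Round
}

// HeightsWithExtraRounds is a hypothetical helper: it returns, in ascending
// order, every height whose consensus needed more than one round.
func HeightsWithExtraRounds(byHeight map[int64]ConsensusMetrics) []int64 {
	var flagged []int64
	for height, cm := range byHeight {
		if len(cm.Rounds) > 1 {
			flagged = append(flagged, height)
		}
	}
	sort.Slice(flagged, func(i, j int) bool { return flagged[i] < flagged[j] })
	return flagged
}

func main() {
	byHeight := map[int64]ConsensusMetrics{
		10: {Rounds: map[int64]Round{0: {RoundNumber: 0}}},
		11: {Rounds: map[int64]Round{0: {RoundNumber: 0}, 1: {RoundNumber: 1}}},
	}
	fmt.Println(HeightsWithExtraRounds(byHeight)) // prints [11]
}
```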

Data

Part of the guard rails effort is to track the size of the state and block databases. The Health Module exporter exposes metrics regarding the block-by-block growth of these two databases.

type DataSizeMetrics struct {
  BlockSize int64
  StateSize int64
}

func (hm *HealthMetrics) AddBlockSizeMetric(height int64, blockSizeInBytes int64)

func (hm *HealthMetrics) AddStateSizeMetric(height int64, stateSizeInBytes int64)

The exporter enables tracking of both the block-by-block incremental growth and the overall rate of growth over a given period of time.
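The sketch below shows how a growth rate could be derived from these metrics. It assumes `StateSize` holds the total size of the state database in bytes at that height; the source does not say whether the value is cumulative. The body of `AddStateSizeMetric` is a guess written against the signature above, and `StateGrowthPerBlock` is a hypothetical helper.

```go
package main

import (
	"errors"
	"fmt"
)

type DataSizeMetrics struct {
	BlockSize int64
	StateSize int64
}

// BlockMetrics is trimmed down to the fields this sketch needs.
type BlockMetrics struct {
	Height          int64
	DataSizeMetrics DataSizeMetrics
}

type HealthMetrics struct {
	BlockMetrics map[int64]BlockMetrics
}

// AddStateSizeMetric uses the signature from the documentation above; the
// body is an assumed implementation, not the one in the branch.
func (hm *HealthMetrics) AddStateSizeMetric(height int64, stateSizeInBytes int64) {
	if hm.BlockMetrics == nil {
		hm.BlockMetrics = make(map[int64]BlockMetrics)
	}
	bm := hm.BlockMetrics[height]
	bm.Height = height
	bm.DataSizeMetrics.StateSize = stateSizeInBytes
	hm.BlockMetrics[height] = bm
}

// StateGrowthPerBlock is a hypothetical helper: average bytes added to the
// state database per block between two recorded heights.
func (hm *HealthMetrics) StateGrowthPerBlock(from, to int64) (float64, error) {
	if to <= from {
		return 0, errors.New("to must be greater than from")
	}
	start, okStart := hm.BlockMetrics[from]
	end, okEnd := hm.BlockMetrics[to]
	if !okStart || !okEnd {
		return 0, errors.New("missing metrics for one of the heights")
	}
	delta := end.DataSizeMetrics.StateSize - start.DataSizeMetrics.StateSize
	return float64(delta) / float64(to-from), nil
}

func main() {
	hm := &HealthMetrics{}
	hm.AddStateSizeMetric(100, 1_000_000)
	hm.AddStateSizeMetric(110, 1_050_000)
	rate, err := hm.StateGrowthPerBlock(100, 110)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%.0f bytes per block\n", rate) // prints 5000 bytes per block
}
```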

Lifecycle

Every Tendermint application implements the ABCI interface. How long each part of the block lifecycle takes to process is a direct indicator of the ABCI state machine's performance.

type LifecycleMetrics struct {
  ApplyBlockTime int64
  BeginBlock int64
  DeliverTxs int64
  EndBlock int64
}
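A minimal sketch of how these durations might be captured and compared follows. It assumes the `int64` fields hold nanoseconds, which the source does not state. `timePhase` and `SlowestPhase` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"time"
)

type LifecycleMetrics struct {
	ApplyBlockTime int64
	BeginBlock     int64
	DeliverTxs     int64
	EndBlock       int64
}

// timePhase is a hypothetical helper: it runs fn and returns the elapsed
// wall-clock time in nanoseconds (an assumed unit).
func timePhase(fn func()) int64 {
	start := time.Now()
	fn()
	return time.Since(start).Nanoseconds()
}

// SlowestPhase is a hypothetical helper: it names the longest of the three
// ABCI phases. On a tie, the earlier phase wins.
func (lm LifecycleMetrics) SlowestPhase() string {
	name, longest := "BeginBlock", lm.BeginBlock
	if lm.DeliverTxs > longest {
		name, longest = "DeliverTxs", lm.DeliverTxs
	}
	if lm.EndBlock > longest {
		name = "EndBlock"
	}
	return name
}

func main() {
	var lm LifecycleMetrics
	// Sleeps stand in for real ABCI work.
	lm.BeginBlock = timePhase(func() { time.Sleep(1 * time.Millisecond) })
	lm.DeliverTxs = timePhase(func() { time.Sleep(20 * time.Millisecond) })
	lm.EndBlock = timePhase(func() { time.Sleep(1 * time.Millisecond) })
	lm.ApplyBlockTime = lm.BeginBlock + lm.DeliverTxs + lm.EndBlock
	fmt.Println("slowest phase:", lm.SlowestPhase())
}
```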

State

The Health Module exporter also enables tracking of state machine specific metrics. In Pocket Network and most blockchains, the distributed state machine is fundamental to the crypto-economics as well as the utility function of the network.

type StateMetrics struct {
  AppHash string
  JailMetrics JailMetrics
  SessionMetrics SessionMetrics
}

type SessionMetrics struct {
  SessionsGenerated int64
  SessionGenerationTimes []int64
  TotalRelays int64
}

type JailMetrics struct {
  TotalJailed int64
  JailedValidators []Validator
}

The State structure exports jailings, the number of sessions generated and how long each took as the block was processed, the total relay count, and the calculated AppHash.
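As an example of what a consumer could derive from the session data, the sketch below averages the recorded generation times for one block. `AverageGenerationTime` is a hypothetical helper, and the unit of `SessionGenerationTimes` is not specified in the source, so the result is in whatever unit the exporter records.

```go
package main

import "fmt"

type SessionMetrics struct {
	SessionsGenerated      int64
	SessionGenerationTimes []int64
	TotalRelays            int64
}

// AverageGenerationTime is a hypothetical helper: it returns the mean of the
// recorded generation times, or false when no sessions were recorded.
func (sm SessionMetrics) AverageGenerationTime() (float64, bool) {
	if len(sm.SessionGenerationTimes) == 0 {
		return 0, false
	}
	var total int64
	for _, t := range sm.SessionGenerationTimes {
		total += t
	}
	return float64(total) / float64(len(sm.SessionGenerationTimes)), true
}

func main() {
	sm := SessionMetrics{
		SessionsGenerated:      3,
		SessionGenerationTimes: []int64{10, 20, 30},
	}
	if avg, ok := sm.AverageGenerationTime(); ok {
		fmt.Printf("average generation time: %.1f\n", avg) // prints 20.0
	}
}
```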

Transaction

Transactions are the fundamental unit of blocks in a blockchain. The Health Module exporter enables the guard rails effort to track the computational expense of each transaction and categorize it by metadata.

type Transaction struct {
  TypeOf string
  ProcessingTime int64
  Sender string
  IsValid bool
}

type TransactionMetrics struct {
  TotalTransactions int64
  TotalValidTxs int64
  TotalInvalidTxs int64
  Transactions []Transaction
}

The transaction-specific sub-module gives a global view of valid and invalid transactions, as well as insight into the general behavior of blockchain participants.
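The counters and the per-type categorization described above could be computed along these lines. Both `NewTransactionMetrics` and `ProcessingTimeByType` are hypothetical helpers, and the transaction type names and sender values in `main` are made up for illustration.

```go
package main

import "fmt"

type Transaction struct {
	TypeOf         string
	ProcessingTime int64
	Sender         string
	IsValid        bool
}

type TransactionMetrics struct {
	TotalTransactions int64
	TotalValidTxs     int64
	TotalInvalidTxs   int64
	Transactions      []Transaction
}

// NewTransactionMetrics is a hypothetical constructor: it fills the counters
// from a block's transactions.
func NewTransactionMetrics(txs []Transaction) TransactionMetrics {
	tm := TransactionMetrics{Transactions: txs}
	for _, tx := range txs {
		tm.TotalTransactions++
		if tx.IsValid {
			tm.TotalValidTxs++
		} else {
			tm.TotalInvalidTxs++
		}
	}
	return tm
}

// ProcessingTimeByType is a hypothetical helper: it sums processing time per
// transaction type, valid or not.
func ProcessingTimeByType(txs []Transaction) map[string]int64 {
	totals := make(map[string]int64)
	for _, tx := range txs {
		totals[tx.TypeOf] += tx.ProcessingTime
	}
	return totals
}

func main() {
	// Type names and senders are illustrative placeholders.
	txs := []Transaction{
		{TypeOf: "send", ProcessingTime: 120, Sender: "addr1", IsValid: true},
		{TypeOf: "send", ProcessingTime: 80, Sender: "addr2", IsValid: false},
		{TypeOf: "stake", ProcessingTime: 300, Sender: "addr1", IsValid: true},
	}
	tm := NewTransactionMetrics(txs)
	fmt.Println(tm.TotalTransactions, tm.TotalValidTxs, tm.TotalInvalidTxs) // prints 3 2 1
	fmt.Println(ProcessingTimeByType(txs)["send"])                          // prints 200
}
```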

jessicadaugherty commented 1 year ago

This is amazing @andrewnguyen22! Gives us a huge head start on the monitoring exporter we discussed. I'll review this with @okdas and resume our negotiations of ownership with the infra team. Thank you so much!