sovrin-foundation / steward-council


Health Workstream Metrics & Monitoring #1

Open NickyHickman opened 4 years ago

NickyHickman commented 4 years ago

Metrics and Monitoring for Indy Networks

Purpose of this document:

Background and Rationale

Who are these metrics for?

For this document we have only focused on the needs of Sovrin as a community of Stewards (Indy Node Operators) and as an Indy Network Operator and Governance Authority.

However, there are other roles within the ecosystem that may use these metrics or define their own. For example, an Agency at layer 2 might select different networks for specific types of transactions. Or an ecosystem might select a specific subset of the network of networks based on measures of decentralization or performance. An ecosystem concerned with IoT, for example, will require very high throughput and capacity, whereas an ecosystem focused on KYC for private banking will require lower capacity and performance but higher consensus and freshness. Equally, we may find other metrics that are useful in the future as we learn more from these data.

We have sub-divided our uses of these metrics into 4 groups:

  1. Node Operators. Focuses on node health and performance, fault monitoring, local security and conformance within the network.
  2. Network Operations. Focuses on network health, fault monitoring, security, and the technical roadmap.
  3. Public Dashboard. It's important to demonstrate the health of the network to network users and others, and to be able to measure it against the performance of other SSI networks. This builds confidence, trust and accountability.
  4. Business. Whether a for-profit or not-for-profit model, every network operator needs to be sustainable.

Categories of Metrics

We have grouped the metrics into the following categories:

  1. Network Health, including Availability, Performance and Quality. Four of these metrics build on the work of the Hyperledger Performance & Scale Working Group’s white paper on metrics (October 2018).
  2. Business. Primarily focused on uptake and usage. Over time, this category of measures will enable us to build patterns of usage and behaviour, which in turn will help inform security as well as measures of business performance in building market adoption.
  3. Governance Framework Compliance. This measures essential components of SSI such as Diversity and Decentralization. These are relevant to the role of Governance Authorities and attempt to measure some of the qualities of SSI.

Table of proposed metrics

| Category | Measure | Audience columns marked (Node Operators, Network Operations, Public Dashboard, Business) | How? / Notes |
| --- | --- | --- | --- |
| Network Health: Capacity | Availability % Uptime | 1 1 1 1 | Monitor availability across nodes; current status (dashboard); Steward response time; correlate with events (upgrades, etc.) |
| Network Health: Capacity | Capacity % utilisation | 1 1 | Needed to support security controls and enable minimum permissible pricing as well as dimensioning / planning |
| Network Health: Performance | Read Latency | 1 1 | Hyperledger Metric: = Time when response received – submit time |
| Network Health: Performance | Read Throughput | 1 1 | Hyperledger Metric: = Total read operations / total time in seconds |
| Network Health: Performance | Transaction Latency | 1 1 | Hyperledger Metric: = (Confirmation time @ network threshold) – submit time |
| Network Health: Performance | Transaction Throughput | 1 1 1 | Hyperledger Metric: = Total committed transactions / total time in seconds @ #committed nodes |
| Network Health: Quality | Consensus | 1 1 1 | Monitor consensus across nodes; possibly monitor view change events |
| Network Health: Quality | Freshness | 1 1 1 | Freshness timestamp reported by each validator node |
| Network Health: Quality | Reputation | 1 1 | Future: score nodes and (when in n/w of n/w’s) score the network using Open Reputation, where the node is the entity. Could also apply to transaction endorsers etc. |
| Business | Usage: # Writes | 1 | Indy Monitor: track writes across ledgers |
| Business | Usage: # Transaction Authors & Endorsers | 1 | Track TA and TE DIDs |
| Business | Uptake: # new TAs & TEs | 1 | Track TA and TE DIDs |
| Governance Framework Compliance | Diversity: Geo-location of Stewards and Nodes | 1 1 | Diversity can be measured, but it is measured against attributes. We therefore need to identify elements which must be diversified at both a technical and an organizational level, assign attributes (claims) against them, and then use these types of metrics to measure diversity. This can be set at different levels as the network grows |
| Governance Framework Compliance | Diversity: Server / host type for Nodes | 1 1 | See above: Geo IP lookup |
| Governance Framework Compliance | Sustainability: % Churn rate of Stewards | 1 1 | Monitor Steward lifecycle in HubSpot |
| Governance Framework Compliance | Sustainability: Av. Cost / year to run a node | 1 1 1 | Qualitative annual survey |
| Governance Framework Compliance | Decentralization: level of hierarchy or influence | 1 1 | Measure hierarchy in networks; level of influence (there is a clever mathematical formula that enables you to measure levels of influence in networks, and a good deal of research in this domain eg f . |
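
The four Hyperledger performance formulas in the table can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not a description of how Indy or any monitoring tool computes these values: the `Sample` type and the function names are hypothetical, and timestamps are assumed to be seconds on a common clock.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Sample:
    """One observed request (hypothetical record shape, times in seconds)."""
    submit_time: float    # when the read or write was submitted
    complete_time: float  # read: when the response was received;
                          # write: confirmation time at the network threshold


def latency(sample: Sample) -> float:
    """Read or transaction latency = completion time - submit time."""
    return sample.complete_time - sample.submit_time


def throughput(total_operations: int, total_seconds: float) -> float:
    """Read or transaction throughput = total operations / total time in seconds."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    return total_operations / total_seconds


def mean_latency(samples: Sequence[Sample]) -> float:
    """Average latency over a non-empty batch of samples."""
    if not samples:
        raise ValueError("at least one sample is required")
    return sum(latency(s) for s in samples) / len(samples)
```

For transaction throughput, `total_operations` would count only committed transactions, and the result should be reported alongside the number of committed nodes it was measured at, as the table notes.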

Future metrics to consider:

In future, several factors suggest that further metrics may be required:

  1. High volumes of usage on individual networks may require a % capacity metric. This could also support networking within a network of networks, pricing and security measures.
  2. Operation of a network of networks (or grid) to build a global public utility layer may require the application of many of the above metrics by a governance authority at the ‘grid’ level (horizontally at layer 1 in the ToIP Stack vs as a vertical slice from layer 4 down).
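
One possible reading of the % capacity metric, sketched below, is observed throughput as a share of a benchmarked ceiling. The ceiling (`max_sustainable_tps`) is an assumption introduced for illustration; this document does not define how capacity would be established.

```python
def capacity_utilisation(observed_tps: float, max_sustainable_tps: float) -> float:
    """Return utilisation as a percentage of an assumed throughput ceiling.

    observed_tps: measured transactions per second over the reporting window.
    max_sustainable_tps: hypothetical benchmarked maximum for the network.
    """
    if max_sustainable_tps <= 0:
        raise ValueError("max_sustainable_tps must be positive")
    if observed_tps < 0:
        raise ValueError("observed_tps must not be negative")
    return 100.0 * observed_tps / max_sustainable_tps
```

A value approaching 100 would signal the need for dimensioning or pricing decisions. A value above 100 would suggest the assumed ceiling is out of date.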
NickyHickman commented 4 years ago

@lohanspies @kiview - here is my starting document for what we measure and why

lohanspies commented 4 years ago

@NickyHickman moved the document into the indy-health folder. Suggest we track changes towards a v1 release document on health and metrics for Indy Networks.

swcurran commented 3 years ago

That's a great list and the majority of those metrics are available today -- we could be a couple of dev weeks away from having all of this...

Once retrieved, all can be passed to a log collector for visualization.

NickyHickman commented 3 years ago

thanks for feedback - thinking it would be good to start with a core 3-5 max 7 key metrics for the public dashboard. ultimately I would love us to have a 'Net Trust Score' for the network - setting that standard etc