phahulin opened 6 years ago
Some initial thoughts:
Dashboard sorting was already listed. Just want to add that sorting should have an option to show only validator nodes, only bootnodes, all nodes, etc.
Collect stats on the node. We may look into the "collectd" daemon. It is one of the easiest ways to collect common metrics from nodes, and it ALSO has a plugin system, which means we could extend it to our needs.
On the receiving end, there would be an aggregator/collector script that collects data for a set period of time, then passes it on / stores it.
Since there would be several (many) nodes sending data at the same time, there should be another script that performs load balancing and works in parallel with the collector above.
We should look into using InfluxDB. It has an HTTP endpoint, so we could just POST data to it from a shell command or from within an application.
There should be some logic to delete data that is not needed for historical viewing. For example, we would keep block info and anything blockchain-specific, but could delete CPU usage after N days (see the retention-policy sketch below).
Agents could be written in JS or Python, or both. Several lightweight scripts would perform their duties and send data to the "collector".
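As a rough sketch of such an agent (purely illustrative; the collector host, port, and path below are placeholders, not an agreed interface), a Node script could sample system metrics with the built-in os module and POST them periodically:

```js
// agent.js - minimal metrics-agent sketch; the collector endpoint is a placeholder
const os = require('os');
const http = require('http');

const COLLECTOR = { host: 'localhost', port: 3000, path: '/metrics' }; // hypothetical
const INTERVAL_MS = 10 * 1000;

function sendSample() {
  const body = JSON.stringify({
    node: os.hostname(),
    ts: Date.now(),
    load1: os.loadavg()[0],   // 1-minute load average
    freeMem: os.freemem(),
    totalMem: os.totalmem()
  });

  const req = http.request(
    { ...COLLECTOR, method: 'POST',
      headers: { 'Content-Type': 'application/json', 'Content-Length': Buffer.byteLength(body) } },
    res => res.resume() // drain and ignore the response body
  );
  req.on('error', err => console.error('collector unreachable:', err.message));
  req.end(body);
}

setInterval(sendSample, INTERVAL_MS);
```

The collector on the receiving end would accept these POSTs, buffer them for the configured period, and hand them off to storage (or to the load-balancing layer mentioned above).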
InfluxDB POST example:
curl -i -XPOST http://localhost:8086/query --data-urlencode "q=CREATE DATABASE mydb"
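Writing points and expiring old ones go through the same HTTP API. A minimal Node sketch, assuming InfluxDB 1.x, the mydb database created above, and a hypothetical cpu_load measurement (names are placeholders):

```js
// influx-sketch.js - push one point and set a 30-day retention policy (InfluxDB 1.x HTTP API)
const http = require('http');

function post(path, body, contentType) {
  const req = http.request(
    { host: 'localhost', port: 8086, path, method: 'POST',
      headers: { 'Content-Type': contentType, 'Content-Length': Buffer.byteLength(body) } },
    res => res.resume()
  );
  req.on('error', err => console.error('influxdb error:', err.message));
  req.end(body);
}

// One point in line protocol: measurement,tag=value field=value
post('/write?db=mydb', 'cpu_load,host=node1 value=0.64', 'text/plain');

// Auto-expire data after 30 days (the "delete after N days" idea above)
post('/query',
  'q=' + encodeURIComponent('CREATE RETENTION POLICY "thirty_days" ON "mydb" DURATION 30d REPLICATION 1 DEFAULT'),
  'application/x-www-form-urlencoded');
```

Since retention is set per policy, blockchain-specific measurements could be written under a longer-lived policy while CPU-style metrics go to a short one.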
Downside: Looks like InfluxDB is no longer actively developed.
Diamond is a python daemon that collects system metrics and publishes them to Graphite (and others). It is capable of collecting cpu, memory, network, i/o, load and disk metrics. Additionally, it features an API for implementing custom collectors for gathering metrics from almost any source.
@maratP Python is not in our stack at the moment.
Preferred languages are Elixir, Rust, JavaScript (Node), in descending order.
My understanding is that the current netstats consists of two components.
The fundamental issue is that these components are tightly coupled, so the question is how to decouple them and make them extensible in a reasonable way.
Hmmm, my thoughts ....
Since you are potentially interested in displaying data from multiple independent data sources, it may be prudent to look at a dashboard framework like this one (https://grafana.com/). NOTE: not an endorsement, just a visualization tool to frame the requirements. Is this kind of what you are thinking?
If you use a front-end framework like the above together with a publish/subscribe model, things are decoupled. This also gives flexibility as to when and where the data is stored persistently (if ever), since that is deferred to the subscriber implementation. It likewise gives a lot of flexibility to the publisher implementation (it could be anything), as long as the publisher and subscriber agree on a data format (here I am assuming these "monitoring-jobs" still run locally on each node). I suppose it would be nice if these "monitoring-jobs":
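To make the "publisher and subscriber agree on a data format" point concrete, here is one possible message envelope, sketched in-process with Node's EventEmitter (the field names are illustrative assumptions, and a real deployment would put a broker or HTTP hop between the two sides):

```js
// pubsub-sketch.js - illustrative only; the envelope fields are assumptions, not a spec
const EventEmitter = require('events');
const bus = new EventEmitter(); // stand-in for a real broker (MQTT, Redis, HTTP, ...)

// Subscriber: decides on its own whether and where to persist the data
bus.on('metric', msg => {
  console.log(`[${msg.source}] ${msg.name} = ${msg.value} @ ${msg.ts}`);
});

// Publisher: a "monitoring-job" running locally on a node
function publish(name, value) {
  bus.emit('metric', {
    version: 1,            // lets the format evolve without breaking subscribers
    source: 'node-01',     // which node produced the sample
    name,                  // e.g. 'cpu.load1' or 'chain.block_height'
    value,
    ts: new Date().toISOString()
  });
}

publish('cpu.load1', 0.42);
```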
@igorbarinov, good to know about language preferences
John, agreed on Grafana. My research and what I described above also point to Grafana.
Hello everyone--
most of the tools described here are already available in the great open-source project Libre NMS: https://www.librenms.org/. The full, very active GitHub repo: https://github.com/librenms/librenms. There is an active demo on the site, so take it for a spin. Libre NMS has a full API, messaging and alerting systems, native iOS and Android apps, etc. It is very robust and configurable for almost all uses, with built-in hooks for collectd, RRD, and almost all of the standard open-source monitoring, alerting, and graphing tools.
Tools like Libre NMS - https://www.librenms.org - provide real-time monitoring and notification, and create incredible historical graphs, allowing us to visually see patterns over time that are not apparent in snapshot images. The storage problem is solved, so there is no need to pick and choose which statistics to retain. They encourage forking and component adoption; good tool.
It does seem like one would need a DB for the sheer number of data points. One of my machines is running parity (pointing to core), and I could see it dumping into something like CockroachDB (or Spanner, in which case one could easily use Google Charts, I believe?). Then have a custom dashboard as one sees fit. Oh, another DB option that I've used: FaunaDB; it's nice too.
RRDtool is built specifically to handle this sort of data in a stable, fixed-size database. Libre NMS (and most other monitoring tools) use it by default. It is a very stable, efficient tool; sort of the bedrock of monitoring systems for the past 20 years and the foreseeable future.
Title
Abstract
A new service to gather and process network statistics is proposed.
Rationale
At present, network statistics are gathered by a swarm of agents installed on network nodes, which send them to a central server that displays them in a dashboard-like web interface. Together, the agents and the dashboard make up the two parts of a single service.
The current implementation has several shortcomings:
Specification
Implementation
the main process should be written in such a way that uncaught exceptions that may occur inside any of the plugins, sinks, or receivers do not lead to a crash and do not affect other plugins (a minimal sketch of this isolation is given after the examples below)
Examples of plugins are:
Examples of receivers are:
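Regarding the isolation requirement above, a rough sketch of how the main process could invoke plugins defensively (the plugin names and the collect() interface are made up for illustration):

```js
// main.js - sketch: a throwing or rejecting plugin is logged and skipped, never fatal
const os = require('os');

const plugins = [
  { name: 'system-stats', collect: async () => ({ load1: os.loadavg()[0] }) },
  { name: 'broken-plugin', collect: async () => { throw new Error('boom'); } }
];

async function tick(sink) {
  for (const plugin of plugins) {
    try {
      const data = await plugin.collect();   // each plugin isolated in its own try/catch
      sink(plugin.name, data);
    } catch (err) {
      console.error(`plugin "${plugin.name}" failed:`, err.message); // log and continue
    }
  }
}

// Last-resort guards so stray async errors do not kill the whole process
process.on('uncaughtException', err => console.error('uncaught:', err));
process.on('unhandledRejection', err => console.error('unhandled rejection:', err));

setInterval(() => tick((name, data) => console.log(name, data)), 5000);
```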