near / nearcore

Reference client for NEAR Protocol
https://near.org
GNU General Public License v3.0

Time series analysis for testnet/mainnet stats #1480

Open MaksymZavershynskyi opened 4 years ago

MaksymZavershynskyi commented 4 years ago
  1. Motivation We need to collect and analyze time-series data from our testnet/mainnet to detect suspicious events and trends. Examples of things we want to be aware of:
  2. Implementation Proposal We collect major metrics the same way we do for the block explorer, but also store them as time-series data. We also keep a spreadsheet with all marketing events annotated. We then continuously feed this data to the tool described below and display the results on a dashboard. Later, we add email alerts for anomalies.

Personally, I am very familiar with this powerful tool (https://github.com/google/CausalImpact) that can detect anomalies and dangerous trends in general time-series data. It can do the following major things:

One of its biggest advantages is that it allows annotating certain dates/datetimes as special, e.g. when we launch a marketing program or make an announcement, and the tool analyzes these dates/datetimes differently.
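To make the annotation idea concrete, here is a minimal Python sketch that splits a daily metric into a pre-event and a post-event period around an annotated date. The event table, the `split_around_event` helper, and the sample numbers are all hypothetical. The difference of means at the end is only a naive stand-in for illustration; CausalImpact itself fits a Bayesian structural time-series model rather than comparing averages.

```python
from datetime import date


def split_around_event(series, event_day, post_days=7):
    """Split (day, value) points into pre-event and post-event periods."""
    pre = [(d, v) for d, v in series if d < event_day]
    post = [(d, v) for d, v in series
            if 0 <= (d - event_day).days < post_days]
    return pre, post


# Hypothetical annotated events, e.g. exported from the marketing spreadsheet.
events = {date(2020, 3, 10): "marketing announcement"}

# Hypothetical daily transaction counts that jump on the event day.
series = [(date(2020, 3, d), 100 + (50 if d >= 10 else 0))
          for d in range(1, 21)]

for event_day, label in events.items():
    pre, post = split_around_event(series, event_day)
    pre_mean = sum(v for _, v in pre) / len(pre)
    post_mean = sum(v for _, v in post) / len(post)
    # Naive effect estimate: NOT the model CausalImpact uses.
    lift = post_mean - pre_mean
    print(f"{label}: mean changed by {lift:+.1f} after {event_day}")
```

The useful part is the shape of the input: a series plus an explicit pre-period and post-period per annotated event, so that a jump after a known announcement is not reported as an unexplained anomaly.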

vgrichina commented 4 years ago

https://facebook.github.io/prophet is also a similar project, but by FB.

It's available for Python as well, so might be easier to work with.

MaksymZavershynskyi commented 4 years ago

Prophet is different: it is a forecasting tool. AFAIK forecasting tools use very different statistical approaches from anomaly detection tools. The short-term prediction feature of CausalImpact (the emphasis is on the word "short-term") is just a side effect of its anomaly detection feature.

frol commented 4 years ago

I would say that monitoring and anomaly detection are crucial for any product. I have limited experience in this area, and it seems to be quite a bit of work to get things configured, yet I wish this infrastructure were in place for NEAR.

MaksymZavershynskyi commented 4 years ago

> I would say that monitoring and anomaly detection are crucial for any product. I have limited experience in this area, and it seems to be quite a bit of work to get things configured, yet I wish this infrastructure were in place for NEAR.

I have very extensive experience working with this tool. If we have a time series in the form (time, measurement), where measurement is a vector, then we create a cron job that periodically takes the most recent 3 days (or 3 weeks, depending on the granularity) of data and passes it to the tool in CSV format. The tool produces a PNG file with the graph and a CSV file with the results, which we serve on our dashboard.
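A rough Python sketch of the data-preparation half of such a cron job, using only the standard library. The metric names (`tx_count`, `active_accounts`), the helper functions, and the sample data are assumptions made for illustration; the step that actually invokes the analysis tool is left out because it depends on how the tool is deployed.

```python
import csv
import io
from datetime import datetime, timedelta


def recent_window(rows, now, days=3):
    """Keep only (time, measurement...) rows from the last `days` days."""
    cutoff = now - timedelta(days=days)
    return [row for row in rows if cutoff <= row[0] <= now]


def to_csv(rows, metric_names):
    """Serialize rows as CSV text with an ISO-8601 time column."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["time", *metric_names])
    for time, *values in rows:
        writer.writerow([time.isoformat(), *values])
    return buf.getvalue()


# Hypothetical hourly samples covering one week:
# (time, tx_count, active_accounts)
now = datetime(2020, 1, 10, 0, 0)
samples = [(now - timedelta(hours=h), 100 + h, 10 + h % 5)
           for h in range(24 * 7)]

# Select the most recent 3 days and export them oldest-first.
window = recent_window(samples, now, days=3)
window_csv = to_csv(sorted(window), ["tx_count", "active_accounts"])
```

Switching to the 3-week variant is a matter of calling `recent_window(samples, now, days=21)`; the resulting `window_csv` text is what the cron job would hand to the analysis tool.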

This feature will only be useful after the TestNet or MainNet launch, because before that we don't have enough traffic to draw any conclusions. Once we have traffic, we really want to stay on top of weird things happening to our network by catching anomalies.

icerove commented 4 years ago

Definitely agree that we should monitor transaction metrics and let the graphs reveal useful information about our blockchain. We can first try it on one shard and see whether it is easy to make it real-time or with only a small delay, because time-series analysis usually lags well behind real-time data. We should also find out which package, the R one or a Python one, is a better fit for our blockchain.

frol commented 4 years ago

https://github.com/tokio-rs/tracing might be helpful on the Rust side of things to instrument event reporting.

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity in the last 2 months. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.