Method to monitor official TESTNET & full RPC node

bobinson commented 6 years ago

We need a method to monitor the logs generated on the TESTNET while transactions are being generated. This could be plain old log aggregation or an APM (application performance monitoring) tool.

Because this would require giving access to the infrastructure, I am aware that its security will be a bottleneck.
The reasons an APM is needed are described in a post on Steemit: https://steemit.com/witness/@bobinson/request-apm-or-minimal-log-monitoring-for-the-steem-testnet-community-testnet-update-23rd-october-2018

Gandalf-the-Grey commented 6 years ago

If you are part of the testnet, then you handle logs on your end, in whatever way you like and that fits your infrastructure, workflow, etc. That has nothing to do with steemd as such.

bobinson commented 6 years ago

@Gandalf-the-Grey - I think there was some confusion in what I wrote. I ended up mixing up the TESTNET, full RPC nodes, the test Condenser, etc.

To be precise, what we need is:

- A mechanism to monitor the logs when we generate transactions on:
  - https://testnet.steemitdev.com, when generated via Tinman
  - https://testnet.steemitdev.com, when generated via Condenser at http://condensertestnet.steemitdev.com/
- If possible, integration of a test account creation faucet and, in the future, hivemind
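As a rough illustration of "plain old log aggregation" for the two transaction sources above, the sketch below tags log lines by where they came from (Tinman or Condenser) and counts log levels per source, so errors stand out. It is a minimal sketch only: the line format (`<timestamp> <LEVEL> <message>`) and the level names are hypothetical assumptions, not steemd's or Condenser's actual log format.

```python
from collections import Counter, defaultdict

# Hypothetical line format assumed for illustration (not the real steemd format):
#   "<ISO-8601 timestamp> <LEVEL> <message>"
LEVELS = ("DEBUG", "INFO", "WARN", "ERROR")


def summarize(sources):
    """Count log levels per source, e.g. {"tinman": [lines], "condenser": [lines]}."""
    summary = defaultdict(Counter)
    for source, lines in sources.items():
        for line in lines:
            parts = line.split(maxsplit=2)
            # Lines that do not match the assumed format are counted, not dropped.
            if len(parts) < 2 or parts[1] not in LEVELS:
                summary[source]["UNPARSED"] += 1
                continue
            summary[source][parts[1]] += 1
    return {source: dict(counts) for source, counts in summary.items()}
```

In practice the per-source line lists would come from whatever collector ships the logs off each node; counting unparsed lines separately makes format drift visible instead of silently hiding it.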

Reason: While this has nothing to do with steemd or the TESTNETs as such, IMHO our best shot is to generate as many transactions as possible in one deployment and cover as many scenarios and as much code as possible. Further, AFAIK Steemit developers are best placed to identify and address the errors (appearing in the logs or the APM). This is not an exercise in bringing transparency or any other such "buzzword"; it is simply an approach to get more code coverage: run as much code as possible, consume as many CPU cycles as possible, and have as many qualified eyes as possible observe the results.

Next steps: Personally, I am interested in measuring TPS (transactions per second). We can do a lot more, such as capturing exactly how many times the code change from a specific commit was tested, how many scenarios were tested, etc. (Think of a marriage between waterfall software development and agile.) We have an open item to audit replay efficiency at https://github.com/steemit/steem/issues/3089; along similar lines, we can make improvements after the HF20 changes and also have a mechanism in place for future models. Right now, execution times and various other values are calculated from the metrics of api.steemit.com, but we haven't generated a comparable number of transactions on testnet.steemitdev.com and evaluated them. We are essentially not taking measurements on the test environment due to various limitations. In a nutshell, having a proper mechanism to aggregate, analyze and report logs will pave the way for an improved process.
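To make "measuring TPS" concrete, here is a minimal sketch that derives average and peak TPS from per-transaction timestamps. It assumes, hypothetically, that the aggregated logs yield one ISO-8601 timestamp per applied transaction; where those timestamps come from (node logs, API responses, Tinman output) is left open.

```python
from collections import Counter
from datetime import datetime


def tps_stats(timestamps):
    """Return (average_tps, peak_tps) from per-transaction ISO-8601 timestamps."""
    if not timestamps:
        return 0.0, 0
    # Bucket each transaction into the whole second in which it occurred.
    buckets = Counter(
        datetime.fromisoformat(t).replace(microsecond=0) for t in timestamps
    )
    first, last = min(buckets), max(buckets)
    # Inclusive count of one-second buckets in the observed window,
    # so seconds with zero transactions still lower the average.
    seconds = int((last - first).total_seconds()) + 1
    return len(timestamps) / seconds, max(buckets.values())
```

For example, four transactions at 10:00:00.100, 10:00:00.900, 10:00:01.500 and 10:00:03.000 span four one-second buckets, giving an average of 1.0 TPS and a peak of 2. Reporting the peak alongside the average matters because bursty load can hide behind a modest average.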

quochuy commented 6 years ago

Any witness running a test node connected to the testnet could install an APM on it. However, if Steemit Inc changes its node more frequently than it provides us with releases, then it's better if they run one themselves.

Additionally, an APM for the RPC nodes, the seed node and the condenser could also be very useful. You could aggregate the data from all of them and link events together. Maybe something happening on a specific version of the condenser triggers a specific behaviour on the RPC node, which in turn triggers a bug on the witness node?
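One way to "link events together" across components is to merge each component's time-ordered events into a single timeline and then group them by a shared key. The sketch below assumes, hypothetically, that every component logs events as `(iso_timestamp, component, trx_id, message)` tuples with a transaction id in common; whether condenser, RPC and witness logs actually expose such a shared id is an assumption, not something the thread establishes.

```python
import heapq
from collections import defaultdict


def merge_timeline(*streams):
    """Merge per-component event streams, each already sorted by timestamp."""
    # ISO-8601 strings in one format and timezone sort chronologically as text,
    # so the timestamp string can be used directly as the merge key.
    return list(heapq.merge(*streams, key=lambda event: event[0]))


def link_by_transaction(events):
    """Group merged events by their (hypothetical) shared transaction id."""
    chains = defaultdict(list)
    for ts, component, trx_id, message in events:
        chains[trx_id].append((ts, component, message))
    return dict(chains)
```

Given a condenser event, an RPC broadcast and a witness-side failure carrying the same id, `link_by_transaction(merge_timeline(condenser, rpc, witness))` returns them as one chain in time order, which is exactly the "condenser version → RPC behaviour → witness bug" path described above.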

On top of that, add a helper that also records resource usage and you could do even more.
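A resource-usage helper could be as simple as a context manager that records wall-clock and CPU time around a unit of work, so those numbers can be stored next to the log events. This minimal sketch only measures the current Python process using the standard library; sampling an external steemd process would need something else (for example a third-party library such as psutil), which is not shown. The `record_usage` name and the tuple layout are hypothetical.

```python
import time
from contextlib import contextmanager


@contextmanager
def record_usage(label, sink):
    """Append (label, wall_seconds, cpu_seconds) to `sink` for the wrapped block.

    Measures only the current Python process, not an external node.
    """
    wall_start, cpu_start = time.perf_counter(), time.process_time()
    try:
        yield
    finally:
        # Record even if the wrapped block raises, so failures are measured too.
        sink.append(
            (label, time.perf_counter() - wall_start, time.process_time() - cpu_start)
        )
```

Usage: `samples = []` followed by `with record_usage("generate-batch", samples): ...` appends one `(label, wall, cpu)` tuple per run, which can then be aggregated alongside the log summaries.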

steemit/steem issue #3101