status-im / swarms

Swarm Home. New, completed and in-progress features for Status

Testing cluster #51

divan closed this issue 6 years ago

divan commented 6 years ago

Preamble

Idea: 51-test-cluster
Title: Testing cluster
Status: Draft
Created: 2017-11-29

Summary

Provision a test cluster of Status nodes running a simulation of real user behavior. Set up high-level metrics monitoring and track changes between releases.

Vision

The idea stems from https://github.com/status-im/ideas/issues/22 (tools for diagnosing performance regressions). One of the main challenges there is simulating real-world load, and currently we have no way to do this. Analyzing performance on a single device is also prone to inaccurate results due to the high variability of hardware, software running in the background and other conditions. We also have no easy way to gather the metrics we want from devices.

This leads to the idea of provisioning a cluster consisting of nodes (status-go, real devices or both), including boot nodes. The cluster may run on its own test network or on an existing test network (Ropsten). Each node in the cluster shall be instrumented and configured for metrics collection. Infrastructure for metrics gathering, storage and display should be set up.

Using graph visualization tools (like Grafana) it'd be possible to see statistically sound performance measurements, tie performance changes to specific releases/versions and easily identify regressions.


Think of this cluster as a Status network playground, where you can deploy, say, 30% of nodes with a new change and easily see the difference in performance metrics against the stable version. It also enables further possibilities for data gathering and exploration. Example: by collecting stats about each incoming and outgoing Whisper message, we can visualize Whisper protocol behavior, which may help build intuition around it and help debug/develop future versions of the protocol.

Swarm Participants

Requirements

Goals & Implementation Plan

Implementation of this idea has three roughly independent parts that need to be researched, designed and implemented:

Cluster infrastructure

This part should start by evaluating the viable size of the cluster we want to have: 50 nodes, 100, 1000, dynamic? Then, which nodes the cluster should consist of: only status-go nodes, real devices/simulators, or both.

Then find the best software solution for that. This part requires an understanding of the Ethereum discovery process. Solutions like Docker Swarm might be enough, but we may want to simulate real network topology, for which we'd need specialized simulators like Mininet. Each node should probably be isolated using containers, though alternatives can of course be evaluated. It's unlikely that the cluster can run on a modern laptop (it would be awesome though), so a cloud provider should be chosen, whichever is easiest to work with (AWS/GCP/DO, I guess).

Once the vision of what the cluster should look like is clear, provisioning scripts and tools should be designed and implemented to be developer-friendly, with a high level of automation (again, Terraform is probably the right way to go). Ideally, we should be able to deploy as many identical clusters as we wish without any hassle.

If the cluster runs on a private network, it should set up its own bootnodes as well.

Metrics

As the main purpose of having a test cluster is to gather data and observe behavior at scale, the code needs to be instrumented to provide those metrics to the metrics collection infrastructure. Here we have two connected parts: code instrumentation and setting up the metrics collection infrastructure.

Metrics instrumentation

Developers might want to add custom metrics apart from the obvious things to measure (CPU, memory, I/O stats, etc.). Go code would probably want to report the number of goroutines, garbage collection stats, etc., plus many custom things like the number of Jail cells, incoming and outgoing RPC requests, etc.

The task here is to make code instrumentation as friendly to the developer as possible: it should be easy to add and test new metrics with a minimal learning curve. One example of such an easy approach is the expvar Go stdlib package, which might work perfectly for the pull model of metrics. Which model to use (pull/push) is a subject for investigation.
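For illustration, a minimal sketch of what expvar-based instrumentation could look like; the metric names and port below are made up for the example, not existing status-go metrics:

```go
// Sketch of pull-model metrics using the expvar stdlib package.
package main

import (
	"expvar"
	"log"
	"net/http"
)

// Hypothetical counters a developer might register.
var (
	jailCells     = expvar.NewInt("jail_cells")
	rpcRequestsIn = expvar.NewInt("rpc_requests_in")
)

func handleRPCRequest() {
	rpcRequestsIn.Add(1) // a metric update is a single line
	// ... actual request handling ...
}

func main() {
	// Importing expvar registers a /debug/vars handler on the
	// default mux; a collector can poll it for a JSON snapshot
	// of all registered variables.
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```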

Finally, the instrumented code should not go into production. This can be implemented via build tags, or simply by substituting a dummy NoOp metrics sender, which doesn't change the resulting binary code.
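A sketch of the build-tag approach; the file layout and function names here are hypothetical, not an existing status-go convention. The default build compiles an empty body the compiler can inline away:

```go
// metrics_noop.go: compiled by default, instrumentation costs nothing.
//go:build !metrics

package metrics

func IncCounter(name string) {}
```

```go
// metrics_real.go: compiled only with `go build -tags metrics`.
//go:build metrics

package metrics

import (
	"expvar"
	"sync"
)

var (
	mu       sync.Mutex
	counters = map[string]*expvar.Int{}
)

// IncCounter lazily registers an expvar counter and increments it.
func IncCounter(name string) {
	mu.Lock()
	defer mu.Unlock()
	c, ok := counters[name]
	if !ok {
		c = expvar.NewInt(name)
		counters[name] = c
	}
	c.Add(1)
}
```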

Metrics infrastructure

This infrastructure should be a part of the cluster deployment, so if there are many clusters, each has its own metrics dashboard and tooling. Essentially it involves metrics collection code, storage (for some period of time) and visualization software. There is currently a lot of software to choose from, including Prometheus and Grafana, so the best tools should be chosen here.

Then deployment scripts and code should be implemented. Ideally, it should require (almost) zero configuration on the nodes.
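If Prometheus were chosen, the node side could indeed be close to zero configuration: each node only exposes a /metrics endpoint and the cluster-level Prometheus server scrapes it. A sketch using the official Go client (the counter name is illustrative, not an existing metric):

```go
// Sketch of exposing Prometheus metrics from a node.
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var whisperEnvelopes = prometheus.NewCounter(prometheus.CounterOpts{
	Name: "whisper_envelopes_total",
	Help: "Whisper envelopes seen by this node.",
})

func main() {
	prometheus.MustRegister(whisperEnvelopes)
	// The cluster-level Prometheus server scrapes this endpoint;
	// the node needs no knowledge of the collector at all.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":9100", nil))
}
```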

Usage simulation

This part consists of developing ways to automate user interaction with a Status node and researching real-world user behavior. The first is more or less simple: provide an API to talk to the node and make it do things (send messages, create chats, use dApps, send money, etc.). The second is trickier, because it effectively means simulating a whole economy and human behavior: the simulation code has to decide who sends messages to whom, how often, how much money to send, how to use dApps, etc.

Obviously, a perfect real-world simulation is unlikely to be achieved; we just need the simulation to have two properties:

Each simulation agent could be independent or controlled by a single node in the cluster; which approach is better is subject to investigation. A rough sketch of an agent's driver loop follows.
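In this sketch, NodeClient is a hypothetical placeholder for whatever API the Status node ends up exposing; the behavior model (random peers, exponentially distributed think time) is a crude stand-in for a real one:

```go
// Hypothetical simulation agent driving a Status node.
package simulation

import (
	"log"
	"math/rand"
	"time"
)

// NodeClient is a placeholder for the eventual node API.
type NodeClient interface {
	SendMessage(to, body string) error
}

// RunAgent messages random peers at exponentially distributed
// intervals, crudely mimicking bursty human activity.
func RunAgent(c NodeClient, peers []string) {
	for {
		to := peers[rand.Intn(len(peers))]
		if err := c.SendMessage(to, "hi from the simulator"); err != nil {
			log.Printf("send to %s failed: %v", to, err)
		}
		think := time.Duration(rand.ExpFloat64() * float64(10*time.Second))
		time.Sleep(think)
	}
}
```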

Minimum Viable Product

MVP should consist of:

Iteration N.1

Iteration N.2

Supporting Role Communication

Copyright

Copyright and related rights waived via CC0.

themue commented 6 years ago

Good approach for near-real-world testing. I would start with outlining the requirements for metrics instrumentation and infrastructure. This way we can focus on the right set for reaching the MVP. Scaling later shouldn't be a problem.

themue commented 6 years ago

Puppeth looks like a nice toolset for quickly setting up testing clusters. Currently digging deeper into it, but there's a lot of inspiration for the tasks here.

divan commented 6 years ago

One of the questions to explore is how many nodes we want in a cluster, so that it satisfies the following properties:

At least an order of magnitude: 10, 100, 1000, more?

I thought that if a number below 50 is sufficient, it would be possible to buy that many old Android phones (I saw a Galaxy S4 for less than $60 on Amazon) and set up a cluster of real devices running Status, collecting metrics directly from them.

themue commented 6 years ago

I've seen these kinds of racks used for app ratings by bots. :grin: Neat idea, as it is more realistic than running on AWS. Sadly, I have no clue yet how complex this would be to control or how to collect metrics there; I've worked with clouds for several years instead.

antdanchenko commented 6 years ago

User behavior simulation:

Infrastructure for users behavior simulation:

Minimum Viable Product:

Requirements:

adambabik commented 6 years ago

I am ready to pledge 40h/week for this idea.

adambabik commented 6 years ago

Work is tracked in this project: https://github.com/orgs/status-im/projects/6

oskarth commented 6 years ago

Seems like work has already started on this, great! It looks like it is still in draft mode. It would be good if we could keep this issue up to date, in line with https://wiki.status.im/Status_Organisational_Design

Some questions:

  1. Who is the tester and evaluator?
  2. Who is the other contributor?
  3. Any other roles needed for the swarm?
  4. When is the MVP due (says Christmas but no update here, and still draft, so assuming this didn't happen)?

@adambabik @divan

naghdy commented 6 years ago

Is this swarm still active? Does it have a specific goal to ship something? Feel free to re-open if I am mistakenly closing it.