Monitoring, Load-visualization, and Triggers

I like the rationale behind this strategy. I would like to add a few more features as a wishlist:

Provide a facility to visualize the real-time load on the nodes in the cluster.

The StatsD-Graphite duo is a popular way to achieve this.
Collect stats such as OS stats, hardware stats, memory usage, and CPU utilization.
Track which processes are making a lot of context switches, system calls, and traps.
Track packets sent/received, busy ports, traffic rate, and bytes in/out.
Collect garbage collection stats.
Identify CPU-, I/O-, or memory-intensive processes, as well as long-running ones.
Observe which processes are responsible for too many cache flushes.
Keep track of how much time the CPU spends idle.
Track open TCP connections and open file descriptors.
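
To make the StatsD idea a bit more concrete, here is a minimal sketch in Python of how a node could report such stats. It assumes a StatsD daemon listening on UDP at 127.0.0.1:8125 (the usual default) and uses the plain-text `name:value|g` gauge format. The function names and the `node1` metric prefix are made up for illustration, and only load averages are collected here; the other stats in the list would need their own collectors.

```python
import os
import socket


def format_gauge(name, value):
    """Render one gauge in the StatsD plain-text line format."""
    return f"{name}:{value}|g"


def build_payload(metrics):
    """Join several gauges into one newline-separated StatsD packet."""
    return "\n".join(format_gauge(name, metrics[name]) for name in sorted(metrics))


def collect_load(prefix):
    """Collect the 1/5/15-minute load averages (Unix only)."""
    one, five, fifteen = os.getloadavg()
    return {
        f"{prefix}.load.1m": one,
        f"{prefix}.load.5m": five,
        f"{prefix}.load.15m": fifteen,
    }


def send_gauges(metrics, host="127.0.0.1", port=8125):
    """Send the gauges to StatsD over UDP (fire-and-forget).

    The host and port are assumptions; adjust them to the actual daemon.
    """
    payload = build_payload(metrics)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload.encode("ascii"), (host, port))
    return payload
```

Calling `send_gauges(collect_load("node1"))` from a periodic job on each node would be enough for Graphite to plot per-node load over time. UDP keeps the reporting cheap, at the cost of silently dropping metrics when the daemon is down.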

Facilitate triggers/hooks (or lambda functions) that execute easily when some event occurs. For example, add a replica node (with more cores) to the cluster when CPU usage exceeds a high watermark, and notify the load balancer about it instantly. Most of the triggers should perform tasks usually needed during failover, and they should be easy to set up with a few clicks (similar in spirit to the popular IFTTT service).
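
As a rough illustration of what I mean by triggers, here is a small Python sketch of an event/hook registry. Everything in it is hypothetical: the `TriggerRegistry` class, the `cpu_high` event name, the 0.85 watermark, and the handlers, which only return a description of the action instead of actually provisioning a node or calling a load balancer.

```python
class TriggerRegistry:
    """Maps event names to the hooks that should run when they fire."""

    def __init__(self):
        self._hooks = {}

    def on(self, event):
        """Decorator that registers a function as a hook for `event`."""
        def register(func):
            self._hooks.setdefault(event, []).append(func)
            return func
        return register

    def fire(self, event, **payload):
        """Run every hook registered for `event`, in registration order."""
        return [hook(**payload) for hook in self._hooks.get(event, [])]


registry = TriggerRegistry()


@registry.on("cpu_high")
def add_replica(node, usage):
    # Placeholder: a real hook would provision a replica node here.
    return f"add replica for {node}"


@registry.on("cpu_high")
def notify_load_balancer(node, usage):
    # Placeholder: a real hook would call the load balancer's API here.
    return f"notify load balancer about {node}"


def check_cpu(node, usage, high_watermark=0.85):
    """Fire the `cpu_high` event when usage exceeds the watermark."""
    if usage > high_watermark:
        return registry.fire("cpu_high", node=node, usage=usage)
    return []
```

With this shape, `check_cpu("node-1", 0.95)` runs both hooks, while a reading below the watermark does nothing. The "few clicks" setup would then amount to a UI that pairs an event with one of the pre-built hooks.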

Provide solutions for cascading failure, i.e. identifying a suspicious node that could bring down the whole cluster and safely removing it from the cluster.
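
One possible way to approach this, sketched in Python under my own assumptions, is to quarantine a node after a number of consecutive failed health checks. The `Cluster` class, the `max_failures` threshold, and the rule of never removing the last active node are illustrative choices, not something taken from this project.

```python
class Cluster:
    """Tracks active nodes and quarantines ones that keep failing checks."""

    def __init__(self, nodes, max_failures=3):
        self.active = set(nodes)
        self.quarantined = set()
        self.max_failures = max_failures
        self._failures = {node: 0 for node in nodes}

    def report(self, node, healthy):
        """Record one health-check result.

        Returns True if this report caused the node to be quarantined.
        """
        if node not in self.active:
            return False
        if healthy:
            # A healthy check resets the consecutive-failure counter.
            self._failures[node] = 0
            return False
        self._failures[node] += 1
        # Never remove the last active node, so the safeguard itself
        # cannot take the whole cluster down.
        if self._failures[node] >= self.max_failures and len(self.active) > 1:
            self.active.remove(node)
            self.quarantined.add(node)
            return True
        return False
```

A quarantined node stays out of `active` until an operator (or another trigger) inspects it, which keeps a misbehaving node from dragging its neighbours down while still leaving a trail of what was removed.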

How do you tackle service/node discovery in the cluster when a node goes down or a new one is added?

vparihar01 / DevOpsAsAService
