DevOps as a Service, or DOaaS, can be regarded as a subset of IaaS offerings. Like all members of the "as a Service" (aaS) family, DOaaS is based on the concept that the product, service plus operations in this case, can be provided on demand to the user regardless of the geographic or organizational separation of provider and consumer.
Identify CPU-, I/O-, or memory-intensive processes, as well as long-running ones.
Observe which processes are responsible for too many cache flushes.
Keep track of idle time spent by the CPU.
Track open TCP connections and open file descriptors.
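As a rough illustration of two of the checks listed above (CPU idle time and open file descriptors), here is a minimal sketch. It assumes a Linux host that exposes `/proc/stat` and `/proc/<pid>/fd`; the function names are hypothetical and not part of any particular DOaaS product.

```python
import os


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into integer counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    return [int(x) for x in fields[1:]]


def idle_fraction(before, after):
    """Fraction of CPU time spent idle between two samples of the 'cpu' line.

    Only the first eight counters (user, nice, system, idle, iowait, irq,
    softirq, steal) are summed, because the guest counters that may follow
    are already included in user and nice.
    """
    b = parse_cpu_line(before)[:8]
    a = parse_cpu_line(after)[:8]
    deltas = [y - x for x, y in zip(b, a)]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return deltas[3] / total  # index 3 is the 'idle' counter


def read_cpu_line(path="/proc/stat"):
    """Read the aggregate 'cpu' line (assumption: Linux procfs is mounted)."""
    with open(path) as f:
        return f.readline()


def open_fd_count(pid="self"):
    """Count open file descriptors of a process (assumption: Linux procfs)."""
    return len(os.listdir(f"/proc/{pid}/fd"))
```

In practice one would call `read_cpu_line()` twice, a few seconds apart, and pass both samples to `idle_fraction`. Parsing is kept separate from reading so the arithmetic can be checked against recorded samples.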
I like the rationale behind this strategy. I would like to add a few more features as a wishlist:
Provision a facility to visualize the real-time load on the nodes in the cluster. The StatsD-Graphite duo is a popular way to achieve this.
Facilitate triggers/hooks (or lambda functions) that execute with ease when some event occurs. For example, add a replica node (with more cores) to the cluster when CPU usage exceeds a high watermark, and notify the load balancer instantly. Most of the triggers should perform tasks usually needed during failover, and they should be easy to set up with a few clicks (a rationale similar to the popular IFTTT service).
Provision solutions for cascading failure, i.e. identifying the suspicious node that could bring down the whole cluster and safely removing it from the cluster.
How do you tackle the problem of service/node discovery in the cluster when a node goes down or a new one is added?
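To make the trigger and discovery items in this wishlist a bit more concrete, here is a minimal in-memory sketch. It is not how any real DOaaS provider or load balancer works. The `Cluster` class, the event names (`node_added`, `node_removed`, `cpu_high`), and the heartbeat timeout are all hypothetical, and a real setup would more likely rely on an existing service registry.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Cluster:
    """Toy cluster model: heartbeat-based membership plus event hooks."""
    heartbeat_timeout: float = 5.0
    last_seen: Dict[str, float] = field(default_factory=dict)
    hooks: Dict[str, List[Callable[[str], None]]] = field(default_factory=dict)

    def on(self, event: str, hook: Callable[[str], None]) -> None:
        """Register a hook to run whenever `event` fires."""
        self.hooks.setdefault(event, []).append(hook)

    def fire(self, event: str, node: str) -> None:
        """Run every hook registered for `event`, in registration order."""
        for hook in self.hooks.get(event, []):
            hook(node)

    def heartbeat(self, node: str, now: Optional[float] = None) -> None:
        """Record a heartbeat; the first one from a node counts as a join."""
        now = time.monotonic() if now is None else now
        is_new = node not in self.last_seen
        self.last_seen[node] = now
        if is_new:
            self.fire("node_added", node)

    def report_cpu(self, node: str, usage: float, watermark: float = 0.9) -> None:
        """Fire 'cpu_high' when usage (0.0 to 1.0) exceeds the watermark."""
        if usage > watermark:
            self.fire("cpu_high", node)

    def evict_stale(self, now: Optional[float] = None) -> List[str]:
        """Remove nodes whose last heartbeat is older than the timeout."""
        now = time.monotonic() if now is None else now
        stale = [n for n, t in self.last_seen.items()
                 if now - t > self.heartbeat_timeout]
        for node in stale:
            del self.last_seen[node]
            self.fire("node_removed", node)
        return stale


def demo() -> List[str]:
    """Wire up three hooks and replay a short, deterministic timeline."""
    log: List[str] = []
    cluster = Cluster(heartbeat_timeout=5.0)
    cluster.on("node_added", lambda n: log.append(f"lb: add {n}"))
    cluster.on("node_removed", lambda n: log.append(f"lb: drop {n}"))
    cluster.on("cpu_high", lambda n: log.append(f"scale out because of {n}"))
    cluster.heartbeat("a", now=0.0)
    cluster.heartbeat("b", now=0.0)
    cluster.report_cpu("a", usage=0.95)
    cluster.heartbeat("a", now=4.0)
    cluster.evict_stale(now=6.0)  # "b" has been silent for 6 s, "a" for 2 s
    return log
```

The hooks here only append to a list. In a real deployment they would call whatever API the load balancer or provisioning layer exposes, which this sketch deliberately leaves out.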