Week3 - Automated Canary Analysis at Netflix with Kayenta

Words

in partnership with Google
Kayenta leverages lessons learned over the years of delivering rapid and reliable changes into production at Netflix
as it reduces the risk from making changes
by providing engineers with a high degree of trust in their deployments
is a technique to reduce the risk from deploying
is deployed to a small subset of users
alongside the stable running version
such that a portion of incoming requests are diverted to the canary
uncover any problems
is assessed by comparing key metrics
that describe the behavior of
in an effort to minimize the impact of unexpected behavior
we augment the canary release process
This cluster runs the proposed changes of code
How this delineation of traffic routing occurs depends on the type of traffic
could produce unreliable results
are free of any effects caused by
prompt manual intervention to proceed
is determined to be safe
was initially a manual process
how closely the metrics matched
Needless to say
several hours spent staring at graphs and combing through logs
This made it difficult to deploy
Our first attempt at automating
We next attempted to generalize
is based on lessons we have learned over the years of
assessing the risk of a canary release
This is comprised of two primary stages
These metrics are typically stored in
which identify if the data was collected from the canary or the baseline.
These metrics are combined with a scope
some metrics may come from one source while other metrics can come from another
a decision as to whether the canary passed or failed
Towards this end
there are four main steps as part of
The goal of data validation is to ensure that
and the analysis moves onto the next metric
This entails handling missing values
for each metric indicating if there is a significant difference
is classified as either “Pass”, “High”, or “Low”
A classification of “High” indicates that
The primary metric comparison algorithm in Kayenta
confidence intervals
After each metric has been classified a final score is computed
how similar the canary is to the baseline
as the ratio of metrics classified as “Pass” out of the total number of metrics
we bias towards techniques which are simple to understand
which integrates the canary score into the Spinnaker
Users can drill down into the details
view them in various ways
The report gives a breakdown of the results
users can get a view
Having detailed insight into why a canary release failed is crucial
judges to be run on previously collected data
Kayenta is designed to allow
to be plugged in as needed.
Kayenta is able to run
is easier for application owners to configure
We have removed much of the complexity of
our legacy system had many special flags which were combined in various ways
would later be unclear as to the intent of using them
is more focused on semantic meaning of a metric
will extend this further to set appropriate defaults for metrics
which amounts to an average of 200 judgments per day
Over the next few months
we plan on migrating
