lukego opened this issue 8 years ago
The backend is up and running now! Hydra runs test campaigns automatically across a cluster of machines, archives the logs (including timeline files), and publishes an Rmarkdown report to highlight anomalies.
This hopefully provides the data feed we need for developing and using the Snabb Studio front-end, i.e. to identify the interesting results, download their timeline files, and use the GUI to understand what is going on.
Here is an overview of how everything works and how you can access it. I will provide both links and, mostly for convenience, screenshots. I will link to recent runs and code on my own development branches.
bench.nix is a Nix script that runs the benchmarks and converts their output into CSV. It is built from shell fragments that call our existing benchmarks.
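The conversion step can be sketched roughly like this (a minimal Python sketch of the idea; the actual bench.nix uses Nix and shell, and the raw log format shown here is hypothetical):

```python
import csv
import io
import re

# Hypothetical raw benchmark output: one "<benchmark> <score>" line per run.
raw_output = """\
basic1 9.70
basic1 9.68
iperf1500 4.91
"""

def to_csv(raw: str) -> str:
    """Convert raw benchmark lines into CSV with benchmark/run/score columns."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["benchmark", "run", "score"])
    runs = {}  # per-benchmark run counter
    for line in raw.splitlines():
        match = re.match(r"(\S+)\s+([\d.]+)", line)
        if match:
            name, score = match.group(1), float(match.group(2))
            runs[name] = runs.get(name, 0) + 1
            writer.writerow([name, runs[name], score])
    return out.getvalue()

print(to_csv(raw_output))
```

A table in this shape is easy to read from R with `read.csv` and feed straight into ggplot2.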
[report.Rmd]() is the Rmarkdown document that produces the report. This is an early draft that can be extended, tweaked, and pruned over time. For now it seems reasonable to keep adding graphs and later remove the ones that do not prove useful in practice. I am trying to start the report with the most intuitive graphs and then move on to more exotic ones. (We also need an Rmarkdown document that presents a comparison between two or more Snabb branches.)
I have mostly learned R from two O'Reilly books: R Cookbook and R Graphics Cookbook. This is where I picked up using the "ggplot2" library for most graphs.
I have defined a Hydra Jobset to run a benchmark campaign using software versions from specific git repos/branches:
This could be enabled to run automatically when any of the branches is updated. For the moment that is disabled and I run it manually.
The Jobset Evaluations lists the results of each execution of this jobset.
Each execution only rebuilds what is needed: for example, if only the Rmarkdown document has changed then the previous benchmark results are reused, but if the Snabb code has changed then all the benchmarks are rerun. This is the normal behavior for Nix: it takes the SHA-256 hash of each job's inputs and checks whether it has already built that in the past.
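The caching behavior can be illustrated in miniature (this is just the idea, not how Nix is actually implemented: key each build step by a hash of its inputs and reuse any result already built for that key):

```python
import hashlib

cache = {}   # maps input-hash -> built result
builds = []  # record of what actually got (re)built

def build(name: str, inputs: bytes) -> str:
    """Build `name` from `inputs`, reusing the cache when the hash matches."""
    key = hashlib.sha256(inputs).hexdigest()
    if key not in cache:
        builds.append(name)           # expensive work happens only here
        cache[key] = f"result-of-{name}"
    return cache[key]

# Benchmarks depend on the Snabb source; the report depends on report.Rmd.
build("benchmarks", b"snabb-source-v1")
build("report", b"report.Rmd-v1")

# Editing only report.Rmd: the benchmarks are reused, the report is rebuilt.
build("benchmarks", b"snabb-source-v1")
build("report", b"report.Rmd-v2")

print(builds)
```

Running this, `builds` records one benchmark build and two report builds: the unchanged benchmark inputs hash to the same key, so the cached result is reused.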
Clicking on one of these Evaluations takes me to a page that lists all of the Jobs that Hydra has built for this Jobset.
This example has 122 jobs: the CSV file, the R report, and each of the 120 individual benchmarks (20 runs each of 6 different benchmarks).
Clicking through to the benchmark-report job page leads to benchmark.html, which is the actual report. This includes a series of graphs, the first of which is a line graph.
Here we can see that three results are interestingly slower than the others: `basic1` test number 6 and `iperf1500` tests number 8 and 15. I can use these numbers to identify which Hydra jobs contain the relevant logs; for example, benchmark `basic1` run 6 is job `snabb-basic1-100e6-num-6`.
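Picking out the "interestingly slower" runs can also be done mechanically. Here is a small sketch with hypothetical scores, using a simple mean-minus-two-sigma rule (report.Rmd may well do something different), and assuming the job-name pattern above generalizes to other benchmarks:

```python
from statistics import mean, stdev

# Hypothetical per-run scores for one benchmark (run number -> Mpps).
scores = {1: 9.7, 2: 9.6, 3: 9.7, 4: 9.8, 5: 9.7, 6: 7.1,
          7: 9.6, 8: 9.7, 9: 9.8, 10: 9.7}

def slow_runs(results: dict) -> list:
    """Flag runs more than two standard deviations below the mean."""
    values = list(results.values())
    cutoff = mean(values) - 2 * stdev(values)
    return [run for run, score in results.items() if score < cutoff]

def job_name(benchmark: str, run: int) -> str:
    """Map a run number to its Hydra job name (pattern seen in this jobset)."""
    return f"snabb-{benchmark}-100e6-num-{run}"

for run in slow_runs(scores):
    print(job_name("basic1", run))  # the job holding that run's logs
```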
Here I can click the links to download either `log.txt`, the text output from running the test, or `state.tar.xz`, a tarball that includes the timeline file for the Snabb process(es) that were executed.
These timeline files can then be loaded into the GUI to see what is going on. Getting this working smoothly is the next step.
Anticipating the questions we will want to answer with the GUI...
... but perhaps the visualizations will speak for themselves and we will not have to come into the analysis with such specific questions :).
HUGE THANKS @domenkozar for setting up this backend for me to tweak!
This issue is to reintroduce Timeline (#849) and keep track of some related work.
What is the timeline?
The timeline is a log file that makes system performance more transparent.
Each log entry has a timestamp with cycle (~nanosecond) precision, a short message, a long explanation, some optional counter values, and other relevant details.
The log level is dynamically adjusted for each iteration of the event loop (breath): most breaths are not logged, many are logged at a high level, and some are logged with fine detail. The log therefore provides samples of operation in units of whole breaths.
The timeline will typically contain one million events recorded over the past few minutes.
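The mechanism can be sketched in miniature (a hedged Python sketch, not the real Snabb implementation, which records cycle-accurate timestamps in a binary file):

```python
import random
import time
from collections import deque

class Timeline:
    """Miniature timeline: a fixed-size ring buffer of sampled log entries."""
    def __init__(self, size=1_000_000):
        self.entries = deque(maxlen=size)  # old entries fall off the back
        self.level = 0                     # log level for the current breath

    def start_breath(self):
        # Re-roll the log level each breath: most breaths log nothing,
        # some log at a high level, and a few log in fine detail.
        r = random.random()
        self.level = 2 if r < 0.01 else (1 if r < 0.1 else 0)

    def log(self, level, message, **details):
        if level <= self.level:
            # The real timeline records a CPU cycle counter; perf_counter_ns
            # is the closest portable stand-in here.
            self.entries.append((time.perf_counter_ns(), message, details))

tl = Timeline(size=1000)
for _ in range(10_000):
    tl.start_breath()
    tl.log(1, "breath start", packets=128)
    tl.log(2, "app callback", app="basic1.Source")
    tl.log(1, "breath end", packets=128)

print(len(tl.entries))  # never exceeds the ring-buffer size
```

The ring buffer gives the "recent few minutes" window, and the per-breath sampling keeps the overhead low while still recording whole breaths at a consistent level of detail.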
Who is it for?
The timeline is intended as a common reference for many different people:
Having a common tool will hopefully make it easier for these people to communicate with each other.
I think of the timeline like the flight recorder ("black box") in an airplane. It is a buffer containing data about recent operation and it is robust enough to survive a crash. If a system crashes or behaves strangely during a test run then the first step for every interested party would be to review the timeline and compare their observations.
Example
Here is an excerpt from the timeline file showing a breath of the `snabbmark basic1` synthetic benchmark:

The first message shows the breath starting, along with the total number of packets/bytes/bits processed so far. The last message shows the breath ending, along with the number of packets processed and their average size. The messages in between show more detail: in this case, each callback made to an app in the engine configuration. The cycle count is shown as a delta from the previous log entry, which gives a rough estimate of how long each event took to complete. The currently scheduled CPU core and NUMA node are also included as additional details.
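Reading such an excerpt mostly amounts to differencing the cycle counts. A small sketch of that arithmetic (the entry format here is hypothetical, not the real timeline file layout):

```python
# Hypothetical decoded entries: (cycle_counter, message).
entries = [
    (1_000_000, "breath start"),
    (1_000_450, "app callback: Source.pull"),
    (1_001_200, "app callback: Sink.push"),
    (1_001_300, "breath end"),
]

def cycle_deltas(entries):
    """Pair each event with the cycles elapsed since the previous entry."""
    deltas = []
    prev = None
    for cycles, message in entries:
        deltas.append((0 if prev is None else cycles - prev, message))
        prev = cycles
    return deltas

for delta, message in cycle_deltas(entries):
    print(f"{delta:>6} cycles  {message}")
```

Each delta is the rough cost of the event that just completed, which is how the per-app costs within a breath become visible.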
See also a longer example, in which events are shown from newest to oldest.
Status
The timeline is currently living on a couple of branches:
Next action: @lukego needs to submit a version for upstreaming that is unlikely to cause problems for people. This requires ensuring that the overhead is negligible, and perhaps changing the default settings, e.g. a smaller log file or even logging disabled by default.
Grand vision and parallel work
The timeline is one part of a grand vision for making it easy and productive to optimize and troubleshoot Snabb applications.
More related work:
More indirectly, I hope that the timeline will provide a framework for writing fast applications. How do you optimize your application? Break it into pieces that each run for around a thousand CPU cycles and can easily be measured with the timeline. This would have the double benefit of making system performance transparent and of dividing the work into the small loops that CPUs and tracing JITs love. We shall see!