iovisor / bcc

BCC - Tools for BPF-based Linux IO analysis, networking, monitoring, and more
Apache License 2.0
20.38k stars 3.86k forks source link

[RFC] Convert trace data to CTF #1049

Open tuxology opened 7 years ago

tuxology commented 7 years ago

A possible extension to BCC can be to convert trace data coming from maps and write it as CTF data. The CTF format is a compact binary trace format which is used by tools such as LTTng and can allow traces to be viewed in a very nice fashion in Trace Compass. This would allow efficient trace storage for post-mortem analysis. Also, it would allow traces to be viewed graphically. Trace Compass has very interesting features such as flame-charts, and critical flow view, resource view (and many more).

I have done a quick test which can write CTF traces using babeltrace library (2.0.0-pre) Here are the results :

[suchakra@moria tracing]$ sudo ./open2ctf.py 
trace path: /tmp/tmp26ga140x
^C[suchakra@moria tracing]$ babeltrace /tmp/tmp26ga140x
[00:24:50.796162597] (+?.?????????) 0 do_sys_open: { }, { pid = 909, comm = "iio-sensor-prox", filename = "/dev/iio:device2" }
[00:24:50.796399646] (+0.000237049) 0 do_sys_open: { }, { pid = 909, comm = "iio-sensor-prox", filename = "/dev/iio:device4" }
[00:24:50.801229686] (+0.004830040) 0 do_sys_open: { }, { pid = 1206, comm = "gnome-shell", filename = "/proc/self/stat" }
[00:24:50.801430938] (+0.000201252) 0 do_sys_open: { }, { pid = 1734, comm = "gnome-shell", filename = "/proc/self/stat" }
[00:24:50.807656245] (+0.006225307) 0 do_sys_open: { }, { pid = 18399, comm = "gnome-settings-", filename = "/proc/self/fd" }
..
..

Here, we collect trace data from do_sys_open and save it in CTF, then open the trace using babeltrace (which is a trace reader as well). This trace also works with Trace Compass. As we increase number of relevant trace events collected (sched_* etc.), Trace Compass would eventually be able to show something like this. This in PoC stage for now - the clocks and timings have to be improved. The deltas are ok, but wall clock time is not correct. We'll do a helper function in BCC itself that can help write these traces. Expect a PR soon :) If you have suggestions, leave them here!

tuxology commented 7 years ago

So here is a small update, I did a small example of adding some helper functions in BCC that make CTF writing easy. Here it is : https://github.com/tuxology/bcc/blob/babeltrace-ctf/examples/tracing/clone2ctf.py

It requires babeltrace python bindings and Python 3. There are still some bugs which I am ironing out but it is generally works. The output should look something like the following:

[suchakra@moria /tmp]$ sudo chown -R suchakra:suchakra clonetrace
[suchakra@moria /tmp]$ babeltrace clonetrace                     
[09:42:03.562878549] (+?.?????????) 0 sys_clone: { }, { pid = 1489, comm = "gnome-settings-" }
[09:42:03.592275808] (+0.029397259) 0 sys_clone: { }, { pid = 1489, comm = "gnome-settings-" }
[09:42:03.605857938] (+0.013582130) 0 sys_clone: { }, { pid = 24313, comm = "pkexec" }
[09:42:03.606051997] (+0.000194059) 0 sys_clone: { }, { pid = 24313, comm = "pkexec" }
[09:42:03.617910624] (+0.011858627) 0 sys_clone: { }, { pid = 950, comm = "polkitd" }
[09:42:03.622204099] (+0.004293475) 0 sys_clone: { }, { pid = 24316, comm = "pkla-check-auth" }
[09:42:04.264687359] (+0.642483260) 0 sys_clone: { }, { pid = 1489, comm = "gnome-settings-" }
[09:42:04.294960533] (+0.030273174) 0 sys_clone: { }, { pid = 1489, comm = "gnome-settings-" }
[09:42:04.323135708] (+0.028175175) 0 sys_clone: { }, { pid = 24319, comm = "pkexec" }
[09:42:04.323213269] (+0.000077561) 0 sys_clone: { }, { pid = 24319, comm = "pkexec" }
[09:42:04.339736280] (+0.016523011) 0 sys_clone: { }, { pid = 950, comm = "polkitd" }
[09:42:04.345809831] (+0.006073551) 0 sys_clone: { }, { pid = 24322, comm = "pkla-check-auth" }

I know its too early and it looks ugly, but here is the TraceCompass view. The bcc_<num> identifies which CPU it is. We need to figure out how to show it in proper tab :smile:

TC

I added another example open2ctf.py, which gives similar output.

4ast commented 7 years ago

looks cool. how stable are babeltrace python bindings? in general we're trying hard to avoid external dependencies in bcc though.

tuxology commented 7 years ago

@4ast This is based on babeltrace 2.0.0-pre and the APIs may change a bit before final release. There is babeltrace 1.5 which is stable (https://github.com/efficios/babeltrace/tree/stable-1.5/bindings/python). I can use 1.5 if this sounds interesting. Babeltrace would be maintained for long IMO. I propose that we create a new ctf branch and not push this in master right away. Once it becomes stable, we can move it to master

In long term, for more stability and performance, we may build libbcc.so conditionally with libbabeltrace. Then we can directly use the APIs of babeltrace in C. For python or any other language, we can then create wrappers. This would create one of the first ways to store BPF trace data in a pretty standardized format used by other tracers and trace viewers.

Another goal is to create data driven trace analyses specially tuned towards specific BPF trace events (or a collection of events for example sched_*). This is still under discussion.