nodejs / node

Node.js JavaScript runtime ✨🐢🚀✨
https://nodejs.org
Other
106.65k stars 29.08k forks source link

tracing working group #671

Closed Qard closed 9 years ago

Qard commented 9 years ago

@bnoordhuis @sam-github @othiym23 @Wraithan @groundwater @brycebaril @trevnorris

We should revisit the tracing situation. Perhaps a working group is in order? Some of you have moved on from APM since our last discussion, some were not present, but your input is valuable.

thlorenz commented 9 years ago

I mentioned the problem about MachO symbols generated by v8. All that has gone into that is here

mikeal commented 9 years ago

create a tracing-wg repo.

meeting notes: https://github.com/iojs/tracing-wg/blob/master/wg-meetings/2015-02-05.md

we should propose the next one there.

natduca commented 9 years ago

I thought I'd provide some notes on the direction we're taking in chrome and indirectly v8 around tracing & stack sampling. Since we both intersect at v8 and do run into some of the same "guhh i need a tracing api but its gotta be low overhead," I hope there's some useful context!

All our performance tooling in chrome is moving to be trace-based. Just recently (chrome 42 maybe?) we moved devtools to use tracing for its timeline and flamechart-style js profiling. And, we've for a long time had chrome://tracing built around trace data. The rough philosophy we use for tracing is described in https://docs.google.com/a/chromium.org/document/d/1l0N1B4L4D94andL1BY39Rs_yXU8ktJBrKbt9EvOb-Oc/edit

When we chromies say tracing, we mean a stream of events with timestamps. Beyond that, we've tried to embrace the notion that there is no One True Format To Rule Them All, but rather that the common clock ties everything together. As long as you can get a stream of traces from a few places, and have them share clocks or have clock alignment records, then the rest is a matter of postprocessing. Our UI (github.com/google/trace-viewer) for instance for chrome://tracing just merges kernel traces and userland traces after the fact.

This is what allows us to have traces that contain OS/platform level data from dtrace/lttng/perf and also userland event streams. For instance, if you're on a chromebook, you can get a trace that has our userland trace data but also stuff from the kernel. Or, on android, similarly, using https://github.com/johnmccutchan/adb_trace. In both cases, we leave the OS-specific tracing to the OS, and then let the UI or tools then figure out how to pull our event stream and the OS' stream themselves.

So, for most "day to day" tracing discussions, we solve all our stuff with c++ instrumentation. (Note, we consider sampling (e.g. interrupting every 1ms to get a stack trace) to be a userland activity.)

Our userland tracing api is called "trace_event". This is a two layer thing, and actually resembles some of the ideas thrown around on the hangout I think: we have a header file with basic tracing macros, then this header can be configured to call into an implementation via a narrow set of tracing control apis. https://code.google.com/p/chromium/codesearch#chromium/src/base/trace_event/&q=trace_event&sq=package:chromium&type=cs ... though there are a few hundred trace apis that our code uses, it boils down to a half dozen c++ apis that one must implement in order to receive and process the events.

This api is low enough overhead that we have thousands of trace events per second compiled into our shipping chrome builds, in our hottest chrome codepaths. We're pretty happy with the overhead: when its off, the overhead is just a check of a global char* for truthiness, as in, a few instructions [depending on architecture etc]. When its on, each tracepoint takes about 20-50us to buffer on a lowend android phone, which has been enough to keep us happy. On real machines with power, I'm pretty sure the overhead is way way lower. Shrug.

In chrome, the actual trace api implementation is the trace_event_impl.cc, but in other libraries like Blink, Skia or V8, we copy the trace_event.h header file over and then apply a small tweak to it that causes it to trampoline back over to chrome via a local singleton. Then when we boot up one of those libraries in chrome, we set that singleton to point at chrome's tracing implementation. A key property of our architecture is that when a tracepoint is off, you don't have to hop from v8 to chrome in order to discover the trace isn't needed. I suspect this could let us trace things that are internal to v8 like promises without as much scary overhead discussions as we'd have if we were calling through an isolate-specific callback.

This is all a bit wordy, I realize. It may be easier to read in code how we're hooking v8 up to chrome tracing... We are literally in the middle of connecting v8 to this tracing system. There are codereviews linked from that doc for context.

https://docs.google.com/a/chromium.org/document/d/1_4LAnInOB8tM_DLjptWiszRwa4qwiSsDzMkO4tU-Qes/edit#

Importantly, this isn't just being done for us to get v8 issue trace events for when its doing internal work, say compiling or doing a majorgc. We're also going to send sampling data stream plus code creation/relocation events through the trace interface, so that when we start tracing, we can get both timestampped "begin/end" events AND sampling events.

One final technical note: I know one of the hot topics in all this is encoding the relationships between async information. In chrome's trace ecosystem, we've tried to solve this with flow events, which represent "arrows" between individual synchronous events on a thread, represented with regular begin/end events. More info on our trace data model --- which our trace_event.h file exists to generate --- is here: https://docs.google.com/document/d/1CvAClvFfyA5R-PhYUmn5OOQtYMH4h6I0nSsKchNAySU/edit

Anyway, thats a lotta words. Hope it makes sense! This was all designed with client use cases heavily in mind, so I totally buy that this is all whacko and crazy when viewed through a server-side lens. This having been said, feel free to shout though with questions, I'm always around to chat.

thlorenz commented 9 years ago

Just wanted to point out that some of us interested in Profiling/Debugging are hanging out in IRC #ngin8.

mikeal commented 9 years ago

@natduca we would love to have your input in the Tracing WG if you're interested :)

sam-github commented 9 years ago

@natduca, re:

in other libraries like Blink, Skia or V8, we copy the trace_event.h header file over and then apply a small tweak to it that causes it to trampoline back over to chrome via a local singleton.

I can't find any sign of trace_event.h in https://chromium.googlesource.com/v8/v8 ... am I looking in the wrong place, or are you describing speculative future work?

paulirish commented 9 years ago

https://code.google.com/p/chromium/codesearch#search/&q=file:trace_event.h&sq=package:chromium&type=cs

sam-github commented 9 years ago

Found it linked from https://codereview.chromium.org/827993003

natduca commented 9 years ago

Yeah, this patch is stalled because the author has been busy. But its still coming. :)

sam-github commented 9 years ago

@natduca can you comment on the differences and similarities between the Timeline view in Dev Tools, and the chrome://tracing view... do they both display trace_event data, or does only chrome:tracing show trace event data? Its a bit confusing, they seem very similar, perhaps one is destined to replace the other?

thlorenz commented 9 years ago

@sam-github Timeline in DevTools shows sampled data .cpuprofile, vs. chrome://tracing shows traced (structural) data. traceviewify actually emulates function entries/exits to convert .cpuprofiles to trace-viewer format.

natduca commented 9 years ago

@sam-github @thlorenz as in all big software, there's a lot of nuance because we're in between the old thing that is deprecated and the new thing that does't work quite right. :)

Timeline view in devtools is almost completely based on chrome://tracing data these days. The corner cases are cpuprofile but we're working on moving that over completely. Same for network panel. The future we're working toward is that all performance data you see in devtools came through the tracing data stream.

But, the UI are different: devtools ui is focused on ease of use for a web developer, whereas chrome://tracing is just the raw "good enough for chrome hackers" view. Devtools' timeline won't move to that chrome://tracing ui, though it may pick up ideas from that UI, or vice versa.

paulirish commented 9 years ago

Pretty close, thorsten.

differences and similarities between the Timeline view in Dev Tools, and the chrome://tracing view

There's at least there UIs here:

  1. chrome://tracing aka trace-viewer - screenshot
  2. devtools timeline (flame chart) - screenshot
  3. devtools sampling profiler & flame chart - screenshot

Once the above commit lands, all of these will be powered by the trace event data stream (v8 sampling and network being the remaining parts). And after that all 3 UIs will allow import/export of .trace files.