tracing working group

Qard commented 9 years ago

@bnoordhuis @sam-github @othiym23 @Wraithan @groundwater @brycebaril @trevnorris

We should revisit the tracing situation. Perhaps a working group is in order? Some of you have moved on from APM since our last discussion, some were not present, but your input is valuable.

mikeal commented 9 years ago

quick question, is tracing part of the solution to a larger problem or is it totally separate?

could tracing and dtrace support and systemtap support be classified as part of a larger effort the WG would be solving?

mikeal commented 9 years ago

also @thlorenz who is working on some related stuff.

Qard commented 9 years ago

I think they are all related. At the time of our previous discussion, AsyncListener was being removed from core. Now we have async_wrap, so I think we should reconvene to discuss what we can do with that to make tracing more pleasant. DTrace and SystemTap are just other destinations for trace data, so I think that could be part of the discussion.

The topic of the working group could potentially be a bit broader, reaching into other debugging issues, like what to do with domains.

trevnorris commented 9 years ago

@Qard IMO the working group should discuss what tracing functionality is more generally wanted, and then the minimalist hooks to allow that functionality to live in user-land should be implemented here. There is a never ending list of features people want, and adding/maintaining all those is the wrong decision. Extending the hooks that currently live in async_wrap sounds great. I'd love to get more feedback on what more is needed.
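
For readers following along today: the hooks discussed here later surfaced in Node.js as the `async_hooks` module, which did not exist when this comment was written. As a rough sketch of what "minimalist hooks in core, features in user-land" can look like, the snippet below counts async resources by type using only the `init` hook. `createResourceCounter` is a hypothetical helper name, not part of any API, and `createHook` is still marked experimental in the Node.js docs.

```javascript
const asyncHooks = require('async_hooks');

// Hypothetical user-land helper: counts async resources by type.
// Core only supplies the hook; the bookkeeping lives out here.
function createResourceCounter() {
  const counts = new Map();
  const hook = asyncHooks.createHook({
    // Called synchronously each time an async resource (timer, socket, ...) is created.
    init(asyncId, type) {
      counts.set(type, (counts.get(type) || 0) + 1);
    },
  });
  return {
    start() { hook.enable(); },   // begin observing
    stop() { hook.disable(); },   // stop observing; nothing is counted while disabled
    snapshot() { return new Map(counts); },
  };
}
```

Calling `start()`, creating a timer, then calling `stop()` should leave a `Timeout` entry in `snapshot()`. Anything fancier (long stack traces, timing, context propagation) would be layered on the same small set of callbacks.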

mikeal commented 9 years ago

I'm working on outreach right now to get wide input into the roadmap. @trevnorris can you think of a good question to ask individuals and companies that could inform what they need from tracing? Maybe something like "what do you wish you knew about a running Node application that you don't know now?"

Qard commented 9 years ago

In our previous meeting, we came to a rough consensus on what our problems were as APM providers. Knowing more about what the customers want to see would certainly be valuable too.

I agree that a lot of the applications of this would likely be userland stuff, but we can figure out what that is when we get to it.

trevnorris commented 9 years ago

@mikeal Would it be possible for devs to provide pseudo examples of what they want? Meaning, they provide some example code and can explain what it is they want traced. Seeing usage cases, imho, would be the most beneficial.

sam-github commented 9 years ago

@Qard do you have a link to the write-up for that last meeting? I'm having trouble finding it, and it's pretty relevant here.

Qard commented 9 years ago

@sam-github Yep, here it is.

https://gist.github.com/groundwater/942dad5c0c4cfae21af9

These are the trace meeting notes compiled by @groundwater after our previous meeting, for anyone else that wants to have a look.

brycebaril commented 9 years ago

@Qard yes! Definitely still interested

hayes commented 9 years ago

I am definitely interested as well

othiym23 commented 9 years ago

I am most definitely interested, both in following up on the conversation we started last year and in whatever broader working-group discussion that might result from Mikeal's canvassing.

thlorenz commented 9 years ago

I feel there are two goals here which are solved differently:

1) improve tracing of the node process from within which in most cases requires either some sort of hooks, monkey patching, addons or changes to core
2) simplify integration of system tool tracing which can be solved in user land in most cases

I created an issue to collect info about existing user land tools, some of which are addons that interface with v8 in order to pull out profiling info.

I'm mostly focusing on 2) ATM, which can be converters and parsers to consume the output of perf, dtrace, SystemTap, etc., and plugins for tools like debuggers.

At the moment we have the following (some rather new). I'm most likely missing lots, so please add any you know of:

- v8-lldb and lldb-jbt lldb plugins which add JS symbols info to stack traces
- resolve-jit-symbols to add JS symbols info to any trace via info emitted by v8 when using the --perf-basic-prof flag
- flamegraph perl scripts and the in browser flamegraph app to visualize aggregate info about traces
- to .cpuprofile converter from perf-script or dtrace results which allows analyzing these traces via Chrome DevTools the same way we can already for output of v8-profiler
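
To make the symbol-resolution idea concrete, here is a minimal sketch of mapping a raw address back to a JS function name. It assumes the map emitted under `--perf-basic-prof` follows perf's usual convention of one `<hex start> <hex size> <symbol name>` entry per line. `parsePerfMap`, `resolveAddress`, and the sample entries are hypothetical and are not the API of resolve-jit-symbols or any other tool listed above.

```javascript
// Parse perf-map text: each line is "<hex start> <hex size> <symbol name>".
function parsePerfMap(text) {
  return text
    .split('\n')
    .map((line) => line.trim())
    .filter(Boolean)
    .map((line) => {
      const [start, size, ...name] = line.split(' ');
      return {
        start: parseInt(start, 16),
        size: parseInt(size, 16),
        name: name.join(' '), // symbol names may themselves contain spaces
      };
    });
}

// Return the symbol whose [start, start + size) range contains the address.
function resolveAddress(symbols, address) {
  const hit = symbols.find((s) => address >= s.start && address < s.start + s.size);
  return hit ? hit.name : undefined;
}

// Made-up sample data in the assumed format.
const sampleMap = [
  '2a0c3f14c0 1a4 LazyCompile:~handler /app/server.js:10',
  '2a0c3f1680 60 Stub:CEntryStub',
].join('\n');
```

A linear `find` keeps the sketch short; a real tool would more likely sort the entries and binary-search them, since these maps can hold many thousands of symbols.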

For those interested, some of us are gathering in #ngin8 on IRC to discuss some of these efforts. @paulirish and I were talking there about sunbursts (think flamegraphs+) and how to get them integrated with system tools.

So a wide scope to cover here, not sure if it makes sense to have this all in one group or if we should split it into one group for each section as outlined in 1) and 2).

Qard commented 9 years ago

I see the working group mostly focusing on goal 1, but goal 2 ties into it in many ways.

Being able to smoothly correlate trace data across JS and C++ boundaries is one thing that comes to mind. Visibility beyond the nebulous "it went into libuv somewhere, it'll probably call back at some point" would be great.

Buffer usage in native modules also seems like something that'd be good to get some visibility into.

I'm sure there are plenty of areas you all can think of where integrating with the native side could provide some very valuable data.

rvagg commented 9 years ago

I'm very interested in helping this WG get off the ground. This topic is at the core of the "completeness" story for Node IMO. Whenever we interact with companies shifting from more mature platforms, the lack of insight into what their programs are doing is one of the key areas lacking from Node. It could be part of a larger "debugging" topic, but given that this particular group is so focused on tracing and have all been working on this area I think a tracing-focused WG would be a good start.

How about y'all make sure that you've collected all of the relevant people with an interest in this area and a history of actually tackling this problem and then we'll try and find a time for you to have a kick-off Hangout to discuss how best to proceed.

Qard commented 9 years ago

I would like to see the greater debugging issue getting tackled, but I think we should focus on one thing at a time. Given most of us have a background in APM, I think tracing is the obvious first topic.

Also, do we have any connections to V8 core people working on the profiling tools? It would be good to coordinate with them. Perhaps @Domenic can help with that?

bnoordhuis commented 9 years ago

I believe most of the debugger and profiler work was done by non-Googlers. If you restrict it to just people from the V8 team, Yang is probably the most active in that area.

Apropos the tracing WG, sign me on. StrongLoop would be most interested in (async) tracing and getting more metrics out of io.js and libuv (and V8, but that's a separate story.)

othiym23 commented 9 years ago

I think a WG that doesn't have a strong...StrongLoop presence would be incomplete, because @sam-github, @rmg, and @piscisaureus have all done significant work in this area, in addition to you, @bnoordhuis. I also think focusing on APM / production performance analysis is a sensible move. "Debugging" is a huge and broad area.

groundwater commented 9 years ago

The consensus from our meeting was roughly "we want to collect arbitrary data, at arbitrary places" which is probably too vague to anyone who did not attend. I would suggest we come up with a dozen well-defined user stories (sorry for the product manager speak) before we all jump into solution land.

At a high level, I think the stories should cover at least the following:

- one or more APM use cases
- helping a poor user who uses console.log as their debugging tool
- getting detailed low-level GC/mem/libuv info
- stitching together async transactions (i.e. continuation-local-storage)

Other than that, we probably want to discuss constraints the solution must meet:

- does not negatively impact performance when not in use
- impacts performance in a production-safe way when in use
- is the minimum necessary changes to implement solutions in npm-land
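
As an illustration of the "stitching together async transactions" story, here is a small sketch using `AsyncLocalStorage` from the `async_hooks` module in current Node.js releases. That class did not exist at the time of this thread; it is used here only as a stand-in for the continuation-local-storage idea. `runTransaction`, `currentTransactionId`, and `fakeDbQuery` are hypothetical names.

```javascript
const { AsyncLocalStorage } = require('async_hooks');

const transactions = new AsyncLocalStorage();

// Run `fn` inside a transaction; the id follows it across async boundaries.
function runTransaction(id, fn) {
  return transactions.run({ id }, fn);
}

// Id of the transaction the caller is executing in, or undefined outside one.
function currentTransactionId() {
  const store = transactions.getStore();
  return store ? store.id : undefined;
}

// Simulated async work: the timer callback fires on a later tick,
// yet still sees the id of the transaction that scheduled it.
function fakeDbQuery() {
  return new Promise((resolve) => {
    setTimeout(() => resolve(currentTransactionId()), 5);
  });
}
```

Two concurrent calls such as `runTransaction('a', fakeDbQuery)` and `runTransaction('b', fakeDbQuery)` should each resolve with their own id, which is the property an APM agent needs in order to attribute work to the right request.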

Qard commented 9 years ago

So we have clear interest in being involved in a WG expressed by myself, along with:

@brycebaril @hayes @othiym23 @bnoordhuis @thlorenz

If anyone else commenting here wants in, please say so.

Thanks @rvagg for the offer to help get this going. We should figure out how this fits into the Hangouts calendar and get a doodle started.

thlorenz commented 9 years ago

@Qard please add me to this list as I'm also interested in being involved.

othiym23 commented 9 years ago

This may be of interest to @AndreasMadsen as well, given the existence of @AndreasMadsen/trace and the work he's doing with async_wrap.

rmg commented 9 years ago

@othiym23 thanks for the mention.

+1 on this being part of the completeness story, @rvagg. I think it would be beneficial to pretty much everyone in node land if we could raise the bar for the level of VM and libuv inspection available without having to rely on native addons.

domenic commented 9 years ago

> Also, do we have any connections to V8 core people working on the profiling tools? It would be good to coordinate with them. Perhaps @Domenic can help with that?

If you guys have specific things you're interested in I can try to reach out. @paulirish might also be a good contact as he's working on dev tools specifically.

One thing I'd personally be interested in seeing out of this group is some idea of what hooks into the VM or event loop or runtime environment are necessary for this kind of work. Then maybe we can standardize those and put them in V8 and in browsers after io.js proves them out in the real world. Cf. https://github.com/node-forward/discussions/issues/28. So basically when defining the solution and "polyfilling" it in io.js, give some thought to how generalizable it would be.

dberesford commented 9 years ago

We in @nearForm have been dabbling quite a lot with LTTNG and loving it. We have a fork of io.js on the go which adds LTTNG tracepoints in a similar manner to dtrace and ETW: https://github.com/nearform/io.js/tree/tracing. Can we be included in this working group?

bnoordhuis commented 9 years ago

@dberesford I don't see why not. I suspect the focus will be more on dynamic tracepoints rather than static ones, though. If you want to, the LTTNG support can (with a bit of rework) land upstream.

dberesford commented 9 years ago

@bnoordhuis that would be great, we'll get a PR ready for review

mikeal commented 9 years ago

Preliminary feedback I've collected from companies in the past surfaced a strong need for "Linux debugging/tracing", so I'll volunteer to do any grunt work here to get the group off the ground (schedule the first meeting, arrange the agenda, write the charter), but in the first meeting I'll probably call out someone to take on the role of facilitator moving forward.

Here's a doodle for the first meeting, scheduled for this Wednesday/Thursday.

http://doodle.com/x53auvrtffeia2in

For building the initial agenda, please propose topics here and I'll put together a list.

GlenTiki commented 9 years ago

I'm working on the LTTng stuff in @nearForm, and I'd love to get involved with the group. LTTng is the best approach we can see for Linux tracing. :) @mikeal

sam-github commented 9 years ago

@kraman Take note.

Qard commented 9 years ago

Looks like 1PM PST Wednesday or 11AM PST Thursday are our best times so far. I'm partial to Wednesday, anyone else have a preference?

As for topics, I think an obvious starting point is figuring out how to expand on the recent async_wrap work. I'd also like to touch on native visibility, broken down into what we can learn from libuv, v8 and io.js itself, especially from third-party native modules.

mikeal commented 9 years ago

11AM PST Thursday it is!

AndreasMadsen commented 9 years ago

Will it be possible for non working group members to follow this meeting live? If so, how?

trevnorris commented 9 years ago

I would like everyone to start documenting what type of information they want access to. None of the features will be implemented directly. Instead we should do a survey of all the information, find common areas, then implement the hooks to allow users to access that data.

Qard commented 9 years ago

Agreed. My thought is that we should have some "official" development of userland options, but the stuff in core should just be simple and generic.

bnoordhuis commented 9 years ago

I won't be able to join but can I persuade someone to take notes and post them here afterwards?

Qard commented 9 years ago

Are we doing hangouts on air? You could listen to that after, if we are.

mikeal commented 9 years ago

I'll get the whole hangout thing set up later today; it'll definitely be recorded. Can someone else volunteer to take notes so that I can focus on just pushing through the agenda?

mikeal commented 9 years ago

@thlorenz it's happening tomorrow :)

thlorenz commented 9 years ago

@mikeal saw that and immediately deleted the question, but you're too quick :)

mikeal commented 9 years ago

Hangout is scheduled:

Event Link: https://plus.google.com/events/c45orf7dfm3bem19ogg6cp9icv8

Youtube Link to Watch: http://www.youtube.com/watch?v=Oar2yB5SPtA

Participation Link: https://plus.google.com/hangouts/_/hoaevent/AP36tYesuuHp1yLKGJCCmJ4Hll5olA1h8qGEQj_UGetI5xMqDDc-Vw?authuser=0&eid=101986715696875566237&hl=en-GB

Can people please propose specific agenda items they'd like to cover and I'll get them in a shared doc before the meeting.

GlenTiki commented 9 years ago

Hey, I'd love to join with @dberesford to talk about the lttng linux tracing stuff we have been working on in nearform, to get your feedback. We have a PR here: (#702).

I would love to get involved with the group outside of the lttng stuff, too.

mikeal commented 9 years ago

@thekemkid awesome, you should get on the hangout today :)

Qard commented 9 years ago

WG hangout starts soon. Here's what I think should be on the agenda:

- Evaluate if async_wrap is enough for core
- Consider blessing some userland things, like async-listener
- Figure out if generic native tracing is possible, to move the need for explicit hooks for DTrace, ETW, lttng, etc. out of core

Anything else we want to cover today?

mikeal commented 9 years ago

We should talk about the debugging tool efforts that are now possible with a newer v8. I know @thlorenz is doing some stuff already so I'm sure he can enlighten us :)

bnoordhuis commented 9 years ago

Maybe a discussion on what the interesting metrics from io.js and libuv core are and how they should be reported (counters, histograms, etc.)

I'm thinking of metrics similar to the ones in V8: https://github.com/iojs/io.js/blob/9a8f186/deps/v8/src/counters.h#L294-554.
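
To illustrate the difference between the two reporting shapes mentioned here, a minimal sketch follows. These classes are hypothetical and are not an io.js, libuv, or V8 API: a counter only accumulates a total, while a histogram keeps the distribution of samples in fixed buckets.

```javascript
// Hypothetical reporting primitives, for illustration only.
class Counter {
  constructor() { this.value = 0; }
  increment(by = 1) { this.value += by; }
}

class Histogram {
  // `bounds` are ascending upper bucket limits; one extra bucket catches overflow.
  constructor(bounds) {
    this.bounds = bounds;
    this.buckets = new Array(bounds.length + 1).fill(0);
  }
  record(sample) {
    let i = 0;
    while (i < this.bounds.length && sample > this.bounds[i]) i++;
    this.buckets[i]++;
  }
}

// Example: count requests and bucket their latency in milliseconds.
const requests = new Counter();
const latencyMs = new Histogram([1, 10, 100]);
for (const ms of [0.4, 3, 7, 250]) {
  requests.increment();
  latencyMs.record(ms);
}
```

The counter answers "how many", while the histogram can also answer "how slow was the tail", which matters for things like event loop delay.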

brycebaril commented 9 years ago

I can't make today's unfortunately but I second both @Qard and @bnoordhuis here -- exposing these existing things through the JS API is what I'm eager for.

The other thing I wanted to do was to define concepts and areas of focus such as "tracing of asynchronous operations" (e.g. round-trip timing, long-stack collection, etc.) vs. "profiling V8 & JS & C++" vs. "debugging"

piscisaureus commented 9 years ago

Will the meeting be recorded and/or are the notes going to be public?

On Thu, Feb 5, 2015 at 7:58 PM, Bryce Baril wrote:

> I can't make today's unfortunately but I second both @Qard and @bnoordhuis here -- exposing these existing things through the JS API is what I'm eager for.
>
> The other thing I wanted to do was to define concepts and areas of focus such as "tracing of asynchronous operations" (e.g. round-trip timing, long-stack collection, etc.) vs. "profiling V8 & JS & C++" vs. "debugging"

sam-github commented 9 years ago

@piscisaureus it's a hangout on air, it'll be on youtube.

othiym23 commented 9 years ago

@piscisaureus I took notes and passed them on to @mikeal, who will submit them as a PR, and the meeting was recorded as a Hangout On Air, so it will be on the io.js channel.

nodejs / node

tracing working group #671