plasma-umass / coz

Coz: Causal Profiling

NodeJS support #142

Open filipesilva opened 4 years ago

filipesilva commented 4 years ago

Heya,

I just found this via @emeryberger's Strange Loop talk and was super impressed!

I was wondering what would need to happen to enable profiling of NodeJS processes. Parallel execution isn't widespread in the JS world, but asynchronous execution is the norm. Getting this sort of profiling data would be invaluable.

NodeJS doesn't support macros, but it does support native C bindings via node-gyp. Maybe that could be leveraged?

filipesilva commented 4 years ago

Tried using native C bindings in Node. It works, but it profiles the NodeJS C sources rather than the source JS files; it should profile the JS call points instead.

https://github.com/filipesilva/node-coz contains all the code and coz output for the experiment.

ccurtsinger commented 4 years ago

This is something we're interested in doing, eventually. We would need to coax Node to generate debug output in some form that allows us to map JITted code back to the JS source. Non-JITted code would be trickier, unless Node has explicit support for this. The wrapper for progress points you have is probably reasonable, although it would be nice if the progress point was identified with the source location where it was called in JS rather than addon.cpp:13.

bjorn3 commented 4 years ago

Maybe it would be possible to use the native debugging facilities of the V8 JS engine, for example to place breakpoints when needing to wait and to get the current source location.

aphix commented 4 years ago

Perhaps the DTrace support in NodeJS could be hooked to expose the mapping from the JITted code back to the native C bindings.

Another thought would be to use a coredump (via gcore or something similar) taken during a known hotspot of running code to provide information to map the JS functions and vars to their native bindings (maybe after a few runs, to allow for optimizations to settle).

The coredump approach would probably only help for post-hoc insight after coz did its work, and likely wouldn't work during runtime (depending on the complexity of the application), but DTrace hooks should be available during runtime AFAICT.

Not sure how viable or helpful those options are for coz, but figured it might be worth a mention.

filipesilva commented 4 years ago

I'm not super bothered with the source location being identified as addon.cpp:13 because named progress points are still possible. If I alter the test to look like this instead:

const { cozProgressNamed } = require('../index.js');

function fib(n) {
  cozProgressNamed('inside-fib');

  if (n > 1) {
    return fib(n - 1) + fib(n - 2)
  } else {
    return n;
  }
}

console.log(fib(40));

Then I'll get results that show name=inside-fib:

experiment  selected=/home/iojs/build/deps/v8/src/builtins/base.tq:325  speedup=0.00    duration=500128300  selected-samples=474
throughput-point    name=inside-fib delta=2768455
experiment  selected=/home/iojs/build/deps/v8/src/builtins/base.tq:325  speedup=0.95    duration=46041600   selected-samples=478
throughput-point    name=inside-fib delta=2724011
experiment  selected=/home/iojs/build/deps/v8/src/builtins/base.tq:325  speedup=0.35    duration=381652000  selected-samples=339
throughput-point    name=inside-fib delta=1963855

We could also add JS code in the bindings to capture the source file location via the stack trace. That would give accurate source files and line numbers.

~/s/node-coz (master|✚1) [127] $ node
Welcome to Node.js v12.13.0.
Type ".help" for more information.
> console.log(new Error().stack)
Error
    at repl:1:13
    at Script.runInThisContext (vm.js:116:20)
    at REPLServer.defaultEval (repl.js:404:29)
    at bound (domain.js:420:14)
    at REPLServer.runBound [as eval] (domain.js:433:12)
    at REPLServer.onLine (repl.js:715:10)
    at REPLServer.emit (events.js:215:7)
    at REPLServer.EventEmitter.emit (domain.js:476:20)
    at REPLServer.Interface._onLine (readline.js:316:10)
    at REPLServer.Interface._line (readline.js:693:8)
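A sketch of that stack-trace approach (the helper names here are hypothetical, not part of node-coz): a wrapper around the native binding can peel the caller's frame out of `new Error().stack` and report that file:line as the progress point.

```javascript
// Hypothetical helper: find a caller's file:line by parsing a synthetic
// stack trace. stack[0] = "Error", stack[1] = this function; `depth`
// skips additional wrapper frames above the real caller.
function callerLocation(depth = 0) {
  const frame = new Error().stack.split('\n')[2 + depth] || '';
  const m = frame.match(/\(?([^()\s]+):(\d+):(\d+)\)?$/);
  return m ? { file: m[1], line: Number(m[2]) } : null;
}

// Hypothetical wrapper around the native progress-point binding: instead of
// a fixed name, it could report `${file}:${line}` of the JS call site.
function cozProgressHere() {
  // depth 1 skips cozProgressHere itself, landing on its caller's frame.
  return callerLocation(1);
}

const where = cozProgressHere();
console.log(where);
```

This trades a little per-call overhead (stack capture) for progress points that name JS source locations rather than addon.cpp:13.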

What I don't quite understand, probably because I don't really understand how coz works, is what I could do with the current results as a Node developer.

The last few lines of the profile show something like this:

experiment  selected=/home/iojs/build/deps/v8/src/builtins/base.tq:325  speedup=0.00    duration=500128300  selected-samples=474
throughput-point    name=inside-fib delta=2768455
experiment  selected=/home/iojs/build/deps/v8/src/builtins/base.tq:325  speedup=0.95    duration=46041600   selected-samples=478
throughput-point    name=inside-fib delta=2724011
experiment  selected=/home/iojs/build/deps/v8/src/builtins/base.tq:325  speedup=0.35    duration=381652000  selected-samples=339
throughput-point    name=inside-fib delta=1963855
runtime time=84376389500
samples location=/home/iojs/build/deps/v8/src/builtins/base.tq:325  count=57240
samples location=/home/iojs/build/deps/v8/src/builtins/base.tq:3028 count=24
samples location=/home/iojs/build/deps/v8/src/builtins/string-substring.tq:48   count=1

It sounds like the experiments were being performed in the base.tq and string-substring.tq Node builtin files. As a Node developer I don't really have access to those, as they are not part of my sources.

amedeedaboville commented 4 years ago

It does look like coz is experimenting on raw NodeJS files. Normally you could add the --source-scope src/% parameter to limit coz to experimenting with only your source code, but coz doesn't know how to read JS files, unless, as mentioned above, one could make Node dump a memory map of where JS functions are located in program memory after JIT compilation is done.

Another possible approach could be to make a Node-specific coz, possibly using the v8 sampling profiler.

Sorry if this is unrelated to the topic of making it work, but I'm confused: how can it work at all? How can coz measure virtual speedups if Node runs mostly as a single-threaded event loop?

If you call async functions A, B, and C, in the end they still end up running sequentially, such that you can't virtually speed up A by slowing down B and C.

I think applying coz to Node would end up sampling the (by default 4-thread) worker threadpool of libuv, and slowing down the other tasks running in the threadpool when it sees the experimented function. Is that the idea here?

sangohan commented 4 years ago

If you call async functions A, B, and C, in the end they still end up running sequentially, such that you can't virtually speed up A by slowing down B and C.

If you virtually sped up A by slowing down B & C, then the whole program (A->B->C) would still show up as having had performance optimizations done on it. Therefore, you can analyze the effect by changing the ratio of virtual speedup of A, B, and C, and see how the whole program behaves to make predictions.
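That prediction can be sketched as plain arithmetic (a toy model of the idea, not coz's actual delay-insertion mechanism): for a sequential pipeline, virtually speeding up A by fraction s predicts a total of a·(1−s) + b + c, and sweeping s over each component shows which one pays off.

```javascript
// Toy model: predicted total runtime when one phase is virtually sped up.
// In real coz the speedup is emulated by pausing everything else; here we
// just compute the arithmetic consequence for a sequential A->B->C pipeline.
function predictedRuntime(times, target, speedup) {
  return Object.entries(times).reduce(
    (sum, [name, t]) => sum + (name === target ? t * (1 - speedup) : t),
    0
  );
}

const times = { A: 40, B: 30, C: 30 }; // hypothetical per-phase milliseconds
console.log(predictedRuntime(times, 'A', 0.5)); // halving A: 20 + 30 + 30 = 80
console.log(predictedRuntime(times, 'B', 0.5)); // halving B: 40 + 15 + 30 = 85
```

In this toy model A is the more profitable target; coz's contribution is measuring those deltas empirically on concurrent code, where the arithmetic is not this simple.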

I think using the chrome-debugging-client library to write a Node-specific coz would work (as coz currently can't make meaningful changes to run experiments). That library lets you easily add breakpoints to a running Node program, and also obtain the list of all possible breakpoints, so you can accurately pick locations to place experiments without having to analyze the source or deal with filesystems/directories/dependencies, etc.

I am trying to write/convert parts of the coz code to work with Node atm, but I still don't quite understand how coz works in detail to do this conversion (e.g., how it selects which lines to run experiments on). If somebody can give me an explanation of the inner workings of coz without me having to first learn C++, I can replicate it for Node.

powersjcb commented 3 years ago

@sangohan, did you ever make any progress on the Node implementation?

I've been thinking about trying to create a Coz implementation for TypeScript sources by using a custom AST transformer to wrap all function calls. This would be blind to the inner workings of the NodeJS sources, external libraries, and value assignments. However, I figure it would probably give an approximation of what should be optimized.
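A sketch of what the transformed output might look like (the `__cozWrap` helper and its counter are my own hypothetical stand-ins for a real progress-point runtime, not part of any existing transformer):

```javascript
// Hypothetical runtime helper the transformer would inject: record a
// progress point for the callee, then forward the call unchanged.
const counts = Object.create(null);
function __cozWrap(name, fn, ...args) {
  counts[name] = (counts[name] || 0) + 1; // stand-in for coz's progress point
  return fn(...args);
}

// Original source:     return fib(n - 1) + fib(n - 2)
// Transformed output:  return __cozWrap('fib', fib, n - 1) + __cozWrap('fib', fib, n - 2)
function fib(n) {
  return n > 1
    ? __cozWrap('fib', fib, n - 1) + __cozWrap('fib', fib, n - 2)
    : n;
}

const result = __cozWrap('fib', fib, 10);
console.log(result, counts.fib); // fib(10) = 55, reached via 177 wrapped calls
```

Counting throughput at every call site is only half of coz, though; the harder part is injecting the virtual-speedup delays, which pure source rewriting can only approximate.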

sangohan commented 3 years ago

@powersjcb: no, I haven't had enough time to explore it. Maybe over the Xmas holidays! This is an interesting area that I don't think any tooling exists for in the NodeJS ecosystem.

it would probably give an approximation of what should be optimized.

I guess it might be easier (?) to implement though, and it gives you enough bang for the buck? I dunno; the TypeScript AST transformer isn't exactly trivial to use (not that I'm very familiar with TypeScript...).

powersjcb commented 3 years ago

@sangohan yeah, I just spent a little while learning how the TypeScript compiler works. I now have a custom transformer hello world functional (super fun to figure this out :-D): https://github.com/powersjcb/typescript-causal-profiler/blob/master/src/transformer.ts#L18

My thinking is

nooptoday commented 1 year ago

Any updates or plans for this being implemented?