nodejs / readable-stream

Node-core streams for userland
https://nodejs.org/api/stream.html

How to detect slowest stream in pipeline? #396

Closed julien-f closed 5 years ago

julien-f commented 5 years ago

Thanks to back-pressure, the speed of a pipeline limits itself to the speed of the slowest stream at any given time, which avoids wasting resources such as CPU or memory.

But it would be nice to be able to see which stream is the current bottleneck. For instance, if I'm downloading some data from a server, applying a transformation, and uploading it to another server, I would like to be able to figure out what I can do to speed up the operation.

I don't know how easy it is to do with the current API, but it may be nice to expose this, maybe as a property similar to the CPU load average for the past 1, 5, and 15 minutes. 0 would mean that the stream is always waiting and 1 that it's always working (with the proper meanings to be defined for readable, transform and writable streams).
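
To make the proposed 0-to-1 figure concrete, here is a minimal sketch of a utilization tracker. It is only an illustration: `createUtilization`, `start`, `stop`, `ratio` and the injectable `now` clock are hypothetical names, not part of any stream API, and the ratio covers the whole lifetime of the tracker rather than 1, 5 and 15 minute windows.

```javascript
// Hypothetical helper (not a Node.js API): tracks the fraction of elapsed
// time spent "working". `now` is injectable so the clock can be replaced.
const createUtilization = (now = () => Date.now()) => {
  const createdAt = now();
  let busy = 0; // total ms spent working in finished intervals
  let startedAt; // start of the current working interval, if any

  return {
    // mark the beginning of a working interval
    start() {
      if (startedAt === undefined) {
        startedAt = now();
      }
    },
    // mark the end of the current working interval
    stop() {
      if (startedAt !== undefined) {
        busy += now() - startedAt;
        startedAt = undefined;
      }
    },
    // 0 means always waiting, 1 means always working
    get ratio() {
      const current = now();
      const elapsed = current - createdAt;
      const ongoing = startedAt === undefined ? 0 : current - startedAt;
      return elapsed === 0 ? 0 : (busy + ongoing) / elapsed;
    },
  };
};
```

Something would still have to call `start()` and `stop()` at the right moments for each kind of stream, which is the part that remains to be defined.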

mcollina commented 5 years ago

There are no tools in Node core for this at this point. I would recommend using https://clinicjs.org/bubbleprof to investigate (disclaimer: I'm leading the team that built it). I think adding a custom API for this would severely increase the overhead of streams, which are already quite heavy, unfortunately :/. If you would like to give that a shot, you might want to send a PR against Node core.

julien-f commented 5 years ago

Thanks for the info, I'll take a look at BubbleProf!

Do you have any idea how it could be done from the outside for a readable stream?

I have a few:

mcollina commented 5 years ago

Frankly, I've never thought about this. Let us know how you solve this problem!

julien-f commented 5 years ago

I've started working on this, here is my first (terrible) approach:

// add a `.waiting` property which accumulates the number of ms elapsed
// between data being read and the next read request
const decorateReadable = readable => {
  let reading = readable._readableState.reading;
  let date;
  let waiting = 0;
  Object.defineProperty(readable._readableState, "reading", {
    get() {
      return reading;
    },
    set(value) {
      reading = value;
      if (reading) {
        if (date !== undefined) {
          waiting += Date.now() - date;
        }
      } else {
        date = Date.now();
      }
    },
  });
  return Object.defineProperty(readable, "waiting", {
    get() {
      return waiting;
    },
  });
};

// add a `.waiting` property which accumulates the number of ms elapsed
// between a finished write and the next write request
const decorateWritable = writable => {
  let writing = writable._writableState.writing;
  let date;
  let waiting = 0;
  Object.defineProperty(writable._writableState, "writing", {
    get() {
      return writing;
    },
    set(value) {
      writing = value;
      if (writing) {
        if (date !== undefined) {
          waiting += Date.now() - date;
        }
      } else {
        date = Date.now();
      }
    },
  });
  return Object.defineProperty(writable, "waiting", {
    get() {
      return waiting;
    },
  });
};

mcollina commented 5 years ago

@julien-f have you checked how that impacts performance? It seems a very idiomatic way to achieve that goal, maybe you should consider publishing it as a module.

julien-f commented 5 years ago

Nope, not yet.

I may do this but I'm reluctant to rely on internals…

mcollina commented 5 years ago

You can take a similar approach but rely on overriding _read, _write and _writev instead.
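
A minimal sketch of what that could look like for a writable stream, assuming that timing each `_write` call until its callback fires is an acceptable measure of "working" time. `measureWritable`, the `busy` property and the injectable `now` clock are hypothetical names used for illustration only. A stream that implements `_writev` would need the same wrapping, and `_read` takes no callback (data arrives through `push()`), so the readable side would have to be measured differently.

```javascript
const { Writable } = require("stream");
const { performance } = require("perf_hooks");

// Hypothetical helper (not a Node.js API): wraps `_write` on a single stream
// instance and adds a `.busy` property holding the total number of ms spent
// between each `_write` call and the invocation of its callback.
const measureWritable = (writable, now = () => performance.now()) => {
  const originalWrite = writable._write;
  let busy = 0;

  writable._write = function (chunk, encoding, callback) {
    const start = now();
    return originalWrite.call(this, chunk, encoding, error => {
      busy += now() - start;
      callback(error);
    });
  };

  return Object.defineProperty(writable, "busy", {
    get() {
      return busy;
    },
  });
};

// Usage: a sink that takes roughly 10 ms to handle each chunk
const sink = measureWritable(
  new Writable({
    write(chunk, encoding, callback) {
      setTimeout(callback, 10);
    },
  })
);
sink.end("hello", () => {
  console.log(`sink was busy for ${sink.busy} ms`);
});
```

This only touches the documented `_write` extension point, so it avoids reading `_writableState`, at the cost of one extra closure per write.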