soyuka / pidusage

Cross-platform process cpu % and memory usage of a PID
MIT License
512 stars 55 forks source link

Feature Request - Support list of pids #43

Closed simonepri closed 6 years ago

simonepri commented 6 years ago

Proposal

With the current implementation it's not possible to retrieve info for more than one pid at once. I've looked the code and it does not seems to complicate to support a list of pids. Would you consider a PR to add this?

Use Case

I need to monitor the whole tree and the readings would be a lot faster if I can retrieve info for a list of pids with a single ps command.

Currently my solution is to do multiple calls to pidusage:

const pify = require('pify');
const pidusage = require('pidusage').stat;
const pstree = require('ps-tree');

async function descendant(ppid) {
  const procl = (await pify(pstree)(ppid)).map(pst => ({
    ppid: pst.PPID,
    pid: pst.PID,
  }));
  return procl;
}
async function read() {
  let procl = [{ppid: 'somepid', pid: 'somepid'}];
  try {
    procl = procl.concat(await descendant('somepid'));
  } catch (err) {}

  // console.time('read');
  const promises = procl.map(proc => {
    return pify(pidusage)(proc.pid)
      .then(stat => {
        const snap = Object.assign(proc, stat));
        // => {"ppid": "XXXX", "pid": "YYYY", "cpu": Z, "memory": M}
      })
      .catch(() => {});
  });
  await Promise.all(promises);
  // console.timeEnd('read');
}

But with a lot of heavy loaded processes it can be really slow due to context switching. image

soyuka commented 6 years ago

A PR would be great as long as it keeps a backward compatibility I'm all in! Lmk if you don't have the time to work on it ;)

simonepri commented 6 years ago

I'll try to do it today. I'm working on a profiler tool and without it I can't reach high sampling rates.

If curious, click for a preview of the profiler.
simonepri commented 6 years ago

@soyuka The API may look like this:

// Current Behaviour
pusage.stat(pid, console.log);
// => {cpu, memory, time, start};

// Behaviour if an array is provided
pusage.stat([pid1, pid2, ..., pidN), console.log);
// => {
//   pid1: {cpu, memory, time, start},
//   pid2: {cpu, memory, time, start},
//   [...]
//   pidN: {cpu, memory, time, start},
// }

What do you think?

simonepri commented 6 years ago

Changes for ps based reads:

https://github.com/soyuka/pidusage/blob/master/lib/stats.js#L111

if (Array.isArray(pid)) {
  pid = pid.map(p => parseInt(p, 10)).join(',');
} else {
  pid = parseInt(pid, 10)
}

And then parse the stdout

Changes for proc based reads:

https://github.com/soyuka/pidusage/blob/master/lib/stats.js#L49 We need to read /proc/pid/stat file for each pid. Is there a better way to do this?

Changes for wmic based reads:

https://github.com/soyuka/pidusage/blob/master/lib/stats.js#L157 I never used wmic but if that is correct, I think that we can do something like:

if (Array.isArray(pid)) {
  pid = pid.map(p => `ProcessID=${parseInt(p, 10)}`).join(' or ');
} else {
  pid = parseInt(pid, 10)
}
var arg = `PROCESS where "${pid}"  get workingsetsize,usermodetime,kernelmodetime`

And then parse the stdout

soyuka commented 6 years ago

your profiler looks sick!

API:

pusage.stat(integer | integer[], callback)

With one integer we keep the same response indeed. I'm not sure about the multiple pids answer, won't an array of statistics be better then an object for processing?

ps change

yup alright

proc based:

We need to read /proc/pid/stat file for each pid. Is there a better way to do this?

Exactly my though when I looked into this. Reading proc files is one of the most precise thing we can do and I'd like to keep it. So nope, there's no better way of doing it. We need to read every proc file. We may need to introduce an asynchronous utility module to handle the calls?

Wmic

Haha it should work, I've only made assumptions with it though can't find a proper documentation for this... I'll try the patch manually on a windows device but your change looks fine!

Some notes though:

  1. please don't use arrow functions here, we need legacy support for at least node 0.12 for a couple more months (because of PM2)
  2. same for template strings, though I agree that this looks good :p (same reason I haven't use them in this code base)
simonepri commented 6 years ago

Thank you for the notes.

Why if you need to support node 0.12 it's not present inside the travis.yml file? Also inside the appveyor.yml tests aren't run for node > 5 I'll issue a PR for that too.

Anyway what about shipping a version 2.0.0 in the future that uses promises instead of callbacks?

I'm not sure about the multiple pids answer, won't an array of statistics be better then an object for processing?

I feel the object based version more user friendly. What are the advantages are you thinking about?

soyuka commented 6 years ago

I meant node 4 my bad. Not sure arrow functions were supported but anyway we don't need them. Same for template strings for now.

I should mention this somewhere in this project's contributing file that I wish we stick with the rules of Tiny Guide To Non Fancy Node here (link is to callback/promise stuff). If you want promises I could release a pidusage-promisify bridge.

Also inside the appveyor.yml tests aren't run for node > 5

They should indeed!

For the workflow we could add a dependency on https://github.com/feross/run-parallel or maybe run-auto to readFile proc files.

simonepri commented 6 years ago

If you want promises I could release a pidusage-promisify bridge.

Actually I use pify I don't think a new package to just promisify it is necessary.

pify(pidusage.stat)(pid)
soyuka commented 6 years ago

I feel the object based version more user friendly. What are the advantages you are thinking about?

Usually when I get multiple results I want an array with a length etc. Obviously a Map would be even nicer. An object may be user friendly, especially that we know that the PID is unique but I feel that most of the use cases would have to transform it to an array (for example chart libraries always require an array and you would Object.keys(stats).map(e => stats[e])).

soyuka commented 6 years ago

pify(pidusage.stat)(pid)

Yes anyway having a module that does the new Promise() would be faster at runtime because it'd skip the boilerplate of pify. It's micro optimization but in this library he plays a lot with arguments and it's a performance bottleneck.

https://github.com/soyuka/pidusage-promise

simonepri commented 6 years ago

With the array we must keep the results in the same order of the input array. But if one of the provided pids wasn't been read we need to keep undefined values in the array.

My usage would be:

async function read() {
  let procl = [{ppid: 'somepid', pid: 'somepid'}];
  try {
    procl = procl.concat(await descendant('somepid'));
  } catch (err) {}

  try {
    const results = await pify(pidusage.stat)(procl.map(proc => proc.pid));

    procl.map(proc => Object.assign(proc, results[proc.pid])); // If As Object
    procl.map((proc, i) => Object.assign(proc, results[i]));   // If As Array

    // => [
    //   {"ppid": "XXXX", "pid": "YYYY", "cpu": Z, "memory": M},
    //   [...]
    //   {"ppid": "XXXX", "pid": "YYYY", "cpu": Z, "memory": M},
    // ]
  } catch(err) {}
}
soyuka commented 6 years ago

mhh okay I get why an object would make more sense in your use case. An object it is then :).

simonepri commented 6 years ago

Actually for my usage case is the same. With the array it would be even faster I think.

Anyway we must define what happen if one of the readings can't be taken. (For instance the pid provided is not in the system) I don't think that a fail-fast approach is suitable. I would rather prefer to have partial results. We need to add a flag to support both behaviours?

What's the current behaviour if the provided pid cannot be read?

soyuka commented 6 years ago

Actually for my usage case is the same. With the array it would be even faster I think.

I'd prefer an array, obviously a pid property would be added to the current statistics object.

What's the current behaviour if the provided pid cannot be read?

The first argument of the callback is an error. Statistics are null.

I don't think that a fail-fast approach is suitable. I would rather prefer to have partial results.

Indeed, IMO we don't want the same behavior when issuing multiple requests and the code will need some refactoring to support that properly. I'm :+1: for partial results. I don't think we need to support both behaviors. For error management I'm open to suggestions, my first thought would be to do something like AggregateError.

simonepri commented 6 years ago

So your suggestion is to pass both error and statistics to the callback? Isn't that weird?

soyuka commented 6 years ago

I'm really not sure how to handle this. It's weird but on the other hand if you give partial results, you have partial errors no?

Btw the callback is always like this:

stat(pid, function (err, stat) {
  // err is null when no error occured
})

Although, the promise will reject with what I had in mind so that's probably the wrong approach :|.

The right way would be to implement an EventEmitter and to send error events. Pidusage isn't built with an EE in mind but maybe we could. WDYT?

btw we can talk on irc or slack if it makes things easier.

simonepri commented 6 years ago

An event emitter a seems a reasonable solution but is not backward compatible.

soyuka commented 6 years ago

Creating an event emitter for every stat call seems a bit too much anyway :|. I think we can make it backward compatible though but it'll be less performant.

A third solution I had in mind would be to have an error property on the stat but that's kinda dirty. Or a fourth solution would be to use an onerror option (function signature) that's called on error. Also kinda dirty.

For now, let's do this when an array of pids is given:

  1. never give any statistics error and return partial results
  2. when {strict: true} is given and an error occurs use a fail-first approach

I think that this is the easiest implementation.

soyuka commented 6 years ago

45