nodejs / performance

Node.js team focusing on performance
MIT License
371 stars 7 forks source link

Optimizing internal argument validation functions #140

Open fabiospampinato opened 7 months ago

fabiospampinato commented 7 months ago

Some internal functions used to validate arguments, like validateString, seem to be called all over the place throughout the codebase, so making these calls as fast as possible may be worth it.

IMO the way they are being used today is probably a lot less efficient than it could be, though without measuring the impact of the change it's hard to say what it will be.

Let's take a look at a scenario, and at how to make it faster:

  1. For example validateString is being called here to validate our argument when doing crypto.createHash('sha1').
  2. validateString's definition is basically a function wrapped by hideStackFrames
  3. hideStackFrames basically makes a function that calls our function, via ReflectApply, and if an error is thrown it does something with it and re-throws it.
  4. Finally the actual body we want to execute of the validateString function just does a typeof check, and if it fails an error is thrown.

That seems a pretty convoluted way to do a typeof check basically. And I think it could be made faster, though unclear by how much, like this:

  1. Instead of writing this:
    const validateString = hideStackFrames((value, name) => {
      if (typeof value !== 'string')
        throw new ERR_INVALID_ARG_TYPE(name, 'string', value);
    });

    I guess we could just write this:

    const validateString = (value, name) => {
      if (typeof value !== 'string')
        throw hideStackFramesFromError(new ERR_INVALID_ARG_TYPE(name, 'string', value));
    };

    Basically the idea is that unless the error is actually throw, we pay absolutely 0 cost for whatever hideStackFrames does, and instead we manually call a function that does what's needed on the error. Doing it this way would be a bit more error prone, but these validation functions seem to be fairly simple, and called all over the place, so manually inserting the hideStackFramesFromError calls in the right spots shouldn't be a problem.

  2. Lastly it may be best to inline these tiny functions that are called a million times at build time, perhaps defining a dedicated function for their erroring branch, to keep the emitted code to parse minimal. So for example the definition of our validateString would become something like this instead:
    const validateString = (value, name) => {
      if (typeof value !== 'string')
        validateStringFailure(value, name);
    };

    And when inlined this code:

      if (!(algorithm instanceof _Hash))
        validateString(algorithm, 'algorithm');

    Will be transformed automatically into this:

      if (!(algorithm instanceof _Hash))
        if(typeof algorithm !== 'string')
          validateStringFailure(algorithm, 'algorithm');

Basically when errors won't actually be thrown, which presumably is almost always the case, we would go from 3 function calls to 1 without the inline step, and to 0 function calls with the inline step. Also the try..catch block would get deleted, that may be ~zero cost though if not triggered, I'm not sure.

I would guess that we could see a low single-digit percentage performance improvement with this optimization applied to all the validation functions, at least in some cases, because pretty much every Node API seems to call some of these, often more than once, so it seems probably a worthwhile optimization to me.

Uzlopak commented 7 months ago

I dont see how hideStackFramesFromError could be implemented with the example you provided.

Also there are cases were a function calling hideStackFrame is calling a function which is also wrapped in hideStackFrame. How do you want to hide the outer call?

fabiospampinato commented 7 months ago

I dont see how hideStackFramesFromError could be implemented with the example you provided.

Maybe it would need to be reworked differently, but assuming that errors are almost never thrown during validation you could even do something like this and it should provide the same improvement, like this isn't a show stopper probably:

if(typeof algorithm !== 'string')
  validateString(algorithm, 'algorithm')

Also there are cases were a function calling hideStackFrame is calling a function which is also wrapped in hideStackFrame. How do you want to hide the outer call?

What would change for that exactly if it's done how it's done today vs. how I'm proposing it should probably be done?

H4ad commented 7 months ago

The older version of hideStackFrame was pretty simple and the overhead was on internal/error (to filter these functions out).

@Uzlopak Do you think we can go back to the older behavior? (after measuring the perf, of course)

Uzlopak commented 7 months ago

The old behavior was neat and simple with the cost of instantiating errors being super slow. When an Error was thrown, it was collecting the whole stacktrace (strackTraceLimit = Infinity!) and then in prepareStackTrace we filtered them out.

Now it just adds the necessary stacktrace if we have an error case. So if we error, then we call Error.captureStackTrace. Keep in mind, that the second parameter contains the "constructor" function of the error, which can be any function. The first occurence of that function is used and all following stacks are cut off.

The proposal of @fabiospampinato can not work out of the box, because we have no reference to function we want to hide. Also we dont have access to arguments.caller. So we need probably do it differently. E.g. by passing the reference of the function to be "hid" as first parameter and not instantiating the error

const validateString = (value, name) => {
  if (typeof value !== 'string')
    throw hideStackFramesFromError(validateString, ERR_INVALID_ARG_TYPE, name, 'string', value);
};

So hideStackFramesFromError is just

const hideStackFramesFromError(ctor, errCtor, ...args) {
    const stackLimit = Error.strackTraceLimit
    Error.strackTraceLimit = 0
    const err = new errCtor(...args)
    Error.stackTraceLimit = stackLimit
    Error.captureStackTrace(err, ctor)
    return err
}

But there are hideStackFrames wrapped functions which call other hideStackFrames wrapped functions. How can you hide the outer function from the inner function?

So yeah, maybe a nicer method is preferrable. I am not disagreeing. But these are the actual issues we face.

If we can agree, that we dont call hideStackFrames wrapped functions from other hideStackFrames functions, meaning we avoid the encapsulation problem as described above, we can actually simplify it ALOT. And the proposed solution would be feasable.

fabiospampinato commented 7 months ago

The problem with nested hideStackFrames-wrapped function calls is that if we do it as I'm proposing we can't delete all the functions that we want to delete from the stack trace, because only one of them is making the error object, right?

I think this may largely be a non-problem for validation functions, if a validation function doesn't call other validation functions, or other functions wrapped in hideStackFrames in general, like it's the case for validateString, because then it doesn't matter how we intercept the error with it because we are at the end of the stack, the error will bubble up naturally from there, and however else wants to capture it with a try..catch can do it. This won't work if a validation function calls other functions that do the same trick though.

Not the entire problem needs to be solved at once imo, applying the trivial version of this to the validation functions where this can be applied to, and measuring the impact of that, may already be worth it.

Uzlopak commented 7 months ago

Exactly. There are already existing double wrapped hideStackFrames function.

E.g. checkFiniteNumber in /lib/zlib.js

How to solve that.

fabiospampinato commented 7 months ago

How to solve that.

Like this maybe:

// 1. Returns false for undefined and NaN
// 2. Returns true for finite numbers
// 3. Throws ERR_INVALID_ARG_TYPE for non-numbers
// 4. Throws ERR_OUT_OF_RANGE for infinite numbers
const checkFiniteNumber =(number, name) => {
  // Common case
  if (number === undefined) {
    return false;
  }

  if (NumberIsFinite(number)) {
    return true; // Is a valid number
  }

  if (NumberIsNaN(number)) {
    return false;
  }

  if (typeof value !== 'number')
    throw doStuff(checkFiniteNumber, new ERR_INVALID_ARG_TYPE(name, 'number', value));

  // Infinite numbers
  throw doStuff(checkFiniteNumber, new ERR_OUT_OF_RANGE.HideStackFramesError(name, 'a finite number', number));
});

Not every one of these need to be made efficient for it to be worth it, imo.

Uzlopak commented 7 months ago

I would have said:

// 1. Returns false for undefined and NaN
// 2. Returns true for finite numbers
// 3. Throws ERR_INVALID_ARG_TYPE for non-numbers
// 4. Throws ERR_OUT_OF_RANGE for infinite numbers
const checkFiniteNumber = (number, name) => {
  // Common case
  if (number === undefined) {
    return false;
  }

  if (NumberIsFinite(number)) {
    return true; // Is a valid number
  }

  if (NumberIsNaN(number)) {
    return false;
  }

  if (typeof value !== 'number')
    throw hideStackFramesNew(checkFiniteNumber, ERR_INVALID_ARG_TYPE, name, 'number', value);

  // Infinite numbers
  throw hideStackFramesNew(checkFiniteNumber, ERR_OUT_OF_RANGE, name, 'a finite number', number);
};
davbrito commented 7 months ago

I don't know anything about Node or V8 internal workings, but I think that the solution would be to find a way to do what hideStackFrames does (which is hiding the function from the stack trace as far a I understand), but at lower level, such as setting some kind of "internal metadata" in the function to tell the engine to skip showing it on the stack. 🤷🏽

Uzlopak commented 7 months ago

Basically how hideStackFrames was implemented before, but was having very poor performance. I read the v8 code especially for this and could not find anything...

anonrig commented 7 months ago

One way of avoiding argument validation functions is to move it to C++. I've opened an example PR: https://github.com/nodejs/node/pull/51027