Feature Request: Limit Userland Recursion in Async Functions

astormnewrelic commented 4 years ago

We've recently noticed/discovered that it's possible, via the await keyword, to create an infinitely recursive function that doesn't blow up the V8 stack. We're trying to find ways to identify and stop this sort of recursion.

Is your feature request related to a problem? Please describe.

Normally, the --stack-size option of NodeJS prevents runaway recursion from becoming a problem. If a function recuses too deeply, the stack eventually grows too big, and node spits out an error and (appears to?) stop the recursion without exiting the program.

However, with code like this, --stack-size is no protection.

const recurse = async () => {
    // ... function does some work ...

    await someOtherFunctionCall()
    // when execution returns to this function, V8's callstack isn't the
    // long lists of recursive function calls

    // ... function does some more work ...

    // recurse!
    recurse()
}

We assume that although this is userland recursion, that behind the scenes the Node, V8, or libuv internals are handling the code after await via some sort of callback that doesn't have the original V8 stack attached. (We're not familiar enough with Node's inner workings to know for sure, so any corrections here would be appreciated).

Depending on the work being done in the recursive function, or how/when/often it's called, this can create performance issues in an application or service that are hard to track down.

Describe the solution you'd like

The naive solution that we'd like is some sort of feature that allows us to limit the userland call stack depth. i.e. "if function recurses 1000 times something is probably wrong and we should -- do something". "Do something" is probably throw an error and stop the recursion, but we're not familiar enough with the internals to know if this is a viable solution or not.

The main goal of any solution would be to put node in a temporary state where we can easily identify and be notified then this sort of infinite userland recursion happens, and which function it is that's causing the recursion.

We're an analytics/monitoring company, and we often see this problem in our customer's applications. Our instrumenting code is often blamed for this recursion, when it is, in fact, just a poor interaction between our instrumenting code and our customer's application. Being able to have a feature that would let us identify and point to this recursion would be super helpful.

Describe alternatives you've considered

We first investigated the --stack-size option, but as mentioned, since V8's stack isn't growing this does not let us stop this sort of recursion.

We've also asked our customers, and tried ourselves, to not write bugs. This has had mixed results.

jasnell commented 4 years ago

Yep, this is actually intrinsic to the design of Promises and is really not something that we could do too much about automatically. The reason this happens is because in your example...

const recurse = async () => {
    // ... function does some work ...

    await someOtherFunctionCall()
    // when execution returns to this function, V8's callstack isn't the
    // long lists of recursive function calls

    // ... function does some more work ...

    // recurse!
    recurse()
}

The recursive call to recurse() is effectively a .then() handler for a Promise running in a separate call stack from the first part. The above code is roughly equivalent to...

someOtherFunctionCall().then(recurse)

Node.js (actually V8) will execute the someOtherFunctionCall() and will reschedule everything that comes after it in the microtask queue, which is drained sometime after someOtherFunctionCall() exits, putting the two in completely different stacks. What your code here is effectively doing then would be roughly equivalent to doing something like...

function doItAgain() {
  setImmediate(doItAgain)
}
doItAgain()

Which would cause the function to be called over and over at the start of every event loop turn.

There's not much we can do about this in Node.js automatically because there might actually be legitimate reasons for having this recursion! The best approach is to have some additional mechanism for tracking, such as monitoring the number of times a function is called recursively...

const recurse = async (count = 0) => {
    if (count > 10) throw new Error('boom!')
    // ... function does some work ...

    await someOtherFunctionCall()
    // when execution returns to this function, V8's callstack isn't the
    // long lists of recursive function calls

    // ... function does some more work ...

    // recurse!
    recurse(count)
}

astormnewrelic commented 4 years ago

Thanks @jasnell -- that's a very cogent description of the situation. This definitely seems like one of those places where whether you see await as either "just syntactic sugar" or a "a core language feature" can affect your expectations. Single thread async is a wild place :)

Given what you've said, would it be fair to say that if some hypothetical --max-await-nesting-level option for node existed (Similar to PHP's/xDebug's max_nesting_level) that this functionality would need to be something provided by the V8 project? (vs. being able to implement via the libuv layer or the c++ bindings in NodeJS itself?). I've watched @cjihrig's and @joyeecheung's talks on some of the internals but my understanding of the boundaries of responsibility for each system is still pretty vague and fuzzy.

The challenge of with your last (and useful!) suggestion is it requires everyone to build that count into their recursive functions. We're trying to diagnose unintentional recursion bugs introduced by the sort of programmers who have a less than perfect understanding of how all this works. The same way someone might have a legitimate reason to tweak the --stack-size option, someone might want to tweak their (again, hypothetical) --max-await-nesting-level down low to immediately surface places where this recursion was happening.

jasnell commented 4 years ago

The key challenge with building that kind of mechanism into Node.js directly is that we'd have to enable a hook to track every async function to see if it happened to recurse into itself and that would be extremely costly (and I'm not even sure possible) from a performance perspective.

astormnewrelic commented 4 years ago

extremely costly (and I'm not even sure possible) from a performance perspective

nod -- very reasonable take. Not a feature you'd want turned on in production -- it'd still be interesting know where/how that hooking would take place.

addaleax commented 4 years ago

@astormnewrelic If you’re okay with a nontrivial performance cost, I think you might be able to detect this situation through async_hooks?

jasnell commented 4 years ago

Using async_hooks could work but you definitely lose context because the hooks do not tell you where in the source code you may be experiencing the problem. That is, the async hook will tell you that you have a Promise being created, and could tell you that you have a Promise creating a Promise creating a Promise... etc, but it won't tell you if that Promise is the result of runaway recursion on a single async function.

astormnewrelic commented 4 years ago

@addaleax that's a good question that I don't know the answer to -- I'm not super familiar with the full functionality provided by async_hooks other than "hey something async happened so we'll call a function and tell you about it". Worth my investigating for sure -- thanks for the suggestion!

himself65 commented 4 years ago

close as resolved

astormnewrelic commented 4 years ago

@addaleax Thank for your the async_hooks suggestion. Just to follow up and close off this discussion -- here's what async_hooks did/didn't get us/me with regards to this particular use case. (This is also repeating a lot of what was already said above, but in my own words and with an eye towards the use case I'm interested in)

With both an ID for the async resource and an ID of the async resource that triggered the creation of a particular async resource, we can keep track of a tree of the relationships between all the async resources, and when that tree's depth gets past a certain point it's a pretty good signal that there's some sort of -- I'll call it async recursion (the recursion I described above) -- going on.

What the async_hooks module doesn't get us:

Is this deep async recursion the result of returning to an async function after an await, or just some pattern of regular userland created promises
Where in userland code did a particular async resource originate from

nodejs / node

Feature Request: Limit Userland Recursion in Async Functions #33023