nodejs / node

Node.js JavaScript runtime ✨🐢🚀✨
https://nodejs.org
Other
105.29k stars 28.53k forks source link

async_hooks: AsyncLocalStorage losing context #41285

Open mcollina opened 2 years ago

mcollina commented 2 years ago

Version

node v17.3.0 / 16.13.0

Platform

linux (likely all)

Subsystem

async_hooks

What steps will reproduce the bug?

Run

'use strict'

const { AsyncLocalStorage } = require('async_hooks')
const { createServer } = require('http')

const storage = new AsyncLocalStorage()

let counter = 0
createServer((req, res) => {
  const id = counter++;
  console.log('In Middleware with id ' + id);
  storage.run({ id }, function () {
    req.resume()
    req.on('end', onEnd(res))
  });
}).listen(3000)

function onEnd (res) {
  return () => {
    const store = storage.getStore()
    console.log('store is', store)
    res.end(JSON.stringify(store))
  }
}

Then:

curl -d '{}' -H 'Content-Type: application/json' localhost:3000

You'd note that the store is empty.

How often does it reproduce? Is there a required condition?

All of them. Unfortunately the documentation is quite misleading and it assumes the above would work / we do not document the edge cases.

The problem originates on the fact that receiving an HTTP body is part of the http request AsyncRecource. Calling .resume() on it does not imply we are attaching the new storage to that AsyncResource. However our documentation states:

Runs a function synchronously within a context and returns its return value. The store is not accessible outside of the callback function. The store is accessible to any asynchronous operations created within the callback.

Unless somebody knows that calling resume() is not creating a new asynchronous operation, they will be expecting the context to be preserved.

As a further confirmation, the following code works:

'use strict'

const { AsyncLocalStorage } = require('async_hooks')
const { createServer } = require('http')

const storage = new AsyncLocalStorage()

let counter = 0
createServer((req, res) => {
  const id = counter++;
  console.log('In Middleware with id ' + id);
  storage.enterWith({ id })
  req.resume()
  req.on('end', onEnd(res))
}).listen(3000)

function onEnd (res) {
  return () => {
    const store = storage.getStore()
    console.log('store is', store)
    res.end(JSON.stringify(store))
  }
}

What is the expected behavior?

We might decide this is a bug that should be fixed or we might decide to stabilize enterWith and document this case.

The problem is that the current documentation is misleading folks into thinking storage.run() is the right API in all cases but it is not as things are.

What do you see instead?

No response

Additional information

No response

mcollina commented 2 years ago

cc @vdeturckheim @nodejs/diagnostics @nodejs/http

bmeck commented 2 years ago

I've talked about stabilizing .enterWith with a more readable alias before with @qard , I think I was the last holdout but have not found any alternative that fixes the userland queue problem. I think that is a fine path but likely there should be a "gotchas" section added to the doc about both downfalls.

Qard commented 2 years ago

I'm always all for more and better docs. I do think we should avoid encouraging people using enterWith though as it has implications on the runtime behaviour that would not be understood by someone that doesn't know what's happening internally. In particular that a "tick" doesn't end at the edge of user code, it often ends somewhere in Node.js internals and other things can happen in that gap that may be unexpected to the user. That's the main reason the explicit scope termination of run is recommended.

I don't have time at the moment to take a close look at this issue to make any suggestions on how to fix it other than with enterWith right now though. Maybe I'll have some time after Christmas.

vdeturckheim commented 2 years ago

I was thinking about a bindEmitter method on AsyncLocalStorage, but it is actually the same as enterWith I guess.

mcollina commented 2 years ago

Do you know why the following works? It seems that wrapping it inside setImmediate() makes the context propagate correctly. I'm probably missing some piece and I designed some part of it :/ (I'm hoping that maybe we have a bug somewhere and we can make this work).

'use strict'

const { AsyncLocalStorage, AsyncResource } = require('async_hooks')
const { createServer } = require('http')

const storage = new AsyncLocalStorage()

let counter = 0
createServer((req, res) => {
  const id = counter++
  console.log('In Middleware with id ' + id)
  setImmediate(() => {
    storage.run({ id }, function () {
      req.resume()
      req.on('end', onEnd(res))
    })
  })
}).listen(3000)

function onEnd (res) {
  return () => {
    const store = storage.getStore()
    console.log('store is', store)
    res.end(JSON.stringify(store))
  }
}
Flarna commented 2 years ago

As far as I remember the exact timing and the segmentation of TCP/IP packets has impact on which resource is actually active in which event. I while ago I analyzed #32195 which has shown also differences if Promise.resolve() or setImmediate was used for an async operation before registering end/data handlers.

http/net modules are significant older then async hooks and where never designed in a way to represent a unique resource per request. an http server request consists of server (sometimes reused) lower level resources and as a result it depends a lot on the timing TCP/IP segmentation,.. which resource is actual active at which time.

I think we could try to fix this by using e.g. EventEmitterAsyncResource recently added to combine all request events into a dedicated HTTP request resource (same for response). But a user would still need enterWith to "attach" to this resource as it is created node internal before user code is reached.

targos commented 2 years ago

Isn't it the kind of cases where AsyncResource.bind() should be used? https://nodejs.org/dist/latest-v17.x/docs/api/async_context.html#integrating-asyncresource-with-eventemitter

mcollina commented 2 years ago

Yes, exactly. However that limitation is not recommended from the AsyncLocalStorage docs and a high-level user would not be aware of that limitation. enterWith solves the problem too and it's likely a good idea to have both APIs.

gongshishao commented 2 years ago

How it's it going ? I also met this issus, while node tracing post request context. AsyncLocalStorage.getStore lost the store value. And I think enterWith is not the best solution to solve this issue as Doc suggest: https://nodejs.org/dist/latest-v16.x/docs/api/async_context.html#asynclocalstorageenterwithstore

orgads commented 8 months ago

We had a similar issue, and we've noticed that the reason setImmediate probably solved your issue is that end was already received on the socket when resume was called.

It can be seen by comparing stack traces of "with context" and "without context".

We've used the workaround from https://github.com/nodejs/node/issues/34430#issuecomment-1468613220 to overcome this.

vickkhera commented 1 month ago

I ran into a similar issue with papa parse where the async context disappeared after a few callbacks to the chunk hook when processing a large file with streams. My solution was to move the bit of code that needed the context up above the promise in which papa parse was running into a small helper function, then binding it with AsyncResource.bind(my_function). With this, the callback was able to run every time and access the context within that helper function. I found this solution on stack overflow.

I also want to add that the documentation for troubleshooting context loss is a little ambiguous in the sentence "If your code is callback-based, it is enough to promisify it with util.promisify() so it starts working with native promises." It is unclear to what the word "it" refers. As my code was already a promise, it was doubly confusing what I should do. I was glad to find the workaround with bind.