async_hooks & node-fibers discussion

laverdet commented 6 years ago

I'm opening this issue to discuss the current status of async_hooks in nodejs and how it affects node-fibers.

@AndreasMadsen I want your input here about what fibers can do going forward to ensure that what we're doing here remains possible as async_hooks matures. I've added you to this thread because of your work on async_hooks in the nodejs repository.

I'm not sure if you're familiar with node-fibers but as the name implies I've implemented fibers/coroutines in nodejs. I started this project in 2011 on node v0.2.x before async/await even had a viable specification proposal-- the value of this project is questionable now that support for single-frame continuations exists in JS natively, but the fact remains that a lot of projects still depend on node-fibers.

I managed to pull this implementation off with some hacks to thread-local storage to make v8 think it is running in multiple threads when what is actually happening behind the scenes is just setjmp/longjmp magic. Appropriate Locker/Unlocker scopes are in place and all this is expressly allowed in the v8 API. Some members of the node team have expressed disgust and outrage with this approach but it works. Additionally the same functionality can be implemented cleanly without violating any invariants via threading and conditional waits, though we would lose the lightweight nature of the current implementation.

Anyway, I find myself fighting with async_hooks in recent days because it justifiably assumes that there will be only one stack. I've put in place even more hacks to deal with this, most recently in nodejs v8.10.0 due to the changes to the undocumented APIs I was using. You can see this implementation in fibers.js#L29.

The hacks can be expressed as two simple functions: getAndClearStack() and restoreStack(). The first function saves the current async stack, returns it as an array, and then sets the stack to empty. The second function restores an async stack using the return value from a previous invocation to getAndClearStack. restoreStack assumes that it is called while the async stack is currently empty. node-fibers can then call those two functions where appropriate and nodejs is happy. It's important to note that all the RAII goodness of AsyncHooks::CallbackScope continues to work with these two functions in place since the correct async stack will always be restored before longjmp'ing back into that scope.

So what I would I like to ask is: would the nodejs team be open to supporting, or allowing me to support, these two functions in the async_hooks API?

laverdet commented 6 years ago

cc: @benjamn

benjamn commented 6 years ago

What if the async_hooks stack was stored in thread local storage by Node/V8, so that there would automatically be separate stacks for each thread/fiber? Then you wouldn't have to save and restore the stack when switching threads/fibers, which would be much more efficient—not to mention less dependent on a frequently changing internal API.

laverdet commented 6 years ago

Well there's two problems I see with that. The first is that it will add some overhead to all nodejs applications just for the sake of node-fibers. I don't think it will fly just for that reason. The second reason is that it continues the cycle of TLS hacks which are pretty dirty. In the current code the TLS magic is scoped only to v8 so I would have to extend that to capture nodejs as well. This could be done with a C++ API on node's end that I could link to but it seems more cumbersome than just exposing something via process.binding.

benjamn commented 6 years ago

I understand your skepticism, but I think this idea is important to make async_hooks thread-aware in a general sense, so it wouldn't be just for the sake of fibers.

Also, this doesn't introduce any additional hacking of thread-local storage. The Node implementation of async_hooks would simply need to hide the pointer to the async ID stack behind a function call, and make that function retrieve the pointer from TLS. Any multithreaded program would benefit from this layer of indirection, I think.

What I'm less sure about: V8 certainly has internal APIs for thread-local data, but I have no idea whether/how those APIs are exposed to Node. Since async_hooks is implemented on the Node side, it may be difficult to store the async ID stack in TLS, but it should be doable.

benjamn commented 6 years ago

Another idea: if Node was willing to expose an API for swapping out the whole async ID stack at once, rather than popping/pushing every item, that would help node-fibers implement this functionality more efficiently.

AndreasMadsen commented 6 years ago

Is there something that is impossible right now, due to recent changes?

We already expose the top of the stack, though JS and C++ APIs. The rest of the stack exists for validation. If we expose that it allows users to temper with the validation.

So even if we overlook that fiber is already a hack, I think it is going to be hard to convince anyone to do this. You could try to send a PR with the TLS idea and benchmark how much difference it makes. If it isn't measurable then I would be open to do it that way. Be aware that since async_hooks is involved in pretty much everything it has to very efficient.

/cc @nodejs/async_hooks

benjamn commented 6 years ago

@AndreasMadsen Without careful saving/restoring, the async ID stack can get corrupted if used from multiple threads (not necessarily fibers), even if the threads are fully cooperative and always lock the Isolate (as fibers do).

Scenario: one thread pushes an ID, then switches to another thread that pushes another ID, which switches back to the first thread before popping that ID (because it hasn't finished the async operation it started), and then the first thread tries to pop the ID that it pushed, but finds the ID from the second thread on top of the stack, which causes the async_hooks implementation to abort.

This problem would be solved if each thread had its own async ID stack.

Is there something that is impossible right now, due to recent changes?

The saving and restoring is very hard to do correctly with the current process.binding('async_wrap') internal API. For example, @laverdet implemented a solution for https://github.com/laverdet/node-fibers/issues/357 for Node 8.9.4, which is already broken in 8.10.0 because aw.asyncIdStackSize() no longer exists (see https://github.com/laverdet/node-fibers/commit/d2c04c314bb4c07739493e38d6f9c410807850df for a workaround).

Additionally, this "solution" involves manually pushing/popping the entire stack when switching fibers, which is very slow. If async_hooks was thread-aware, node-fibers wouldn't have to use internal APIs to manipulate the async ID stack.

laverdet commented 6 years ago

I understand your skepticism, but I think this idea is important to make async_hooks thread-aware in a general sense, so it wouldn't be just for the sake of fibers.

Making async_hooks thread-aware doesn't help fibers at all though. There are no threads to speak of so we would still need a side-channel to swap out the data. Maybe if there was a greater effort in nodejs to add support for threads I would be behind this idea but as it stands today nodejs is a single-threaded app down to its core (well except for libuv). Really coroutines are the only use-case I can think of for this.

I think maybe I haven't made clear exactly how the TLS hacks work. I'm not swapping out the entirely of the thread-local storage block. What I do is look into some internal v8 structures and find out where they stash their TLS data. There's exactly 3 TLS keys I need to discover and then I manually modify those 3 keys after every context change. So if we just made this structure thread-local in nodejs I would need to discover a 4th key to modify.

What I'm less sure about: V8 certainly has internal APIs for thread-local data, but I have no idea whether/how those APIs are exposed to Node. Since async_hooks is implemented on the Node side, it may be difficult to store the async ID stack in TLS, but it should be doable.

C++11, which I believe nodejs is already using, added a thread_local keyword. You just stick that on any variable and it goes to TLS. v8 has platform-specific implementations because that feature didn't exist in C++ until relatively recently. I'm actually quite glad they're still using the platform-specific implementations because it makes it much easier to discover those keys. It will be a bad day for me if they ever decide to use thread_local.

Another idea: if Node was willing to expose an API for swapping out the whole async ID stack at once, rather than popping/pushing every item, that would help node-fibers implement this functionality more efficiently.

That was the action item I was trying to communicate in the final line of my original message. The two functions I would need for future interoperability are getAndClearStack() and restoreStack(). I'm not actually too concerned about the speed of the current implementation. These arrays exist in C so we would have to iterate them to copy anyway. Doing it all at once would still be a small performance win though. Actually, now that I think about it with the current API I could make it a good bit faster by modifying async_ids_stack manually instead of going through the pop/push API.

Is there something that is impossible right now, due to recent changes?

We already expose the top of the stack, though JS and C++ APIs. The rest of the stack exists for validation. If we expose that it allows users to temper with the validation.

Well, process.binding('async_wrap') currently exposes everything I need so there is nothing that is impossible today that was possible yesterday. I can extract the size of the stack from async_id_fields and kStackLength. I can observe and modify the stack with async_ids_stack. My main concern is that as this API matures this internal API will go away or expose less information to the keen hacker. In that case I would have to resort to even dirtier hacks in C++ to muck with node's state directly.

benjamn commented 6 years ago

There's exactly 3 TLS keys I need to discover and then I manually modify those 3 keys after every context change. So if we just made this structure thread-local in nodejs I would need to discover a 4th key to modify.

I hadn't fully considered that possibility, so I'm glad you mentioned it. If Node stored the async stack directly in TLS (like, literally cast to a void* and passed to pthread_setspecific), then yes, node-fibers would have to start worrying about another TLS value.

However, one of the three TLS values that node-fibers already manages is the thread ID, which is different for each thread. So there's a chance you wouldn't have to track a fourth TLS value if Node implemented the async stack-per-thread idea with a map from thread IDs to async stack pointers, rather than using TLS directly. Then the current thread ID (which node-fibers already manages) would always determine the current async stack. I would definitely prefer this implementation, but that's partly because it simplifies the work node-fibers has to do.

I know I've said this before, but the best path forward might be for me to implement the necessary changes in native Node code, so that we can see if there are any performance implications, and invite concrete discussion from the Node folks. If the second strategy I described above works (and is fast enough), then it would be great to get those changes into Node, since that would make it considerably easier to maintain this library.

laverdet commented 6 years ago

I don't think v8 exposes the thread id anywhere so you would have to use a private API (like node-fibers does, very naughty) to get it. You could leak memory with a solution like that because there is no mechanism in place for v8 to notify the embedder that the "thread" is dead. node-fibers calls isolate->DiscardThreadSpecificMetadata() in the case that a coroutine is going to be destroyed but nodejs wouldn't be notified. Anything you stash in a map based on that id would have to allocated and destroyed once per context switch instead of once per coroutine.

bjouhier commented 6 years ago

@AndreasMadsen If I understand well, @laverdet is asking for 2 new functions in the async_hooks API: getAndClearStack() and restoreStack().

These functions would only be called by 3 functions of the node-fibers library (yield, run and throwInto). There would be zero overhead for node.js code outside of these 3 functions.

These functions have been implemented in pure JS by @laverdet but this is brittle as it uses undocumented APIs, which did change and may change again. It would be more robust, and maybe faster too, if they were provided by the async_hooks library.

AndreasMadsen commented 6 years ago

@bjouhier I understand. However, implementing those functions would legalize clearing the state stack that is used for very critical sanity checks. I don't think that is something we would want to expose.

As for the complexity of the functions you have implemented, that won't be solved by implementing getAndClearStack and restoreStack in nodecore, as I guess you would still want to maintain backward compatibility.

Feel free to send in a PR. However, I get the feeling you have more experience in what response to expect than I have.

Qard commented 3 years ago

As the current maintainer of async_hooks, I will say there's basically zero chance of such a change being accepted.

We're currently on a path to less internals exposed, not more, and are also aiming to deprecate async_hooks as a public interface. It is already wildly unsafe and a risk to platform stability and security, this would make it even more so.

Making async_hooks thread-aware would also be counter-intuitive as it's intentionally not thread-aware because worker threads are supposed to be fully isolated. This all comes down to a strong push towards security and isolation in recent years. These parts of the core infrastructure need to be more locked down and protected from tampering like this.

All this juggling of stacks through thread local storage is also a giant red flag to me from a stability perspective. As @AndreasMadsen has already stated, async_hooks is deeply interwoven into everything in Node.js core. It needs to be both fast and stable and a change like this is a big jump in the complete opposite direction.

It's also important to understand that, conceptually, fibers just doesn't match the event model of async_hooks. Mutating the stack isn't enough, you need to also ensure that before and after events are triggered in the right places, but if you are interleaving activity like that you are layering different async stacks within each other, which is conceptually impossible in the normal event model. Most current users of async_hooks will break if you naively layer async tasks within each other like that. You can't safely change the sequence of events that async_hooks outputs. Yet another reason why we're on a path to deprecating async_hooks as a public interface--it leaks internal behaviour way too much.

laverdet / node-fibers

async_hooks & node-fibers discussion #365