meteor / meteor-feature-requests

A tracker for Meteor issues that are requests for new functionality, not bugs.
Other
89 stars 3 forks source link

Umbrella for Fibers replacement/alternatives #392

Closed meggarr closed 3 years ago

meggarr commented 4 years ago

Since the async/await is already in ES8, shall the meteor replace the Fiber in its implementation with native async/await ?

gaurav- commented 4 years ago

I had shared some pretty strong opinion some time back on why fibers should be removed from Meteor. I'm no longer using Meteor actively but stumbled upon this issue so I thought this is a chance for my thoughts on the topic to not be lost after all :)

Sharing links of my posts on the forum:

and some others:

filipenevola commented 3 years ago

Hi, sorry for the long delay here.

I'm posting here to let all know that we are not ignoring this important topic.

We are in a study phase at the moment and that is why we are not communicating about it all the time.

We need first to understand all the affected parts of Meteor but we are going to solve this together.

Right now the good news is that Fibers is working with Node.js 14 and it means that we have maintenance mode already until Jan 2023 considering the general behavior of Fibers with Node.js based on releases page dates.

This DOES NOT mean that we are going to wait all this time to do something about Fibers conflicts. We know that one experimental module async_hooks and WebAssembly [edited] that are having a hard time with Fibers and they are already being used by third parties (like Datadog tracing). So this conflict between Fibers and Node.js native modules could grow in the next years.

TL;DR

If you want to get involved in early discussions join the #async channel in our Meteor Community Slack (invite here)

dr-dimitru commented 3 years ago

I had initial proposal back in September 2020 with idea of using node.js Workers to implement Fibers features:

I'd like to hear community thoughts on that, and would attempt to implement it and run fist tests

bengl commented 3 years ago

Hello :wave:! I just want to make a quick clarification on what's experimental in Node.js and what's not, in order to give a more complete picture.

rochdev commented 3 years ago

Right now the good news is that Fibers is working with Node.js 14 and it means that we have maintenance mode already until Jan 2023

Just wanted to point out that this isn't really correct. There is an increasing number of libraries that use async_hooks and WebAssembly, some of which are extremely popular and used by many other dependencies (for example bluebird, mongodb, etc). This means that as these libraries add async_hooks support, updating them or any of their dependants is a random risk of the application crashing or misbehaving at any point, regardless of the version of Node. A lot of them are just waiting for async_hooks to become stable before pulling the trigger, which should happen soon. A single one of them that is popular enough landing the change could cause quite the crisis.

Qard commented 3 years ago

Just FYI: domains, the inspector, perf_hooks, trace_events, and the unhandled rejection handler also all depend on correct async_hooks state, so you're very likely to crash the process using any of those with fibers. It's also possible to crash with fs watchers, http client and http2, depending on their use. All depend on async_hooks state correctness and will trigger the verifier in a way that would be incompatible with the corrupted state produced by how fibers messes with the id stack. (and completely disregards the resource stack!)

I am the primary maintainer of async_hooks currently, so I have a very good understanding of what its current state is. While it is considered experimental, it's not because of how it works internally--that has been fairly clearly defined and supported for several years now. It is purely experimental still because the original design of the public API exposed Node.js internals, which made it infeasible for marking as stable. As a result AsyncLocalStorage was introduced a year ago which is now being marked as stable. AsyncResource, which is used for maintaining context is also now being marked as stable--really it's been stable for years now but is only being marked as such now because AsyncLocalStorage provides a reason for it to be stable separately from the core async_hooks API.

The Node.js core team suggests using AsyncLocalStorage going forward for the ecosystem to manage request context, and is growing in popularity rapidly, so you're going to have a lot of problems very soon by relying on fibers for any longer. 😬

filipenevola commented 3 years ago

Hi @bengl @rochdev @Qard thank you for all clarifications and sorry about my wrong comment about WebAssembly, I already edited it.

@Qard thank you for being here and for the suggestion around AsyncLocalStorage.

About what @ dr-dimitru commented above, do you @Qard have any comments on this approach https://github.com/meteor/meteor-feature-requests/issues/392#issuecomment-808423937?

As you probably know async_hooks better than everybody I believe your input would be very helpful.

I see we could have two ways here:

Then my question is: do you believe b) is possible using Workers Threads + AsyncLocalStorage or any other combination? My naive answer would be no but I could be wrong.

jkasevits commented 3 years ago

My modest feedback on this issue, if I may, is that although Meteor has been very diligent in avoiding breaking changes in the past, it should not be a goal above everything else. The option b mentioned by @filipenevola above (trying to implement the same or similar concept as fibers, just with other resources), if at all possible, seems to direct Meteor away from the mainstream practice of async/await in favor of a solution that is bound to be not very widely used and rather exotic. Even if it technically works well, it would probably create unnessecary doubt around how future-proof Meteor is and also make it more difficult to bring new developers (that are already experienced in other node projects and js in general) into Meteor development.

On the other hand, the option a, if it is possible to get rid of fibers with mostly implementing async/await, then this would avoid such problems. Meteor would be that much closer to mainstream node and js in its internals and would enjoy the benefits of it. Since it is a one time leap, managing the migration to the new APIs could be solved to a large extent with just good documentation that would cover the steps necessary to update most apps. After all, the APIs for Meteor are not that large. It would be a pain. But a one time pain and probably worth it.

rochdev commented 3 years ago

I agree with @vooteles, there is no way to keep the current API without hacks that may or may not continue to work in the future as it relies on a behaviour that is not supported by either JavaScript or Node. Even if they did continue to work, it would be awkward to work with a concurrency model that is completely different than every other JS applications and libraries out there. Said that, a would be the only option going forward that would not cause this same discussion to happen again in the future.

sebakerckhof commented 3 years ago

Adopting async/await (and thus breaking BC) would also have the advantage of better isomorphism:

In fact, it's a stroke of luck that so much isomorphism was already achieved with the fibers code style.

Furthermore, it would allow us to benefit from having some methods async in the client. E.g. you could run your minimongo queries in a web worker if they are async (not saying this is a good idea, but at least it becomes possible).

The biggest problem with async/await is that once you introduce it somewhere deep down your code path, it bubbles all the way up to the top method. And there may even be third-party packages in between which will all need to be updated (e.g. collection hooks). However, as long as Fibers work in Node, we could keep Promise.await around as a viable way to prevent this and gradually update your application and deal with libraries that have not yet updated. (with deprecation notices).

Qard commented 3 years ago

IMO async/await is the only viable solution. Trying to hack together some other fibers-like thing over worker threads might buy you a little more time in the current state but it's not going to play nice with the large ecosystem of JavaScript code out there that works differently. It's just continuing to perpetuate a style that is counter to what the platform and ecosystem have already settled on which is gradually setting meteor further and further behind.

There's no avoiding that the change to async/await will have to happen eventually. Might as well just do it now so you can focus your energy on moving forward rather than burning all your energy just keeping the old way running a little longer. 🤷

CaptainN commented 3 years ago

For isomorphism, there are two main areas where that matters, regarding Mongo queries. In method/publication bodies, switching to async/await wouldn't be that big of a deal, IMHO. Maybe a bit tedious, but how to do that seems straight forward enough, from a code writers perspective.

The other place is SSR. Changing to async/await in for React code could be challenging, because React really doesn't like async code happening during its rendering. On the other hand, I'm fairly sure there's already problems with Fibers which would require all the same types of work arounds, so there's that too. I might have an idea how to fix those issues, if I can ever find the time...

Anyway, just some food for thought - it's not just methods and publications that end users will have to adjust for, it's also SSR. In React, the useTracker hook may be able to help paper over some of that.

Edit: Also, I agree with @sebakerckhof that if we are going to change the API, we should make server side async and client side async match, instead of the way it is now, where server side code uses a Fiber, and client side code uses a callback. That difference has actually caused quite a bit of head scratching on the forums and elsewhere over the years.

filipenevola commented 3 years ago

Yes, I agree that a) should be the option moving forward. I was just putting the options in the table. It will be nice to have Meteor following all the now async standards of Node.js in future versions.

We are studying the impact of Fibers migration in the boot.js. This is where the first Fiber is instantiate in the server startup.

Also, Node.js 16 seems to be not compatible with Fibers. This doesn't change anything as we were already planning the migration but this just confirms that the migration is necessary.

Just in case you are not aware of Node.js releases timeline, you can check details in this link but TL;DR Node.js 14 is going to be maintained until Jan 2023 and of course we want to have the Fibers migration completed ASAP but your apps even without migrations from your side should be working basically as today until Jan 2023 with coverage from Node.js maintainance. 😉

Studying async_hooks API and trying a few PoCs we can see that where Meteor was using Fibers we have very similar APIs to keep the context and keep our internal code not much different from the Fibers version.

Of course, the need to have async functions around the replacement of Fibers calls are going to require some API work but in general the approach should work.

yched commented 3 years ago

I'm very worried of the impact of turning all (actually truly synchronous) minimongo calls to async calls on the client just for the sake of isomorphism with the server side Mongo APIs if those had to become async instead of Fiber-based. That means a huge amount of client code rewrite, has a performance impact, and also has a fairly unclear impact on the react integration (as @CaptainN pointed, react and async brings a whole set of challenges) - again, all for no reason since minimongo is actially synchronous on the client.

filipenevola commented 3 years ago

hi @yched, that is the reason this Umbrella issue exists, so we can discuss our ideas and concerns before we make decisions 😉

What would be your idea? To have different APIs between client and server for Mongo functions?

zodern commented 3 years ago

Tracker expects the computation function to run synchronously in order to record all dependencies. Making apis used on the client within Tracker.autorun async will break that. I'm not sure how tracker could reliably work with asynchronous code. There is zones.js, but adding this would probably have many downsides.

yched commented 3 years ago

@filipenevola well I was kind of pondering maybe we could introduce new async APIs, and keep the existing sync APIs on the client for regular client-side reads (often within Tracker.autorun, which as @zodern points is also not really async-ready at the moment), and only force the client to use the async mongo APIs for shared isomorphic code in DDP methods.

But then looking at our own app codebase, the call stacks involving shared isomorphic code typically start in DDP method bodies, but often spread in helper functions or model classes, which are also used on the client outside of DDP method stubs, in Tracker calls, etc...So that "dual sync / async APIs" approach doesn't feel super confortable either...

As a side, not-super-constructive note, it's a bit ironic that Node is losing Fibers the moment PHP votes to get them :grimacing:

r00t3g commented 3 years ago

@yched @zodern but is it really so necessary to maintain isomorphism for Meteor.Mongo calls? Using Meteor.Mongo on the server side is severe limitation as it does not allow to use aggregation and transactions.

benjamn commented 3 years ago

Hello @filipenevola and friends!

Please have a look at this comment from @laverdet, as I think it provides a key hint about how to make fibers work even in Node 16, with the caveat that they may be (a lot) slower.

The performance consequences of using real threads may seem like a big drawback of the CORO_PTHREAD approach, but I believe it's also Meteor's ticket out of this predicament. The last time I dove into V8 internals, I came away with the understanding that OS-level multithreading is deeply supported by the V8 C++ API, and probably always will be. I don't mean JS worker threads, with their async postMessage API, but the kind of threads Node.js itself spawns to perform fs.readFileSync operations.

One of the things that was so great about node-fibers is that it managed to "trick" V8 into thinking there were multiple threads, by snooping on thread local storage, swapping out fabricated thread IDs, etc., all without actually spawning real threads. That trickery has always been at some risk of breaking in future versions of Node.js, and it now looks like that day will come in 2023, with the end of LTS support for Node.js 14. V8 support for "real" multithreading, however, isn't going anywhere.

Perhaps more importantly, I suspect making fibers slower may not be entirely bad, since it will create natural pressure for Meteor developers to remove fibers from hot code paths, in a pragmatic, results-focused way, ultimately leading to the removal of fibers from Meteor altogether, in favor of more traditional async APIs.

Here's a sketch of a possible plan:

  1. Find a way to build the Meteor dev_bundle with a version of dev_bundle/lib/node_modules/fibers that uses CORO_PTHREAD

    Note that this won't immediately work on Windows, but I imagine Windows support can be delayed until after we've validated the approach on Unix

  2. See how much of an impact this has on the Meteor build process, both initial (cold) builds and incremental rebuilds a. If we're lucky, it might not be such a huge slowdown, especially for rebuilds b. Meteor does its own fiber pooling/recycling, and a lot of the work done by the build system is cached in memory, and can be retrieved without using Fibers
  3. If the performance impact is negligible, skip to step 5
  4. If the performance impact is significant: a. Look for the places where Fiber-related overhead (thread context switching, new Fiber creation, etc.) is especially costly b. Find ways to address those hot code paths first, likely by switching to promises or async/await c. Go back to step 3
  5. Now that heavy use of Fibers is merely a performance problem, the Meteor team will have a strong (but not desperate) incentive to make sure any new APIs use promises rather than depending on Fibers, and Meteor applications can move away from the older synchronous-looking Fiber APIs at their own speed
  6. Completely eliminating Fiber usage from the Meteor CLI codebase will be challenging, but is ultimately feasible and worthwhile (for performance and maintainability) a. I believe this mostly boils down to finding all usages of Promise.await and Promise.prototype.await and rearranging the calling code to handle a normal Promise instead b. The hard part is finding all the callers, since JS is such a dynamic language c. When I was working full-time on Meteor, I did everything I could to hide the Fiber API behind promises, so that this removal would be slightly easier, whenever it eventually became necessary

🚨 Important note: I do not speak for the Meteor project or its current maintainers, but only from my best recollection of how Meteor worked when I was last working on it full-time. Take as much or as little of this advice as makes sense to you!

znewsham commented 3 years ago

Following on from @benjamn comment above - I looked into this a little.

Building node-fibers with pthread is trivial (just a declaration in binding.gyp). Unfortunately, it still fails to run under node 16 with (I believe) the same error message - the failing code does not involve the pthread specific code, rather code that is ran at all times, regardless of the implementation chosen (it's in the find_thread_id_key method).

Running fibers with the pthread build is approximately 3x slower for fiber switching - however, this corresponds to approximately 0.03ms vs 0.01ms per fiber switch - and in a production-like environment resulted in no noticeable slowdown - it is likely that areas that depend heavily on fiber switching (particularly creating new fibers - which is particularly slow in the case that old fibers are maintained) may see a noticeable performance decrease. However, one other side note - I was able to make the pthread version of fibers segfault where the non pthread version does not with the following code:

function currentFiber(aFiber) {
  for (let i = 0; i < 100; i++) {
    let fiber = new aFiber(() => {
      aFiber.yield();
    });
    const result = fiber.run();
  }
}
for (let i = 0; i < 100; i++) {
  currentFiber(Fiber);
}

this is likely an edge case, since none of the 10k created fibers ever run to completion.

Based on the above three points, I set about re-implementing fibers in a limited sense (for my purposes support of node 12-16 inclusive on linux/debian is sufficient). and successfully built a version, based loosely on the existing fibers pthread implementation, that exposes an API identical to node-fibers (literally swap out the fibers.node binary). Running this version against the node-fibers test suite results in passes in all cases, with the exception of cleanup.js - where unfortunately it hangs, likely I've missed some de-allocation somewhere - I'm still looking into this.

On a positive note, this implementation is about 5-10% faster than the "node-fibers pthread" implementation when switching fibers (not enough to make any difference to the near 300% slowdown of switching to pthreads), but slower when creating new fibers - this will improve when I implement pooling and doesnt segfault on the above sample. I also ran this in a production like env (the runtime, not the tool), and it seemed to work nicely.

I'm not sure if this addresses the overarching concern though - I'd say that if the only concern is performance, most people will not need to worry about switching to pthreads - however, it is unclear to me when reading the above discussion, exactly which features fail when combined with node-fibers, does anyone have a code snippet that either crashes or performs incorrectly when using node-fibers (beyond the crash of simply using node 16)?

laverdet commented 3 years ago

@znewsham this change is also needed

diff --git a/src/coroutine.cc b/src/coroutine.cc
index 027744f..e89abdc 100644
--- a/src/coroutine.cc
+++ b/src/coroutine.cc
@@ -148,7 +148,7 @@ void Coroutine::init(v8::Isolate* isolate) {
        isolate_key = v8::internal::Isolate::isolate_key_;
        thread_data_key = v8::internal::Isolate::per_isolate_thread_data_key_;
        thread_id_key = v8::internal::Isolate::thread_id_key_;
-#else
+#elif !defined(CORO_PTHREAD)
        pthread_t thread;
        pthread_create(&thread, NULL, find_thread_id_key, isolate);
        pthread_join(thread, NULL);
christian-bromann commented 3 years ago

@znewsham do you have a fork with the changes that made it work in Node v16? I am happy to battle test this in the WebdriverIO project.

znewsham commented 3 years ago

@znewsham this change is also needed

Thanks @laverdet I knew I'd be missing something simple - it still gives deprecation warnings on async_wrap, but it does seem to work now!

znewsham commented 3 years ago

@christian-bromann I've put the node16 version to node-fibers in my fork here https://github.com/znewsham/node-fibers/pull/1/files - probably I wont publish "my" version since the fork is a much better approach than a complete rebuild

ramezrafla commented 3 years ago

@znewsham .. thanks so much for this! We faced segfault in production when many users logged in simultaneously

As an FYI during build process we added (in programs/server) to include your node-fibers fork npm i https://github.com/znewsham/node-fibers/tarball/node16

Note: During build we rebuild from source (we are using ARM64), not sure if it's needed or not by others (cd programs/server && npm install --build-from-source)

We are testing it shortly to see if it solves segfault issues

znewsham commented 3 years ago

@ramezrafla I don't expect this will resolve your segfault issues, on fact I'd expect it to be worse since the non pthread version does not segfault on some code that the pthread version does. My node-fibers version just forces pthread to be used.

amsinha2 commented 3 years ago

@ramezrafla
I am seeing segfault on simultaneous multiple user login too. We are also using ARM64 and we don't see a segfault on AMD architecture, but can reliably reproduce segfaults on ARM.
Can also confirm that we are rebuilding from source on the npm command at build time. Looking forward to your test results!

ramezrafla commented 3 years ago

@amsinha2 I'll give it a shot .. but @znewsham doesn't seem to believe it would be better

@znewsham any reason why ARM64 has more segfaults? segfaults are memory issues, how come we are getting them? Maybe we need to allocate more mem per thread?

znewsham commented 3 years ago

Hmm. Unfortunately I don't know. I've seen it not just with insufficient allocation but with timing issues in multi thread environments. So it could go either way. But that's a great starting point. Would be easy enough to change the per fiber allocation

ramezrafla commented 3 years ago

@znewsham @amsinha2 I confirm that we hit sigsev much earlier with pthread ... grrr

Maybe the solution for us is to reduce dependency on fibers hence reducing the chance of sigsev I was thinking of using raw collection methods (which are likely the biggest users of fibers) -- this will require a lot of refactoring to put async / await everywhere in my code .. still thinking

znewsham commented 3 years ago

@ramezrafla I'm not convinced that reducing fiber switching will reduce the segfaults, though it's for sure worth trying. More likely, reducing places that you explicitly create fibers (which are generally very few) or places where fibers fail to run to completion will have more impact. The code snippet above that triggers the segfault creates loads of fibers that never complete. It's unlikely that these will come up in most meteor applications, but could happen if you don't resolve a promise somewhere or set up lots of observers on the server, or a lot of immortal subscriptions on the client

ramezrafla commented 3 years ago

Thanks @znewsham What I don't get, is how come we are having this issue in ARM64 but never faced it in Intel / AMD

filipenevola commented 3 years ago

Hi, in the process of validating the current conflicts between Fibers and Node.js async_hooks I was trying to reproduce the problem reported by @rochdev here with Fibers 5.

I believe I'm doing something wrong as I was expecting to get TypeError: Cannot read property 'Symbol(kResourceStore)' of undefined but I'm not getting any error.

My reproduction is here. This is a very small Node.js program simulating a Web server and using Fibers to store the userId.

Any clues?

filipenevola commented 3 years ago

This discussion is going to continue here

Now we are going to use GitHub Discussions in the main Meteor repository and there I just proposed a plan to change the Async execution in Meteor.