meteor / meteor-feature-requests

A tracker for Meteor issues that are requests for new functionality, not bugs.

Multi Core support with Worker Threads #386

Open corporatepiyush opened 5 years ago

corporatepiyush commented 5 years ago

Node.js v12 has support for worker_threads with SharedArrayBuffer.

A SharedArrayBuffer can serve as a common data cache across worker threads, letting an app use all CPU cores of the machine instead of always relying on a single-core CPU instance or multiple Docker instances/pods.

Ref: https://nodejs.org/api/worker_threads.html#worker_threads_worker_threads

chrisbobbe commented 5 years ago

This kind of improvement would really help with scalability, a widely known challenge with Meteor apps; I've opened another issue along similar lines: https://github.com/meteor/meteor/issues/10677

armellarcier commented 4 years ago

please reopen this... stale bot is a pain...

benjamn commented 4 years ago

Hopefully putting the issue in a milestone will keep it from being marked as stale…

nathanschwarz commented 3 years ago

Any news about this feature ?

kakadais commented 3 years ago

👍

filipenevola commented 3 years ago

Hi @nathanschwarz I believe the first discussion here should be about which part of Meteor we should try this feature on first.

Meteor is a huge codebase, so we would need to adopt Worker Threads in baby steps.

Maybe a good criterion would be which part could benefit most from this feature.

kakadais commented 3 years ago

@filipenevola Fully agree. I believe Meteor is one of the best platforms for microservices, and multi-core support such as worker threads would be a good place to start. The meteorhack:cluster package's strategy was good and working, but I think this should be treated as base support because of its importance.

nathanschwarz commented 3 years ago

@kakadais meteorhack:cluster is based on the native Node.js cluster module, I believe. It's basically forking (multi-core but single-threaded per process), not worker threads. meteorhack:cluster also works on the client using web workers.

I finished implementing multi-core on the server using cluster a few days ago: it's quite straightforward. The main downside right now is that we can't start Meteor as a serverless process. But you can still use a single port to communicate between the processes when forking. I've also made a serverless fork of Meteor that uses an environment flag to keep the HTTP server from starting up.

I'm using two types of multi-core processes right now, both backed by a simple MongoDB job queue:

I'm planning to make a third one for automated DB backups to an external FTP server.

@filipenevola obviously for me the best place to start would be on the server.

Multi core on the server would be "relatively simple" to build with the cluster module.

We could also leverage the native worker_threads module (with shared memory built in), but it's less straightforward because it would require including the worker code in the build phase and replacing the file path in the master.

we would need new Worker('...worker_path.js') to become new Worker('...worker_path_after_build.js')

kakadais commented 3 years ago

@nathanschwarz Could you explain a bit more about "can't start Meteor as a serverless process"? I think forking is enough to build a server using multi-core, and the key is working independently on a specific service. What should we do, or what are your suggestions? Thanks-

nathanschwarz commented 3 years ago

Forking should be enough for a start.

Well, depending on your usage and the implementation, the workers don't need to be built with an HTTP server. Right now, forking a Meteor process starts an HTTP server each time (because Meteor always starts one). With my implementation I don't need the children to communicate with each other, so it's basically a waste of resources. That's what I meant by "serverless process".

Concerning how we should build multi-core, I think the best way to go is by implementing a worker pool:

Then you can have two types of routine: either spawn a fresh Meteor process for each job, or dispatch jobs to a pool of long-lived workers.

The first is faster to implement, but because of Meteor's heavy startup routine it will be slower and take more resources.

nathanschwarz commented 3 years ago

@kakadais @filipenevola I made a working Worker Pool package here if you want to look at it. I can eventually put a PR together if you want. I still think the package needs some tweaks for modularity, logs, and tests anyway.

edit

you can now add the package directly from Atmosphere: meteor add nschwarz:cluster

filipenevola commented 3 years ago

Hi @nathanschwarz, sorry for the delay, I was on vacation, I just read your code and it looks great.

What do you mean by a PR? Do you need any changes in the core?

Are you using your package in production already?

We could promote it in the Meteor community and start to have usage in production if that is not the case yet.

filipenevola commented 3 years ago

@filipenevola obviously for me the best place to start would be on the server.

I was thinking about Meteor core features and not between server and client.

What features of the Meteor core runtime or builder could benefit most from multi-core support?

For example, I was talking with @renanccastro about whether we could take advantage of this in the tree-shaking build analysis, analyzing the sub-trees on different cores.

nathanschwarz commented 3 years ago

@filipenevola, no worries about the delay !

Yes, I'm using it in production and it's fully working. It still lacks a few minor tweaks, but we could add these incrementally.

No, there's no change needed in the core as it is now. I was talking about a minor change to the startup behavior in the core:

The main downside right now is that we can't start Meteor as a serverless process.

since this package doesn't require the workers to communicate with each other, we could pass a flag to skip the HTTP server and avoid wasting resources (it's a few lines of code in the core, but it's not that important).

I can make an abstraction of the worker pool, if you wish, for the tree-shaking feature, since it's bound to MongoDB via the TaskQueue right now.

filipenevola commented 3 years ago

@nathanschwarz great. Yes, these are nice to have, but I believe your package is complete enough already. How could we promote it? A blog post on our official blog?

since this package doesn't require the workers to communicate with each other we could pass a flag to skip the http server to avoid the waste of resources

I'm ok having this flag to avoid http server starting up, feel free to start a PR.

I can make an abstraction of the worker pool if you wish for the tree-shaking feature since it's bound to mongoDB via the TaskQueue right now.

I believe this is a good idea; we could have a mode option in your package: in-memory and persistent (your current version, with MongoDB). Is that your idea?

nathanschwarz commented 3 years ago

@filipenevola great !

A blog post would be nice 👍 .

I'm ok having this flag to avoid http server starting up, feel free to start a PR.

I'll work on a PR soon, I will tag you when it's done.

we could have a mode option in your package

I was thinking of adding an optional inMemory: Boolean field to the TaskQueue.addTask prototype (defaulting to false). This way you can have both persistent and in-memory jobs with the same Cluster instance! I'll work on it ASAP.
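The idea can be modeled in a few lines (an illustrative mock, not the package's real code; only the inMemory field name comes from the comment above, everything else is assumed):

```javascript
// Illustrative mock of the proposed option (not nschwarz:cluster's real code):
// tasks flagged inMemory stay in a process-local array and are lost on
// restart; the rest would normally be persisted to MongoDB (simulated here).
class TaskQueue {
  constructor() {
    this.persisted = []; // stands in for the MongoDB-backed queue
    this.volatile = [];  // in-memory only, lost if the process dies
  }
  addTask({ taskType, data, inMemory = false }) {
    (inMemory ? this.volatile : this.persisted).push({ taskType, data });
  }
}

const queue = new TaskQueue();
queue.addTask({ taskType: 'dailyBackup', data: {} });                // persisted
queue.addTask({ taskType: 'warmCache', data: {}, inMemory: true });  // volatile
console.log(queue.persisted.length, queue.volatile.length); // 1 1
```

The appeal of a single flag is that both durability modes share one Cluster instance and one dispatch path.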

nathanschwarz commented 3 years ago

@filipenevola I've just updated nschwarz:cluster.

The in-memory jobs are working, and an inMemoryOnly field is settable in the Cluster options. It should do the trick! Tell me if you encounter any issues.

update: 1.1.0 is now available

nathanschwarz commented 3 years ago

@filipenevola, my bad, no PR required.

I thought the cluster module would provide a socket/file stream between the master and the children for IPC, but that's not the case, so the HTTP server is still needed.

I've added eventListeners in 1.1.0, so you can handle the results in the master process if needed.

I'll try to find a solution to get a random free port number for the IPC to avoid potential conflicts between multiple apps running / building at the same time.

corporatepiyush commented 3 years ago

A shared-nothing architecture at the OS level, if we are considering process instances instead of worker threads:

  1. Scaling computation across CPU cores - Pin each Node.js process launched through the cluster module to an individual CPU core. This greatly reduces L1 and L2 cache misses, as well as the OS-level cost of multiprocessing and fair scheduling, since any process launched on Linux is allocated all CPU cores by default. Modern Xeon and AMD EPYC server CPUs have great single-threaded performance and large on-die caches. https://gist.github.com/corporatepiyush/55ecca29999cebbad2d58880cd376c90
  2. Scaling network IO across CPU cores - The cluster module (or the sticky-session module) provides a way to pin or stick a socket session, upstream and downstream, to a particular Node.js process. https://stackoverflow.com/a/51418575/3282642
  3. Cache memory redundancy vs. centralized cache server trade-off - If you do the above two things, it will definitely outperform a worker-thread implementation with SharedArrayBuffer whenever the application's cache hit rate is more client-tuned than application-tuned; that is, when the cache mostly stores client-session-oriented data that avoids a trip to the DB, rather than a whole lot of other things for which you can't foresee a good cache-hit rate. Of course there may be some common data worth caching in all Node.js processes; in that case we have to work out the redundancy-vs-centralization trade-off: either replicate it across all Node.js processes if the amount is small (say, less than 64 MB, depending entirely on your application and the latency you want to achieve), or, if it's larger, consider an in-memory key-value server such as Redis or Memcached for central storage.
  4. Failover and GC - Individual Node.js OS processes also guarantee that an exception crashing one process, whether from external intrusion or a programmer's mistake, does not effectively take the whole multi-core machine out of the picture; individually affected processes can be restarted easily. Also, each process holds only a share of the RAM, so GC activity will be more effective and lightweight than handling one large heap at once, if you are targeting deployment on a beefy machine in the cloud.

filipenevola commented 3 years ago

Hi @nathanschwarz, this is really great. Thank you.

Please ping me on Community Slack so we can work together on the blog post.