Nuxt 3: Workers

pi0 commented 5 years ago

Objectives

Isolate each nuxt dev run and reload when nuxt.config.js changes, so we don't have any memory leaks.
Being able to run modules (builder, etc) in a new V8 context or Cluster mode for both dev and prod
Supporting PM2 integration
Serverless runs
Removing @nuxt/cli dependency from nuxt-start for making it fast as 🚀

Description

The requirement is that to fork a new process (Cluster Worker) or run in a new context, we need an entry point that is fast, standalone and, most importantly, accepts arguments like CLI commands. The simplest solution is to execute our CLI script in each worker, but that not only adds a lot of overhead but also causes problems for Cluster.

The proposed idea is creating embeddable standalone code that will be executed by normal CLI commands or directly by cluster runners. For passing arguments to them, we can pass a JSON argument as the only argument like this:

# Usage: nuxt-worker <worker> <rootDir> [overWrites]...

nuxt-worker server dir '{ "port": 8080 }'

nuxt-worker builder dir '{ "dev": false }'

This keeps them small, because we don't need to depend on any argv parser, and also powerful, as we can pass any kind of args without extra config.
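As a rough illustration of the usage line above, an entry point could read its arguments without any argv parser along these lines. This is only a sketch: the function name parseWorkerArgs and the rule that later overwrites win over earlier ones are assumptions, not something the proposal specifies.

```javascript
// Hypothetical argument handling for:
//   nuxt-worker <worker> <rootDir> [overWrites]...
// Each overwrite is a JSON object; later objects win on key conflicts (assumption).
function parseWorkerArgs(argv) {
  const [worker, rootDir, ...overwrites] = argv;
  if (!worker || !rootDir) {
    throw new Error('Usage: nuxt-worker <worker> <rootDir> [overWrites]...');
  }
  const options = {};
  for (const raw of overwrites) {
    const parsed = JSON.parse(raw); // throws SyntaxError on malformed JSON
    if (parsed === null || typeof parsed !== 'object' || Array.isArray(parsed)) {
      throw new TypeError(`Overwrite must be a JSON object: ${raw}`);
    }
    Object.assign(options, parsed);
  }
  return { worker, rootDir, options };
}

// A real entry point would call parseWorkerArgs(process.argv.slice(2)).
const example = parseWorkerArgs(['server', 'dir', '{ "port": 8080 }']);
console.log(example.worker, example.rootDir, example.options);
```

Since the options travel as plain JSON strings, the same array could be handed to a forked process as its argument list.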

Remarks

Currently each worker should read nuxt.config and mostly create a new Nuxt instance.
Nuxt CLI lives in Master Process

Workers

Workers are embedded recipes for certain tasks. Hopefully they can replace all CLI logic as well as custom programmatic usage for use cases like Lambda/Serverless.

Server

Creates nuxt instance
Starts listening

Builder

Creates nuxt instance
Creates builder instance
Starts build
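The Server and Builder recipes listed above could be sketched as a small table of async functions. Everything here is hypothetical: the names recipes, runWorker, createNuxt and createBuilder are made up for illustration, and the factories are injected so the sketch does not assume any particular Nuxt API.

```javascript
// Hypothetical recipe table: one async function per worker type.
// `deps` carries injected factories (assumed names), not a real Nuxt API.
const recipes = {
  async server({ createNuxt }, rootDir, options) {
    const nuxt = await createNuxt(rootDir, options); // creates nuxt instance
    await nuxt.listen(options.port);                 // starts listening
    return nuxt;
  },
  async builder({ createNuxt, createBuilder }, rootDir, options) {
    const nuxt = await createNuxt(rootDir, options); // creates nuxt instance
    const builder = await createBuilder(nuxt);       // creates builder instance
    await builder.build();                           // starts build
    return builder;
  },
};

async function runWorker(name, deps, rootDir, options = {}) {
  if (!Object.prototype.hasOwnProperty.call(recipes, name)) {
    throw new Error(`Unknown worker: ${name}`);
  }
  return recipes[name](deps, rootDir, options);
}

// Stand-in factories so the sketch runs without Nuxt installed.
const standInDeps = {
  createNuxt: async (rootDir) => ({
    listen: async (port) => console.log(`listening on ${port} for ${rootDir}`),
  }),
  createBuilder: async () => ({ build: async () => console.log('build done') }),
};

runWorker('server', standInDeps, 'dir', { port: 8080 });
```

Injecting the factories keeps each recipe small and lets a CLI command, a cluster runner or a serverless handler reuse the same steps.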

Combinations [TODO]

We combine workers in different use cases.

Start

Dev

pimlie commented 5 years ago

Do you have plans for this to support generate as well? As discussed on discord some time ago I have been thinking about the future of nuxt-generate-cluster for quite some time and I think the best course would be to work towards deprecating that package, more precisely to step away from 'just' supporting Cluster as it would be nice to have remote support as well. Or did you step away from full IPC support currently (as it seems the above now only mentions Cluster Worker)?

pi0 commented 5 years ago

@pimlie Sure. I'm not just started it. BTW would you please attach the link to your related works and hints for generating to this topic? That would be really helpful.

it would be nice to have remote support as well.

Indeed planned too. Not just Inter Process communication workers.

pimlie commented 5 years ago

My thoughts with regard to the future of nuxt-generate-cluster (ngc). These thoughts should be read as describing an ideal world, because looking at the download count of ngc I don't think the need for this is large enough, and therefore it is not (yet) worth the full effort. But we could probably pick certain features.

Present situation

ngc uses the cluster package to support multi-threaded generating, mostly useful when you have a large number of dynamic routes. When running ngc it starts a master which is responsible for:

building the project
retrieving list of routes
starting workers
distributing routes to workers
aggregating statistics/errors

Both master and workers use the same nuxt buildDir. This works because the master only builds at startup before any worker is started (or even any route is retrieved).

Messaging between master and workers is done through the IPC channel which the Cluster package provides.
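The master's distribution and aggregation duties described above can be sketched as follows. This is not ngc's actual code: createMaster, nextBatch, report and the batch size are hypothetical, and a plain function call stands in for the IPC messages a real cluster setup would exchange.

```javascript
// Hypothetical master state: hands out route batches and aggregates results.
function createMaster(routes, batchSize) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError('batchSize must be a positive integer');
  }
  const pending = [...routes];
  const stats = { generated: 0, errors: [] };
  return {
    // Removes and returns up to batchSize routes; empty array when done.
    nextBatch() {
      return pending.splice(0, batchSize);
    },
    // Merges one worker's result into the aggregate statistics.
    report(result) {
      stats.generated += result.generated;
      stats.errors.push(...result.errors);
    },
    stats,
  };
}

// Stand-in for a worker process: "generates" every route except broken ones.
function fakeWorker(batch) {
  const errors = batch
    .filter((route) => route.includes('broken'))
    .map((route) => ({ route, message: 'render failed' }));
  return { generated: batch.length - errors.length, errors };
}

const master = createMaster(['/', '/a', '/broken', '/b', '/c'], 2);
for (let batch = master.nextBatch(); batch.length > 0; batch = master.nextBatch()) {
  master.report(fakeWorker(batch));
}
console.log(master.stats);
```

With real workers, the call to fakeWorker would become a message sent to a worker and report would run when the worker's reply arrives.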

Disadvantages

Currently ngc runs on a single host only, which limits the scale we can run at. I would like to be able to target a specific time in which generating all N routes should complete by increasing the number of hosts we run the generate workers on.
There is no locking: you can run multiple ngc commands at the same time even though they may interfere with each other (eg when both ngc commands are also building the project).
The ngc command is 'single use' only, eg it's not suited to run as a daemon, so you have to run it as a cron job (this is also related to the above point about locking).

Wishlist

In general this is to run ngc daemonised and support infinite scaling. Fully support: $time_to_finish = $routes.length / $workers.count, with the (almost) only limitation that $time_to_finish cannot drop below $time_to_generate_single_heaviest_route + $some_little_overhead.
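To make the scaling formula concrete, here is a small estimate function. It is a sketch under assumptions: the original formula leaves out a per-route duration, so avgSecondsPerRoute is added here to get a result in seconds, and all parameter names are made up for illustration.

```javascript
// Hypothetical estimate of total generate time, in seconds.
// ideal = routes / workers * average time per route (assumed parameter)
// floor = heaviest single route + overhead, which no worker count can beat
function estimateTimeToFinish({
  routeCount,
  workerCount,
  avgSecondsPerRoute,
  heaviestRouteSeconds,
  overheadSeconds = 0,
}) {
  if (workerCount < 1) {
    throw new RangeError('workerCount must be at least 1');
  }
  const ideal = (routeCount / workerCount) * avgSecondsPerRoute;
  const floor = heaviestRouteSeconds + overheadSeconds;
  return Math.max(ideal, floor);
}

const base = { routeCount: 10000, avgSecondsPerRoute: 0.5, heaviestRouteSeconds: 8, overheadSeconds: 1 };
console.log(estimateTimeToFinish({ ...base, workerCount: 100 }));
console.log(estimateTimeToFinish({ ...base, workerCount: 10000 }));
```

Once the ideal time falls below the floor, adding more workers no longer shortens the run.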

Needed/wished features

multiple projects per daemon
stateless master, master only distributes communication & jobs between workers
master starts required workers on-demand
a watchdog worker which aggregates statistics and holds list of routes to-generate
- in-memory only or persistent (eg backed by database)
master exposes a (web)api so it can be controlled (eg to re-build the project or to push additional routes to generate)
- webapi through an additional worker
- building is done in a separate worker
- when building all generate workers are stopped (eg outstanding generate jobs are cancelled)
- distributing the new build files to remote hosts should be implemented by the user; we should supply hooks for that at the very least, and preferably implementations for commonly used file distribution methods (eg ftp, git?)
routes (to generate) can be pulled and pushed by using different kind of methods, eg:
- submitted directly through the (web)api to the master (think of a http put in a Laravel Save hook)
- pushed by a cli client command
- a worker which monitors a watch folder (eg for a single json file containing a list of routes, or just all json files where each json file is one route and its payload)
- a worker which pulls records from a database
files for successfully generated routes can be automatically pushed to remote
- eg file upload workers for ftp, ssh, git, webdav protocols

Main difficulty

Find a suitable messaging protocol, preferably one that understands a master, proxies and services. In ngc I implemented a (maybe poor-man's) message-broker as a proof of concept for this. Here both the master and a proxy can be understood as a single message-broker instance, with the exception that you can only have one master and multiple proxies. Eg if you have a master daemon running on a remote host for distributed generation, then its only job would be to run as a communication end-point for the real master to start generate workers on-demand. The started generate workers would be proxies themselves as well, which connect to the master themselves (at least atm). This is implemented by having each message-broker instance register with an alias, so eg a watchdog worker registers with the master under the alias watchdog, and then a generate worker on some host only has to send a message to alias watchdog and the proxies in between make sure that the message ends up in the message-broker instance with the alias watchdog. Additionally it understands services, so e.g. a watchdog message-broker instance can register a different method for when a worker reports it successfully generated a route vs when it reports an error. When I said poor-man's implementation, that is because atm:

it only supports one degree of separation between master and workers (can't connect proxies through proxies)
it only communicates over IPC channels
you can only have one message-broker instance per alias
there is no validation beforehand of whether a service really exists for the specified alias, nor any error-reporting
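The alias and service routing described above could look roughly like this. It is an in-process sketch, not ngc's implementation: the class and method names are hypothetical, direct method calls stand in for IPC channels, and unlike the proof of concept it throws when an alias or service is missing.

```javascript
// Hypothetical message broker: one instance per alias, one master at the root.
class MessageBroker {
  constructor(alias, parent = null) {
    this.alias = alias;
    this.parent = parent;        // the master, or null for the master itself
    this.children = new Map();   // alias -> broker registered with this one
    this.services = new Map();   // service name -> handler
    if (parent) parent.children.set(alias, this);
  }

  // Register a handler for a named service on this broker.
  on(service, handler) {
    this.services.set(service, handler);
  }

  // Deliver locally, forward down to a registered child, or pass up to the parent.
  send(alias, service, payload) {
    if (alias === this.alias) {
      const handler = this.services.get(service);
      if (!handler) throw new Error(`No service "${service}" on "${alias}"`);
      return handler(payload);
    }
    const child = this.children.get(alias);
    if (child) return child.send(alias, service, payload);
    if (this.parent) return this.parent.send(alias, service, payload);
    throw new Error(`Unknown alias "${alias}"`);
  }
}

const root = new MessageBroker('master');
const watchdog = new MessageBroker('watchdog', root);
const generator = new MessageBroker('generate-1', root);

const done = [];
const failed = [];
watchdog.on('routeDone', (msg) => done.push(msg.route));
watchdog.on('routeError', (msg) => failed.push(msg.route));

// The generate worker only needs to know the alias, not where watchdog runs.
generator.send('watchdog', 'routeDone', { route: '/a' });
generator.send('watchdog', 'routeError', { route: '/broken' });
console.log(done, failed);
```

Like the proof of concept, this supports only one degree of separation and one instance per alias; registering a second broker under the same alias simply replaces the first.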

There is probably a lot more to say about this but I am dried-up for the moment. Will continue in another comment if I think of other things :)

Atinux commented 3 years ago

Can we close this one @pi0 ?

pi0 commented 3 years ago

Clarification: After a huge effort on this solution, we decided to discontinue it because of the architectural complexity it brings.

The alternative approach is rewriting the nuxt3 server engine so that it can be embedded as a serverless function or worker (for nuxt dev). A demo is public here

nuxt / rfcs

Nuxt 3: Workers #15
