donmccurdy commented 4 years ago

Following up on a conversation in https://github.com/mrdoob/three.js/pull/18123, this is a proposal for a utility class for managing tasks distributed to Web Workers. Currently BasisTextureLoader, DRACOLoader, and OBJLoader2 implement redundant logic for that purpose.

A full-featured library for this purpose could easily get pretty complex. I'm hoping that we can write something fairly lightweight for internal use by threejs examples. A more robust version could probably be a standalone library, or part of ECSY, but that's beyond the scope I personally want to attempt.

Proposed API:

interface TaskManager {

  /** Returns true if support for the given task type is available. */
  supportsType( type: string ): boolean;

  /** Registers functionality for a new task type. */
  registerType( type: string, init: Function, execute: Function ): TaskManager;

  /** Provides initialization configuration and dependencies for all tasks of given type. */
  initType( type: string, config: object, transfer: Transferable[] ): TaskManager;

  /** Queues a new task of the given type. Task will not execute until initialization completes. */
  addTask( type: string, cost: number, config: object, transfer: Transferable[] ): Promise<any>;

  /** Destroys all workers and associated resources. */
  dispose(): TaskManager;

}

Use in DRACOLoader:

const DRACO_DECODE = 'draco/decode';

class DRACOLoader {

  constructor ( loadingManager: LoadingManager, taskManager: TaskManager ) {

    this.loadingManager = loadingManager || DefaultLoadingManager;
    this.taskManager = taskManager || DefaultTaskManager;

    if ( ! this.taskManager.supportsType( DRACO_DECODE ) ) {

      this.taskManager
        .registerType( DRACO_DECODE, taskInit, taskExecute )
        .initType( DRACO_DECODE, decoderConfig, decoderTransfers );

    }

  }

  async load ( url, onLoad, onProgress, onError ) {

    const data = await fetch( url ).then( r => r.arrayBuffer() );
    const cost = data.byteLength;
    const config = { data, ... };

    this.taskManager
      .addTask( DRACO_DECODE, cost, config, [ data ] )
      .then( onLoad ).catch( onError );

  }

}

These two functions are passed to the TaskManager, and their function bodies are copied into the Web Worker (without surrounding context).

// Sets up state before a worker begins taking Draco tasks.
function taskInit ( config: object ): Promise<void> {

  // do async setup

  return Promise.resolve();

}

// Executes on a worker for each Draco task.
function taskExecute ( taskConfig: object ): Promise<any> {

  // do expensive processing

  return Promise.resolve( result );

}
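
As a rough illustration of what "function bodies are copied into the Web Worker" could mean in practice, the sketch below stringifies two functions and wraps them in a small message handler. The helper `buildWorkerSource` and the message shape (`kind`, `id`, `config`) are assumptions made for this example and are not taken from the proposal or from three.js.

```typescript
// Hypothetical helper: turns two functions into the source text of a worker script.
type TaskFn = (config: object) => unknown;

function buildWorkerSource(init: TaskFn, execute: TaskFn): string {
  return [
    // toString() yields only the function's own source text. Variables captured
    // from the surrounding scope are NOT carried along, which is why the
    // functions must be self-contained.
    `const init = ${init.toString()};`,
    `const execute = ${execute.toString()};`,
    `let ready = Promise.resolve();`,
    `self.onmessage = (event) => {`,
    `  const { kind, id, config } = event.data;`,
    `  if (kind === 'init') {`,
    `    ready = Promise.resolve(init(config));`,
    `  } else if (kind === 'execute') {`,
    `    ready`,
    `      .then(() => execute(config))`,
    `      .then((result) => self.postMessage({ id, result }))`,
    `      .catch((error) => self.postMessage({ id, error: String(error) }));`,
    `  }`,
    `};`,
  ].join('\n');
}

// Stand-ins for the taskInit / taskExecute functions above.
function sketchInit(_config: object): Promise<void> {
  return Promise.resolve();
}

function sketchExecute(taskConfig: object): Promise<object> {
  return Promise.resolve(taskConfig);
}

const workerSource = buildWorkerSource(sketchInit, sketchExecute);

// In a browser, the string could then be turned into a worker, for example:
//   const blob = new Blob([workerSource], { type: 'text/javascript' });
//   const worker = new Worker(URL.createObjectURL(blob));
```

This only works for plain function declarations and arrow functions; anything the functions reference from outside their own body would have to be passed in through `config` or loaded separately inside the worker.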

The TaskManager class then takes on the responsibilities of:

Creating a configurable number of Web Workers.
Installing the dependencies (i.e. function bodies + transfers) for each supported task type.
Distributing tasks evenly across workers.
Disposing of workers if necessary.
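
One possible reading of "distributing tasks evenly" is to use the `cost` argument of `addTask` and always hand the next task to the worker with the lowest pending cost. The following is a minimal sketch under that assumption; `WorkerSlot`, `assignTask` and `releaseTask` are hypothetical names, and no real workers are involved.

```typescript
// Hypothetical bookkeeping record for one worker.
interface WorkerSlot {
  id: number;
  pendingCost: number; // sum of the costs of tasks not yet finished
}

// Returns the slot with the lowest pending cost (first one wins on ties).
function pickLeastLoaded(slots: WorkerSlot[]): WorkerSlot {
  if (slots.length === 0) throw new Error('no workers available');
  return slots.reduce((best, slot) => (slot.pendingCost < best.pendingCost ? slot : best));
}

// Books the cost of a new task onto the least loaded worker.
function assignTask(slots: WorkerSlot[], cost: number): WorkerSlot {
  const slot = pickLeastLoaded(slots);
  slot.pendingCost += cost;
  return slot;
}

// Called once a task has finished, so the worker becomes attractive again.
function releaseTask(slot: WorkerSlot, cost: number): void {
  slot.pendingCost -= cost;
}

const slots: WorkerSlot[] = [0, 1, 2].map((id) => ({ id, pendingCost: 0 }));

// Four tasks with byte-length-like costs; the ids of the chosen workers are 0, 1, 2, 1.
const assignedIds = [500, 100, 100, 300].map((cost) => assignTask(slots, cost).id);
```

With the costs above, the large first task occupies worker 0 while the three smaller ones are spread over workers 1 and 2.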

donmccurdy commented 4 years ago

/cc @kaisalmen

kaisalmen commented 4 years ago

@donmccurdy regarding the question of ES6 module support in web workers:

Chromium 80 supports it without specifying a command-line flag (https://bugs.chromium.org/p/chromium/issues/detail?id=680046)
When Firefox will support it is unclear (https://bugzilla.mozilla.org/show_bug.cgi?id=1558780)
I haven't checked Safari

Some preliminary thoughts:

TaskManager should cache a Worker of a specific type, so it can be directly re-used
In addition to supplying two functions, we could think of performing registration with stringified code pieces or by pointing to a file relying on module imports (this makes it easier to add further logic to the worker)
One worker run will produce one result. Should we think about intermediate results? I know this introduces more complexity, but could allow smaller chunks of results (like mesh streaming in OBJLoader2)
We should also assess which code and ideas from existing code could be re-used

Demo idea:

Asteroid field where workers create geometry along a flight path, potentially indefinitely (not because this makes sense, but to test workload distribution, re-use and cleanup)

Usnul commented 4 years ago

I feel it's not a task "manager" as such, it's more like a task factory. Or some kind of execution framework. But it does seem nice to be able to unify this kind of functionality.

Personally I would go for something that allowed a bit more functionality, such as dependencies for instance. Things like being able to abort the task. Not necessarily right now, but to have a provision to be able to add those things in the future. As things stand - I don't see that being possible, except via config, but that is kinda messy.

Maybe have a minimalist abstraction "Task" with something like "promise" or "completion" property as a Promise. You can then tack things like dependencies and various task-specific functionalities onto it.
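
A minimal sketch of such a "Task" abstraction could look like the following. The class shape, the `status` field and the `abort()` / `isReady()` methods are assumptions for illustration, not an API that was agreed on in this thread.

```typescript
type TaskStatus = 'pending' | 'done' | 'aborted';

// Hypothetical minimal task: a promise plus room for dependencies and abort.
class Task<T> {
  readonly promise: Promise<T>;
  status: TaskStatus = 'pending';
  private resolveTask!: (value: T) => void;
  private rejectTask!: (reason: Error) => void;

  constructor(
    readonly type: string,
    // Dependencies only need to expose their status for this sketch.
    readonly dependencies: ReadonlyArray<{ readonly status: TaskStatus }> = [],
  ) {
    // The executor runs synchronously, so both callbacks are set here.
    this.promise = new Promise<T>((resolve, reject) => {
      this.resolveTask = resolve;
      this.rejectTask = reject;
    });
  }

  // True once every dependency has completed.
  isReady(): boolean {
    return this.dependencies.every((dependency) => dependency.status === 'done');
  }

  // Called by whatever executes the task.
  complete(value: T): void {
    if (this.status !== 'pending') return;
    this.status = 'done';
    this.resolveTask(value);
  }

  // Rejects the promise; a real implementation would also notify the worker.
  abort(): void {
    if (this.status !== 'pending') return;
    this.status = 'aborted';
    this.rejectTask(new Error(`task "${this.type}" aborted`));
  }
}

const fetchTask = new Task<ArrayBuffer>('fetch');
const decodeTask = new Task<string>('draco/decode', [fetchTask]);
decodeTask.promise.catch(() => { /* aborted tasks reject; handle or ignore */ });

const readyBefore = decodeTask.isReady(); // the fetch has not completed yet
fetchTask.complete(new ArrayBuffer(8));
const readyAfter = decodeTask.isReady();
decodeTask.abort();
```

Keeping the promise as a property rather than returning a bare promise from `addTask` is what leaves room for extras such as dependencies and aborting.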

kaisalmen commented 4 years ago

Personally I would go for something that allowed a bit more functionality, such as dependencies for instance. Things like being able to abort the task.

With regard to dependencies and management of more complex code or execution functions support for jsm in workers will make things easier in the future. Chrome 80+ (https://www.chromestatus.com/feature/5761300827209728) is supposed to support this without flags, but it is broken again in Chrome Dev 81 / Canary 81. I saw this working nicely on Windows already. Basically you can now run the same code in the same way, as long as you obey the general limitations of workers. Porting existing code (like other loaders) to workers will be a lot easier, I think.

@donmccurdy and I agreed to start development on a branch here and not in a separate repo. DRACOLoader and OBJLoader2 are the first candidates for testing this new feature, but it is of course not limited to them.

donmccurdy commented 4 years ago

TaskManager should cache a Worker of a specific type, so it can be directly re-used

I was hoping TaskManager could be pre-initialized with a set of possible tasks, then spin up N workers each capable of doing any of them. This would make it easier to balance work across many task types, within the browser's worker count limits. It may be a little tricky to set up.

With regard to dependencies and management of more complex code or execution functions support for jsm in workers will make things easier in the future

I hope so, but I worry it will be hard to make this compatible with bundlers. Maybe we can start simply and build up dependency support when needed. In any case there will need to be some way to ship (1) the core JS logic of the task, and (2) WASM dependencies. It won't be hard to allow ES module imports, but loaders provided by threejs may need to avoid using that feature, for portability.

DRACOLoader and OBJLoader2 are first candidates for testing this new feature

I'm also in the process of writing KTX2Loader (https://github.com/mrdoob/three.js/pull/18490), which will need it.

Maybe have a minimalist abstraction "Task" with something like "promise" or "completion" property as a Promise

I like this. 👍

I'd also like to allow the possibility of setting .maxWorkers = 0 to run tasks in the main thread, if we can. This will make it easier for users on NodeJS, or with a Content Security Policy that prevents us from creating workers, to still use these loaders (more slowly).
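
The main-thread fallback could be sketched roughly as below. The capping at `hardwareConcurrency` and all names (`resolveWorkerCount`, `runOnMainThread`, `RuntimeInfo`) are assumptions for this example. Worker support is passed in as a flag here; in a browser it might be detected with something like `typeof Worker !== 'undefined'`.

```typescript
// Hypothetical description of the environment the manager runs in.
interface RuntimeInfo {
  workersSupported: boolean;   // false on e.g. a CSP that blocks worker creation
  hardwareConcurrency: number; // logical cores reported by the runtime
}

// Returns how many workers to create; 0 means "run tasks on the main thread".
function resolveWorkerCount(maxWorkers: number, runtime: RuntimeInfo): number {
  if (maxWorkers <= 0 || !runtime.workersSupported) return 0;
  // Assumption: never create more workers than there are logical cores.
  return Math.min(maxWorkers, Math.max(1, runtime.hardwareConcurrency));
}

// Main-thread fallback: same promise-based contract as a worker task,
// but the execute function is simply called in place.
function runOnMainThread<C, R>(execute: (config: C) => R | Promise<R>, config: C): Promise<R> {
  return new Promise<R>((resolve) => resolve(execute(config)));
}

const browser: RuntimeInfo = { workersSupported: true, hardwareConcurrency: 8 };
const restricted: RuntimeInfo = { workersSupported: false, hardwareConcurrency: 8 };

const countDefault = resolveWorkerCount(4, browser);     // 4
const countCapped = resolveWorkerCount(16, browser);     // 8
const countDisabled = resolveWorkerCount(0, browser);    // 0
const countNoWorkers = resolveWorkerCount(4, restricted); // 0
```

Because `runOnMainThread` returns a promise just like a worker-backed task would, a loader would not need to know which path was taken.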

I'll try to start a branch fairly soon, based mostly on a bit more abstraction around what DRACOLoader does today...

kaisalmen commented 4 years ago

I hope so, but I worry it will be hard to make this compatible with bundlers. Maybe we can start simply and build up dependency support when needed.

Yes, jsm in workers should not be a requirement. I consider it an extra option.

I'd also like to allow the possibility of setting .maxWorkers = 0 to run tasks in the main thread, if we can.

Yes, a fallback option is a good idea.

I am in the process of establishing a three.js-tuesday-evening, so my response time/progress reporting becomes a little more predictable.

kaisalmen commented 4 years ago

Life-sign from me. I understood why I broke obj2 jsm worker exec. PR is now available: #18886. I want to have both code paths (legacy worker and jsm worker) supported here from the beginning if possible. I thought there was a bigger issue, but there is none. Relative locations must be expressed correctly. 😳

kaisalmen commented 4 years ago

Hey, I finally started working on the implementation. What do you think about adding a third function to registerType that handles communication in the worker, i.e. calling init and execute and afterwards messaging the result back:

registerType( type: string, init: Function, execute: Function, manageCom: Function ): TaskManager;

This way you can separate the execution logic from the worker communication needs. In OBJLoader2Parallel the com-implementation is very heavy, but the obj parser is completely free of worker specific code. In DRACOLoader the worker feedback code and parsing logic is mixed. This could be the middle way.

The first goal is a simple prototype example that captures a complete tour through all functions of TaskManager.

donmccurdy commented 4 years ago

I did some initial work a bit ago but haven't looked at it in a month or so, see https://github.com/mrdoob/three.js/compare/dev...donmccurdy:feat-taskmanager. It works with a simple test case (see the unit test) and with DRACOLoader so far, based on init and execute functions attached to the task object.

The word "manage" doesn't mean much to me... when is manageCom called, with what? If OBJ parsing requires sending incremental progress updates, perhaps the execute function should have access to something that can send those?

kaisalmen commented 4 years ago

The word "manage" doesn't mean much to me... when is manageCom called, with what?

Sorry, this was too unspecific and the wrong term. My idea is to (optionally) encapsulate all Worker <-> Main communication in this "comRouter" function. The OBJLoader2Parser has callback functions (mesh ready, parse done), which allow the same unaltered code to be used in the worker and outside the worker. The "comRouter" delegates the "init" and "execute" messages to the functions and transports (intermediate) results back. But this is optional and should not contradict your initial proposal. Hope this made my idea clearer.

Edit: Just had a look at your code. Your TaskWorker is handling the communication (what I call "comRouter"), correct? You glue all the code of the different tasks into one big piece that becomes the Worker.
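
To make the "comRouter" idea a bit more concrete, here is a small synchronous sketch. It assumes a message shape (`cmd`, `id`, `config`) and reply kinds that are made up for this example; in a real worker the `post` callback would be `self.postMessage` and the router would be installed as `self.onmessage`.

```typescript
// Hypothetical message formats between main thread and worker.
interface RouterMessage {
  cmd: 'init' | 'execute';
  id: number;
  config: object;
}

interface RouterReply {
  id: number;
  kind: 'initDone' | 'intermediate' | 'done';
  payload?: unknown;
}

type EmitIntermediate = (payload: unknown) => void;

// Builds a message handler that keeps all communication out of init/execute.
function createComRouter(
  init: (config: object) => void,
  execute: (config: object, emit: EmitIntermediate) => unknown,
  post: (reply: RouterReply) => void,
): (message: RouterMessage) => void {
  return (message) => {
    if (message.cmd === 'init') {
      init(message.config);
      post({ id: message.id, kind: 'initDone' });
    } else {
      // The parser only sees a plain callback, never the worker messaging API.
      const result = execute(message.config, (payload) =>
        post({ id: message.id, kind: 'intermediate', payload }),
      );
      post({ id: message.id, kind: 'done', payload: result });
    }
  };
}

// Stand-in for a parser that reports meshes as they become available.
const replies: RouterReply[] = [];
const route = createComRouter(
  (_config) => { /* set up parser state */ },
  (_config, emit) => {
    emit('mesh A');
    emit('mesh B');
    return 2; // number of meshes produced
  },
  (reply) => { replies.push(reply); },
);

route({ cmd: 'init', id: 0, config: {} });
route({ cmd: 'execute', id: 1, config: {} });
```

The intermediate replies correspond to the "mesh ready" callbacks mentioned above, and the final reply to "parse done".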

kaisalmen commented 4 years ago

I am able to keep all eight logical cores 100% busy: one simple worker (10^8 additions in a for loop), 8 instances, 1000 added tasks. I have not modified any loader code because I wanted to get the concept straight first. This is still WIP. Sorry, things are moving very slowly, but spare time is rarer than usual for me these days.

kaisalmen commented 4 years ago

There is finally something to look at: https://raw.githack.com/kaisalmen/three.js/TaskManagerProto/examples/webgl_loader_taskmanager.html

All basic functionality of TaskManager is implemented. I also added dependency loading for non-jsm workers. In the above example, eight workers of the same kind are created. They contain three.js and, on exec, produce spheres with randomly transformed vertices. Buffers are transported to main, where meshes are created from the buffers and added to the scene. 25000 executions are triggered. Only the last 500 objects are kept and previous ones are deleted, but there is still a memory leak. Beware, it eats 4GB of memory over time during execution.

Of course, this is still work in progress...

kaisalmen commented 4 years ago

Here we are. This is what should come next. Feedback requested: 😄

The next step is to verify that transferables in the initType and addTask configs are properly processed. Costs need to be covered as well.
I will set up different tasks (workers that don't use three.js, and jsm workers). All workers should generally do heavier computations but must be easy to understand (adding and removing objects in the above demo is costly, therefore the workers don't fully occupy the CPU).
All functionality should be covered by small, easy-to-understand workers (see previous point) before changing the code of existing loaders. This could/should be transformed into unit tests. I forced myself to do it this way to minimize the impact of existing loaders on the implementation. E.g. the way OBJLoader2 works heavily dictated the design of the worker init and execution in the past.
The basic API design of @donmccurdy works well, I think. We have to discuss at some point whether the implementation is ok.
A final example could spawn workers of different types potentially indefinitely, load different, possibly complex meshes with suitably adapted loaders, and clean up the scene graph with a sliding window of kept meshes (this time without leaking memory). This should help identify problems and verify that loads are nicely distributed and consume all assigned resources.

kaisalmen commented 4 years ago

PR is not yet ready. I needed to clean up util functions and update the jsdoc, which took longer than expected. Will get there soon (status: https://github.com/kaisalmen/three.js/tree/TaskManagerProto) and let you know...

kaisalmen commented 4 years ago

Legacy + library dependency embedding, jsm workers, plus to-main fallback with legacy workers are now all working. I fixed and enhanced more things than I thought I would. The TaskManager doc is not fully complete. https://raw.githack.com/kaisalmen/three.js/TaskManagerProto/examples/webgl_loader_taskmanager.html (The rendering gets stuck because of the 250 executions of the fake worker on main.)

trusktr commented 4 years ago

or part of ECSY,

Just curious, how does worker management relate to ECSY? I thought that was a component-entity system, and nothing about workers?

donmccurdy commented 4 years ago

An Entity-Component System is (among other things) a way of structuring and scheduling work: Systems define the order in which their logic runs. Events no longer fire arbitrarily, but instead are queued and processed in a priority order.

That sort of opinionated structure could (in my opinion) make it easier to move work into other threads. See Data Structures for Entity Systems. That's not to say it's easy, or that ECSY should necessarily do this. But it's possible, and ECSY wouldn't be the first ECS to go that direction.

But to clarify — I'm not interested in building anything that complex in the TaskManager proposal here. Just a minimal task runner that supports dependencies, and little else.

gkjohnson commented 3 years ago

Since #11746 was closed I'll repeat some of my findings and opinions from this comment here.

For extensions that are always run in workers, such as Basis and DRACO, a worker task manager system might be beneficial. But for enabling all model loaders to asynchronously run in WebWorkers I recommend updating toJSON and ObjectLoader to support a modified serialization format using the original array buffers which would enable fast serialization, WebWorker transfer, and deserialization. With the OBJLoader this technique gets frame stall time down from 200ms to 5ms. I think OBJ is the most intensive format to parse but any of the loaders should reduce down to around that stall time using Workers like this. It also gives people the most flexibility in terms of what can be done in a worker. A quickly transferrable serialization format means people can load or generate as many models as they want, however they want, and use toJSON to reliably transfer them back to the main thread.
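
The core of the "serialize with the original array buffers" idea is that typed arrays are kept as-is and their underlying buffers are listed as transferables. The sketch below shows only that collection step on a made-up geometry description; `SerializedGeometry` and `collectTransferables` are hypothetical and do not reflect the actual format proposed in #21035.

```typescript
type GeometryArray = Float32Array | Uint16Array | Uint32Array;

// Hypothetical, simplified stand-in for a serialized geometry.
interface SerializedGeometry {
  attributes: { name: string; itemSize: number; array: GeometryArray }[];
  index: GeometryArray | null;
}

// Collects each underlying buffer exactly once, even if several
// attributes are views into the same buffer.
function collectTransferables(geometry: SerializedGeometry): ArrayBufferLike[] {
  const buffers: ArrayBufferLike[] = [];
  const addOnce = (buffer: ArrayBufferLike): void => {
    if (buffers.indexOf(buffer) === -1) buffers.push(buffer);
  };
  geometry.attributes.forEach((attribute) => addOnce(attribute.array.buffer));
  if (geometry.index !== null) addOnce(geometry.index.buffer);
  return buffers;
}

// Two attributes sharing one buffer (12 floats each), plus an index.
const sharedBuffer = new ArrayBuffer(24 * 4);
const quad: SerializedGeometry = {
  attributes: [
    { name: 'position', itemSize: 3, array: new Float32Array(sharedBuffer, 0, 12) },
    { name: 'normal', itemSize: 3, array: new Float32Array(sharedBuffer, 48, 12) },
  ],
  index: new Uint16Array([0, 1, 2, 2, 3, 0]),
};

const transferables = collectTransferables(quad);

// Inside a worker this would be followed by something like:
//   self.postMessage(quad, transferables);
// after which the buffers are detached on the sending side.
```

Listing a buffer twice in a transfer list is an error, which is why the helper de-duplicates shared buffers.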

It was requested I make a PR with some of the necessary toJSON and ObjectLoader changes for this for discussion which I've done in #21035 so it would be nice to get some feedback from maintainers on the direction.

Mugen87 commented 3 years ago

But for enabling all model loaders to asynchronously run in WebWorkers

I'm not sure this is required. Only because you can run everything in a worker does not mean you should do it. The idea of having TaskManager to selectively implement computational intensive tasks in workers seems more appropriate to me.

gkjohnson commented 3 years ago

Only because you can run everything in a worker does not mean you should do it. The idea of having TaskManager to selectively implement computational intensive tasks in workers seems more appropriate to me.

I'm definitely not suggesting it be done just "because you can". I'm suggesting it because it seems like a much simpler, general solution that's been shown to work to a problem that could otherwise become complicated and difficult to maintain. If you don't feel it's the right approach then it would be good to provide some reasoning so it can be discussed. Nearly all loaders tend to stall in one way or another, and yes, you could implement each individual computation as a worker piecemeal per loader, but I think that's pretty clearly a lot more work to maintain than having a generalized transferrable serialized format. Not to mention there are use cases in generating complex scenes and geometry in a worker and easily transferring it back to the main thread. Yes the user could write their own format for serializing the data but why? It would just wind up being the same thing we're talking about here.

There's potentially some benefit to parallelizing multiple tasks within a single loader but I know for my use cases I would prefer to write a pool of composite workers that can load multiple file formats, to avoid the not insignificant overhead of creating and disposing of workers.

donmccurdy commented 3 years ago

Yes the user could write their own format for serializing the data but why?

glTF exists for exactly this reason. I'm OK with the changes in https://github.com/mrdoob/three.js/pull/21035, but I don't think we should spend a lot of energy trying to get all loaders to run in web workers. We should encourage people to use formats that are appropriate for runtime use, i.e. not OBJ.

Mugen87 commented 3 years ago

To me, this is a matter of application architecture and what type of patterns the project should promote. Like I said before, I believe that only certain parts of computational work need to reside in workers. And TaskManager is an ideal solution to define and execute such logic. I also think that webgl_worker_offscreencanvas is something the project should promote. Meaning running the 3D logic in a worker and using OffscreenCanvas for rendering.

I understand that there are more complex scenarios that require more features from the engine. But frankly I don't vote to give such use cases a high priority regarding the project's philosophy.

Since the goal of three.js is to provide a lean and simple 3D rendering engine, I always have a hard time accommodating complex/very specific use cases. I believe the respective APIs add a considerable amount of complexity to the engine and tend to make the project harder to maintain. That's the reason why I also voted not to merge #15611 at that time. I still feel this API is too complex for most users. And I see it similarly for #21035 and its intended usage.

kaisalmen commented 3 years ago

#19650 introduces `TransportUtils` along with `WorkerTaskManager`. This could be seen as a mitigation to this problem, I hope:

TransportUtils provides a stack of utilities (DataTransport, GeometryTransport, MeshTransport, MaterialsTransport) that transport simple ArrayBuffers or more complex objects bi-directionally between main and workers (https://github.com/mrdoob/three.js/pull/19650#issuecomment-775766977 summarizes the concept pretty well).

WorkerTaskManager is independent of these "transport" utils, btw. They make all pieces Transferable that can be, and all other object content is passed directly as is (toJSON() is only used for Material, and Textures are the only things not supported). On the other side they reconstruct the objects. These utils are applied in a wrapper around an unchanged OBJLoader, OBJLoader2 and other example workers (see them applied in the example of #19650 or the external repo). Changes to BufferAttribute/BufferGeometry from #21035 could simplify the code.

Another thing underestimated with workers, I think: The code/dependencies surrounding what is computed in the worker. Here, WorkerTaskManager makes creation of Worker code easier. It can load dependencies for standard workers and spit out the complete code as standard worker solving the relative path problems with nasty importScripts. With module workers this problem does not exist.

gkjohnson commented 3 years ago

I'm OK with the changes in #21035, but I don't think we should spend a lot of energy trying to get all loaders to run in web workers

I agree, which is why I think a smaller one-feature-enables-all-loaders approach is the right way to address the problem discussed in the other issue.

glTF exists for exactly this reason. ... We should encourage people to use formats that are appropriate for runtime use, i.e. not OBJ.

I also agree but for my use cases I don't always have control over the models my users generate or have access to and would like to be able to load the model into my tool without asking the user to jump through hoops for conversion. I get people loading OBJs, VRML, STL, PLY, etc.

And if the angle of this issue is not to enable non blocking loaders aside from GLTF then should #11746 have been closed for this one? That was about general non blocking asset loaders which is why I aimed for a one size fits all solution.

TaskManager is an ideal solution to define and execute such logic. I also think that webgl_worker_offscreencanvas is something the project should promote. Meaning running the 3D logic in a worker and using OffscreenCanvas for rendering.

I don't think the concept of a task manager conflicts here then -- it could be a nice addition, but the ability to easily transfer materials and geometry without having to rewrite all the serialization and deserialization methods is valuable. The concept of TransportUtils sounds interesting but I can't help but feel like it's more complicated than using the existing serialization format while retaining transferrable types (including ImageBitmap and OffscreenCanvas for textures).

I understand that there are more complex scenarios that require more features from the engine. But frankly I don't vote to give such use cases a high priority regarding the project's philosophy.

I definitely understand and agree to an extent. If I can build something around three.js and maintain it externally I do that. I try to keep my use-case specific features outside of the library but there are times when adding a small feature or hook to three is really enabling. I just ask that it not preclude being able to take advantage of basic features and provide small but enabling hooks for outside applications while maintaining a simple API and I think for the most part it already does that. To me three.js has struck a fantastic balance of this for years and is a platform that's easy to jump into and that can grow with your experience level. I'd love to see that maintained.

The code/dependencies surrounding what is computed in the worker. Here, WorkerTaskManager makes creation of Worker code easier. It can load dependencies for standard workers and spit out the complete code as standard worker solving the relative path problems with nasty importScripts.

Yeah Workers are a huge pain at the moment. I have hope in the coming years that they will become more usable with upcoming standards and several bundlers harmonizing on a browser-compatible worker syntax.

kaisalmen commented 3 years ago

The concept of TransportUtils sounds interesting but I can't help but feel like it's more complicated than using the existing serialization format while retaining transferrable types (including ImageBitmap and OffscreenCanvas for textures).

I agree with you. If you can rely on a "serialize runtime state of a scene object and use Transferables wherever applicable" function, then TransportUtils will collapse to a meta-description plus serialized payload and some utility functions for packaging/reconstruction. However the solution looks in the end, having a handy utility for transferring arbitrary scene objects between worker<->main will be beneficial for developers now and in the future, I think.

Yeah Workers are a huge pain at the moment. I have hope in the coming years that they will become more usable with upcoming standards and several bundlers harmonizing on a browser-compatible worker syntax.

Module workers already remove many of the pain points. But you are stuck with Chrome.

donmccurdy commented 2 years ago

Between @kaisalmen's work in https://github.com/kaisalmen/wtd, and the small WorkerPool.js utility in the three.js repository, I believe this issue can be resolved. 🎉

kaisalmen commented 8 months ago

Hi there. Long time no see. 🙂 I know it is an old and closed issue, but maybe it is the easiest way to reach people interested in the topic.

I released wtd-core@3.0.0 and wtd-three-ext@3.0.0 a couple of days ago (wtd repo). The core library evolved to be more generally usable outside the context of three.js. I also incorporated utilities based on the three.js optimization manual that allow using OffscreenCanvas with configurable/adjustable event delegation.

The following demos could be interesting:

Inter-worker communication with MessageChannels: https://kaisalmen.github.io/wtd/workerCom.html
WorkerTaskDirector: Potentially Infinite Execution (not new, but still nice): https://kaisalmen.github.io/wtd/potentially_infinite.html
A new OBJLoader2 example uses this new functionality as well: https://kaisalmen.github.io/WWOBJLoader/obj2_basic_offscreen.html

Hope you don't mind this advertisement. 😉 Maybe, this can be useful to someone. Feedback is welcome.

Keep up the good work! 👋

mrdoob / three.js

TaskManager: Proposed worker management class #18234
