tc39 / proposal-built-in-modules


An actor model #33

Closed. StoneCypher closed this issue 5 years ago

StoneCypher commented 5 years ago

I'll speak to what I feel is the real elephant in the room for server development these days.

We need convenient multi-process support, and web workers just don't cut it.

I know, everyone loves promises and async and generators and so on. They'll point out that that's a form of parallelism.

It's still happening on one core, though.


Old man onion-on-belt story mode.

C++ draws a distinction between "containers" and "data structures." Containers are about how you use them; data structures are about how they're implemented. When you say something like "a map could be a red-black tree or an AVL tree," map is the container, red-black tree is the data structure, and suddenly "heap" becomes intensely confusing in conversation (especially once you start talking memory management).

Lots of people get bent out of shape about how it would be implemented, what the goals and limitations are, blah blah. By that metaphor, to the C++ crowd those are data-structure topics, and what we need here is a container discussion.

Fundamentally, it doesn't matter how they're implemented. At least, not at the language level.


Look to erlang, f#, d, scala, akka, cloud haskell, CAF, and so on.

Each has a programming language level model for parallelism and multiple fundamentally different backend implementations with trade-offs.

Why?

Because the language was able to offer to the programmer the tool, without forcing the implementation's hand.

Look to C++. The specification for map<> gives extremely detailed performance guarantees, but still, many STL vendors use many different implementation backends.

These things can be done in a way that doesn't nail down how they work, but does nail down how they're used, and how they're kept secure.

Sometimes - oftentimes - that's best, and easiest to get finished.


An actual developer-friendly implementation doesn't even need processes or threads underneath (though that is the sensible choice, and the intended goal). This can easily be implemented in a single thread as what amounts to cooperative or pre-emptive multitasking, while waiting for a deeper implementation.
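
To make that concrete, here's a minimal single-threaded sketch. The names spawn / send / receive and the mailbox shape are made up for illustration, there's no pattern matching or timeout, and only one waiter per pid is supported; it's the cooperative fallback described above, not a proposed API:

// hypothetical single-threaded sketch: pids are just numbers keyed into mailboxes,
// "processes" are async functions, and nothing ever leaves the main thread
let next_pid = 1;
const mailboxes = new Map();

const spawn = (initial_function, args = []) => {
  const pid = next_pid++;
  mailboxes.set(pid, { queue: [], waiting: null });
  // run the process body on the microtask queue; when it settles, the process is gone
  Promise.resolve()
    .then(() => initial_function(...args))
    .catch(() => {})                          // if a process faults, let it fault
    .finally(() => mailboxes.delete(pid));
  return pid;
};

const send = (pid, message) => {              // stands in for  pid ! message
  const box = mailboxes.get(pid);
  if (!box) return;                           // dead process: the message is silently dropped
  if (box.waiting) { const wake = box.waiting; box.waiting = null; wake(message); }
  else { box.queue.push(message); }
};

const receive = (pid) =>                      // just "next message", no matching, no after-clause
  new Promise(resolve => {
    const box = mailboxes.get(pid);
    if (box.queue.length) { resolve(box.queue.shift()); }
    else { box.waiting = resolve; }
  });

The point of the sketch is only that nothing about the surface forces an implementation, which is the container-versus-data-structure argument above.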

erlang's parallelism model is the easiest to understand and work with that I have ever seen, bar none, full stop.

Here's a quick, bastardized JS version:

  1. Introduce a new type, the pid, or 'process id', which represents a process
  2. A function self() that produces the current pid
  3. A way to stall. This could be yield. Erlang generally handles this by blocking on receive, described below.
  4. spawn({ initial_function, args, host_node }) -> pid
    1. those options:
      1. initial_function is the main() of the process
      2. you guessed what args are
      3. host_node is for a multi-node vm, and would be unsupported on most impls
    2. you could add:
      1. links, to notify bidirectionally when the process throws
      2. monitors (one-way links), to notify unidirectionally when the process throws
      3. registered_name for a global lookup
    3. when initial_function terminates, the process does too
    4. the main program is a process with a pid now, too
    5. you send messages between pids to communicate between them
      1. messages are immutable plain data. no functions, no UDTs, no nonsense, just JSON-style data and maps and sets (see the worker sketch after this list)
      2. no, processes may not pass references between one another. an exception is made for PIDs.
  5. you could introduce a new function to send to a pid, but erlang uses an operator !, and i like that, so this example will use that
  6. you introduce a new operator receive to receive a message, which has a very case-like structure, pattern matching against what it receives.
    1. a well-written receive will discard messages it can't match, but that isn't implied, because there are tricky things you might be doing instead, in the same way break isn't implied in a case because you might be writing duff's device (even though you aren't, and it's just a bug)
    2. operator receive needs to block for free, because that's how almost all threads are going to do what a generator person thinks of as yielding
  7. processes must be kept ridiculously lightweight (think what java calls green threads.) if we're going to throw a lot of these around, the lower the cost, the better. erlang gets a process in under a K with zero idle cost
  8. if a process faults, let it fault. move on.
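
For what it's worth, items 4 through 6 line up fairly closely with what workers already enforce today: postMessage only carries structured-clone plain data, so a rough approximation of spawn and ! is possible with one Worker per process. A hedged sketch; the file name fetch_worker.js and everything in it is made up for illustration, not anything this proposal defines:

// --- main.js: "spawn" one worker per url and collect plain-data replies ---
const results = {};
['https://www.ibm.com', 'https://www.microsoft.com'].forEach(url => {
  const w = new Worker('./fetch_worker.js');     // roughly: spawn(initial_function, args)
  w.onmessage = e => { results[e.data.url] = e.data.result; w.terminate(); };
  w.postMessage({ url });                        // roughly: pid ! { url }; structured clone only
});

// --- fetch_worker.js: the initial_function of the "process" ---
onmessage = async e => {
  const { url } = e.data;
  const result = await (await fetch(url)).json();
  postMessage({ url, result });                  // plain data back; no functions, no references
};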

craptacular half-baked example time:

const targets = ['www.ibm.com', 'www.microsoft.com', 'goatse.cx'],

      fetch_p = ({ url, mgr }) =>
        // this will end up happening in some other pid
        fetch(url)
          .then(resp => resp.json())
          .then(result => mgr ! { url, result }),

      pids = targets.map(
        // this is what makes and sends-to the other pid
        addr => spawn(fetch_p, [{ url: addr, mgr: self() }])
      );

let recv_left = targets.length;
let results   = {};

while (recv_left) {

  receive {

    case { url, result }: 
      results[url] = result;
      --recv_left;
      break;

  } after 5000 {
    throw new ReceiveException("Timed out");
  };

}
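
For comparison, here's roughly the same fan-out written against the hypothetical single-threaded spawn / send / receive / mailboxes sketched earlier (so it reuses those made-up names), with the after 5000 clause approximated by Promise.race as one overall deadline rather than erlang's per-receive timeout:

const run = async () => {
  const self_pid = 0;                                    // hand-register a pid for the main "process"
  mailboxes.set(self_pid, { queue: [], waiting: null });

  const targets = ['https://www.ibm.com', 'https://www.microsoft.com'];

  targets.forEach(url =>
    spawn(async ({ url, mgr }) => {
      const result = await (await fetch(url)).json();
      send(mgr, { url, result });                        // stands in for  mgr ! { url, result }
    }, [{ url, mgr: self_pid }])
  );

  const results  = {};
  const deadline = new Promise((_, reject) =>
    setTimeout(() => reject(new Error('Timed out')), 5000));

  // a single overall 5s deadline, not a fresh `after 5000` per receive
  for (let recv_left = targets.length; recv_left; --recv_left) {
    const { url, result } = await Promise.race([receive(self_pid), deadline]);
    results[url] = result;
  }

  return results;
};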

not sure, but i think this is enough to get browsers and node on board despite their wildly different multiprocessing needs, and it's also what has carried erlang through 20 years of being the best parallelism in the game

i think it's a decent place to start looking for answers

this also really heavily says "hey guys and gals, remember tail recursion?"


I suppose there's also a decent argument for doing this by pushing a promise, but that involves so many changes to references and how a closure would work that I don't even slightly want to think about it

littledan commented 5 years ago

There's ongoing work in the web space to improve ergonomics for using workers; see @developit's recent talk at the Chrome Dev Summit.

Let's follow up on feature idea brainstorming in #16.

StoneCypher commented 5 years ago

I really wish you'd stop closing my tickets, dan

StoneCypher commented 5 years ago

I would like this ticket reopened. I stand by the suggestion. It is correct, appropriate, and should be considered. It does not belong on #16.

brylie commented 5 years ago

It is a bit difficult to have a focused discussion in #16. At some point, each idea of merit deserves a focused issue, where the idea can be further explored. I believe it is a bit premature to close this issue and similar ideas before there is a significant comment period.

trotyl commented 5 years ago

@brylie Please have a look at the README:

The goal of this proposal is to define a mechanism for providing a more extensive standard library in JavaScript than is currently available.

The library itself is tangential to this proposal, and would be built and expanded upon in later efforts. Such a library would only cover features which would be useful in JavaScript in general, not things which are tied to the web platform.

This repo is for "How could a standard library be integrated in JavaScript"; the contents of the standard library are all their own proposals, like https://github.com/WICG/kv-storage. The discussion here is only to help determine what the shape could be in the future.