tc39 / proposal-ecmascript-sharedmem

Shared memory and atomics for ECMAscript
Mozilla Public License 2.0

Agents, vats, continents, oh my: Rephrase agents properly in terms of job queues and realms #27

Closed lars-t-hansen closed 8 years ago

lars-t-hansen commented 8 years ago

Currently this is specified only in a terminology entry (since the creation of an agent is outside the spec).

However, an agent has its own initial job queue and initial realm, cf E262 8.5. This will warrant more than a terminology entry, but it will probably be little more than a variation on the existing 8.5, as the agent really is an independent ECMAScript instance.

jfbastien commented 8 years ago

I realize this issue is for ECMA-isms, but I'd like to make sure it also matches well with C++'s upcoming execution agents.

lars-t-hansen commented 8 years ago

@jfbastien, that'll be a separate bug relating to forward-progress guarantees. I have a work item to go thru the draft spec and turn every "(Spec draft note)" into a bug; the forward-progress guarantee is such a note (and links to the C++ memo, IIRC).

lars-t-hansen commented 8 years ago

The forward-progress guarantee is tracked by Issue #28.

lars-t-hansen commented 8 years ago

At the January 2016 meeting, Mark Miller brought up the issue that the existing term "vat" may correspond, more or less, to what I've called an agent. Should investigate.

lars-t-hansen commented 8 years ago

It turns out @domenic is proposing another term, "continents", for this, and he's trying to nail this down for the HTML (DOM?) spec as well, which is a lucky coincidence.

(Visibility: @erights.)

I'm going to make this a blocker for Stage 3 since Stage 3 will need more than what's in the current draft, though with the understanding that Stage 3 does not need a spec for vats/continents to be completely fleshed out.

domenic commented 8 years ago

My previous work is here: https://github.com/tc39/ecma262/pull/226. It is largely a matter of definition and clarification, that I was intending to reference from HTML to acknowledge the "paradox" that the ES spec only mentions a single job queue, but the HTML spec needs a job queue per event loop. I imagine your work will flesh out exactly what a continent means in a bit more detail, in order to create the appropriate guarantees. Maybe it would even become a spec-level object, which gets passed around to various abstract operations.

I don't care much about the term; continents comes from Allen's es-discuss post and I thought it was a clever way to extend the "realm" metaphor. I agree it is equivalent to vat and agent.

erights commented 8 years ago

I don't care much about the term

Hi Domenic, glad to hear that.

"Vat" is short and is what the concept is called in several papers about the communicating event loops concurrency model, as well as several previous systems. Whatever we call it, we will end up with it as part of method names. blahVatBlah vs blahContinentBlah. Don't underestimate the value of brevity.

lars-t-hansen commented 8 years ago

I care little about the terminology aspect, but the word "Agent", though not perfect, has one property that I find desirable: the word suggests an actor (a runner of jobs), not a container (of whatever). I might have chosen "Actor", but that word is bogged down with 40 years of other meaning.

jfbastien commented 8 years ago

I also like "Agent", considering it's what current C++ proposals use :)

lars-t-hansen commented 8 years ago

@erights: Mark, can you send links to the papers you mentioned in your earlier comment? Thanks.

lars-t-hansen commented 8 years ago

@allenwb @domenic @erights @annevk

I've extracted what I had written about agents in the shared memory spec, cleaned it up and fleshed it out significantly, and placed it in a separate companion spec, formatted version here. The source is on the "agents" branch of tc39/ecmascript_sharedmem, in tc39/agents.html.

I could use some feedback on whether this spec is going in an acceptable direction, and beyond that I need feedback on all the details of course. Comment here or in PRs on that file as you wish.

(I'll contact you individually in a few days if all I hear is crickets :)

erights commented 8 years ago

@lars-t-hansen

Mark, can you send links to the papers you mentioned in your earlier comment? Thanks.

me:

"Vat" is [...] what the concept is called in several papers about the communicating event loops concurrency model, as well as several previous systems.

And there is at least one project whose name is explained to be a play on "vat": Tanks: multiple reader, single writer actors

Some of these may be multiple papers about the same system. Some of them discuss "vat" only in related work. If you have trouble getting any of them (paywalls, whatever), please let me know. I may be able to help.

erights commented 8 years ago

@lars-t-hansen "Agent" on the other hand, is also used a lot in the literature, but to mean something else:

https://scholar.google.com/scholar?q=%22agent+oriented+programming%22&btnG=&hl=en&as_sdt=0%2C5&as_vis=1

Two of the papers in the previous comment mention both "agent" and "vat" as contrasting concepts. What we mean here is what they mean by "vat" as opposed to what they mean by "agent".

erights commented 8 years ago

This thread got me curious, so I searched for "vat" and "agent" on LtU:

https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#safe=off&q=site:lambda-the-ultimate.org+vat

https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#safe=off&q=site:lambda-the-ultimate.org+agent

domenic commented 8 years ago

This is looking pretty good. I will avoid the bikeshed discussion here. (I wish @erights would respect this too, per your note at the top "I suggest we bikeshed a name later.")

There are some editorial things, mostly around the idea of "attributes", that probably need to be phrased differently in a more ES-ey way. I am not too concerned about that (and don't immediately have good suggestions on how to change them).

It is embedding-dependent whether ECMAScript code can directly access or observe an agent. If an agent has a value representation within an ECMAScript program it is as a host object value.

I would omit these and the similar sentences for agent clusters.

I think the section on inter-agent communication is a bit out of place. It's mostly talking about unspecified things that other specs could do, and what non-requirements those other specs should impose. Maybe it would make more sense as a series of NOTEs sprinkled throughout other relevant sections?

The notes on external suspension of agents and how this impacts shared/service workers is very interesting. I might inline this section into the "Agent clusters" section though since it seems mostly to be about the "atomic" nature of agent clusters.

Regarding "External termination of agents", see https://github.com/tc39/ecma262/issues/401.

There appears to be no way at present to directly determine whether a Worker has terminated.

If I recall this is somewhat intentional as otherwise GC is exposed... I found https://www.w3.org/Bugs/Public/show_bug.cgi?id=28813 but I think there are a lot more discussions somewhere that I can't find right now. @annevk?

annevk commented 8 years ago

I found https://lists.w3.org/Archives/Public/public-whatwg-archive/2013Oct/thread.html#msg3. Will link that from the bug you mentioned too.

annevk commented 8 years ago

I read through https://axis-of-eval.org/shmem/agents-formatted.html (lovely domain).

Agent mapping: I think it would be clearer to state that an Agent maps to the "event loop" concept of HTML. Ideally an Agent would map to a "unit of related similar-origin browsing contexts", but I don't think any browser does exactly that as it would mean different process cross-origin iframes, which even Chrome's cross-origin iframe project cannot quite accomplish in all cases due to memory concerns.

As for the worker changes, I think we're generally happy to make changes to HTML, but I would like to see @kinu and other Chrome folks weigh in to make sure there's general agreement on them. As well as @smaug---- and probably @sicking on the Firefox side.

lars-t-hansen commented 8 years ago

@annevk, talking to Kyle Huey (a while back, now) I got the strong sense that some worker semantic changes I proposed may not fly, notably the requirement that the creator need not return to its event loop before a worker is "actually" created. But I can perhaps live with that, it's only a minor hardship so long as the main-thread script can't block (and that's probably going to be how it turns out). It's worse if creating a nested worker requires the creating worker to return to its event loop, though. But how to spec a mess like that?

annevk commented 8 years ago

Well, the worker constructor knows the environment it is created in, so it can certainly change its behavior based on that, if folks, including @khuey, are willing to do that.

khuey commented 8 years ago

I'm not really opposed to the differing behavior depending on the environment when it makes sense ... but I think that more thought about what it means to have a script that doesn't yield to the event loop, particularly with respect to the run to completion model, and which APIs work, which don't, etc is needed. Probably at the TAG level.

khuey commented 8 years ago

And to be clear, the issue in Gecko is that network loads do not advance if the main thread is not processing events. I fear we're going to get very deep into implementation details if we try to spec this for the main thread. For dedicated worker threads it will likely be simpler.

annevk commented 8 years ago

@lars-t-hansen @khuey perhaps what we need instead is a way to synchronously construct a worker from something else than a URL. E.g., passing in an ArrayBuffer object or a string.

sicking commented 8 years ago

I'd be quite happy to see a constructor like:

new Worker({ script: "<js code goes here>" });

Right now developers use data-urls to work around something like that.
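The data-URL workaround mentioned above can be sketched roughly as follows. `workerUrlFromSource` is a hypothetical helper name, not part of any spec, and the `new Worker(url)` call appears only in a comment because it assumes a browser that accepts data: URLs for dedicated workers.

```javascript
// Hypothetical helper: wrap inline JS source in a data: URL so that the
// existing URL-based Worker constructor can load it.
function workerUrlFromSource(source) {
  return "data:text/javascript;charset=utf-8," + encodeURIComponent(source);
}

const workerSource = "onmessage = (e) => postMessage(e.data * 2);";
const url = workerUrlFromSource(workerSource);

// In a browser one would then write (assumption: data: URLs are accepted
// for dedicated workers in the target browser):
//   const w = new Worker(url);
//   w.onmessage = (e) => console.log(e.data);
//   w.postMessage(21);
```

A `script` option on the constructor, as proposed here, would make the encoding step unnecessary.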

Regarding the web platform promising forward progress for a worker, I don't have a strong opinion. Right now it does seem pretty nice that we can use a threadpool to share threads between workers, so I'd be a bit sad to completely lose that.

Possibly having a guaranteed forward-progress-worker could be an opt-in thing. Something like

new Worker({ script: "...", dedicatedThread: true }); or new Worker({ uri: "...", dedicatedThread: true });

lars-t-hansen commented 8 years ago

@sicking, I think that's an interesting idea, if dedicatedThread is false then futexWait would throw, as it does now on the main thread.
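A minimal sketch of that idea, under stated assumptions: the `canBlock` flag and the `waitIfAllowed` wrapper are hypothetical names, and `Atomics.wait` (the name under which futexWait was later standardized) stands in for futexWait. It assumes an environment where the calling thread is itself permitted to block, such as Node.js or a worker.

```javascript
// Hypothetical model: an agent that was not given a dedicated thread
// is not allowed to block, so a wait attempt throws instead.
function waitIfAllowed(agent, i32, index, expected, timeoutMs) {
  if (!agent.canBlock) {
    throw new TypeError("this agent is not allowed to block");
  }
  return Atomics.wait(i32, index, expected, timeoutMs);
}

const cell = new Int32Array(new SharedArrayBuffer(4)); // cell[0] starts at 0

const pooled = { canBlock: false };   // e.g. created with dedicatedThread: false
const dedicated = { canBlock: true }; // e.g. created with dedicatedThread: true

let threw = false;
try {
  waitIfAllowed(pooled, cell, 0, 0, 0);
} catch (e) {
  threw = e instanceof TypeError;
}

// cell[0] is 0, not 1, so this wait returns immediately without blocking.
const result = waitIfAllowed(dedicated, cell, 0, 1, 0);
```

The check happens before any attempt to wait, so the outcome depends only on how the agent was created, not on timing.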

(We still need sane behavior from the browser to report when a dedicated thread isn't available.)

@annevk, being able to construct a worker directly would be nice but I think it's a hack in regard to this issue (or perhaps it's an orthogonal issue). It seems to be better to spec in HTML that the creating worker may have to return to its event loop before the worker is actually created. (Speaking of hacks, it would be an interesting hack if futexWait were to throw if the about-to-block agent has created workers that are not yet fully created. Would be terribly implementation dependent, maybe timing dependent in the worst case, so something more subtle is called for.)

The underlying issue for shared memory is that workers probably aren't great stand-ins for threads, but I'm disinclined to block the shared memory spec on that problem...

annevk commented 8 years ago

@lars-t-hansen is the problem then that you don't know synchronously whether you have a worker (on its own thread) that can run or not? Because that seems orthogonal to the network loads @khuey mentioned as issue. (Although I guess even then the browser could allocate a thread and then find out the network resource is too big to handle.)

lars-t-hansen commented 8 years ago

@domenic:

There appears to be no way at present to directly determine whether a Worker has terminated.

If I recall this is somewhat intentional as otherwise GC is exposed...

Obligatory hobby horse: JS will remain a second-grade language for serious programming until this particular meme is killed and we can get on with things (finalization, introspection). But I digress.

We can maybe do without termination notification but then at a minimum there needs to be serious language elsewhere to clamp down on when the UA can remove a worker: that needs to be entirely predictable to the creator of that worker, apart from explicit action on part of the worker itself. Perhaps it fits into the discussion around the external suspension of workers, where we can't quite mandate atomicity but we must mandate a kind of common freezing of the state.

(SharedWorker, ServiceWorker add yet more issues here.)

lars-t-hansen commented 8 years ago

@annevk, I think I would be happy-ish with asynchronous error reporting about failed worker creation, so the thread allocation can happen "later", as with the network loads. If that answers your question.

lars-t-hansen commented 8 years ago

@erights, thanks for the links, I will do some reading... On that note, @domenic brought up an issue on IRC that one of the thoughts in the "vat" idea seems to be that the ES execution environments within a vat need not all be on the same computer, as they are communicating (if they are) by messages. For shared memory, in contrast, all the "agents" in an "agent cluster" need to be on the same computer, since communication is through memory within that cluster; that's what defines it. I'm sensing that (not having read the papers you referenced yet) there may be two levels of abstraction here, one for things that communicate through shared memory on a single computer and another for things that communicate in general, and that perhaps a vat is actually a collection of agent clusters.

domenic commented 8 years ago

Hmm, I don't remember bringing that up. I am on the same page that agents/vats/continents are on a single computer and map to what you've given here.

lars-t-hansen commented 8 years ago

@domenic, My fault! Misattribution: It was @annevk who said that in #jslang, this morning:

annevk bterlson: I think Mark has ideas about Vats that require that to be defined, since the Vats are not necessarily on the same computer and such

erights commented 8 years ago

@domenic Before I reply to the discussion of what vats are and are not, note that I completely sympathize with your request for me to stop bikeshedding terminology and to get on to substance. I am trying to find the time to! However, this terminology question has come up and I can respond quickly, so I will.

From section 14.1 of http://www.erights.org/talks/thesis/markm-thesis.pdf

The combination of a stack, a pending delivery queue, and the heap of objects they operate on is called a vat [...]. Each E object lives in exactly one vat and a vat may host many objects. Each vat lives on one machine at a time and a machine may host many vats. The vat is also the minimum unit of persistence, migration, partial failure, resource control, preemptive termination/deallocation, and defense from denial of service.

An individual vat is a singly-threaded event loop that exists on only one machine at a time. Different vats may live on different machines. Messages between vats may be intra-machine, intra-browser, intra-address-space, or may be between machines (as in a browser vat talking to a NodeJS vat with asynchronous HTTP requests or whatever).
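As an illustration only, the structure described in the thesis excerpt (a heap of objects, a pending delivery queue, and a single-threaded loop) might be modeled like this. The `Vat` class and its method names are hypothetical and are not taken from E or from any spec; the "stack" is simply the JavaScript call stack during each turn.

```javascript
// Hypothetical sketch of a vat: one heap of objects, one queue of pending
// deliveries, and a single-threaded loop that runs each delivery to completion.
class Vat {
  constructor() {
    this.heap = new Map(); // objects owned by this vat, keyed by name
    this.pending = [];     // pending delivery queue
  }

  // Asynchronous send: enqueue only, never run the receiver immediately.
  send(targetName, method, ...args) {
    this.pending.push({ targetName, method, args });
  }

  // Drain the queue, one turn per delivery, in arrival order.
  run() {
    while (this.pending.length > 0) {
      const { targetName, method, args } = this.pending.shift();
      this.heap.get(targetName)[method](...args);
    }
  }
}

const log = [];
const vat = new Vat();
vat.heap.set("counter", {
  n: 0,
  incr() { this.n += 1; log.push(this.n); },
});

vat.send("counter", "incr");
vat.send("counter", "incr");
const ranBeforeLoop = log.length; // still 0: sends only enqueue
vat.run();                        // log is now [1, 2]
```

Because each delivery runs to completion before the next one starts, objects inside the vat never observe each other mid-update, which is the property that synchronous shared-memory access weakens.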

So is an agent/continent a vat? Certainly, before SAB, yes. At http://wiki.ecmascript.org/doku.php?id=strawman:concurrency#vats I write that "vats are only asynchronously coupled to each other", which SAB violates. But this has always had some exceptions:

The file exception seems to me like the same kind of special case as SAB. Using the file system to synchronously communicate between E vats was certainly possible but was never or rarely done. This violation did not deter us from calling them "vat"s. Likewise, since we hope that SAB is a specialized feature for certain very specialized needs, and is not supposed to be the dominant way that agents/vats/continents communicate, I would still be inclined to call them vats. OTOH, if we expected SAB to become the dominant paradigm for communicating between these units, then I would argue strongly _against_ calling them vats.

annevk commented 8 years ago

The performance numbers I heard from @lars-t-hansen suggest that SAB may well become a dominant way to communicate. (I still like the "realms are part of a continent" terminology, with worlds or planets meaning actual vats.)

lars-t-hansen commented 8 years ago

It turns out that postMessage is comparatively slow in all browsers. I don't have current apples-to-apples comparisons -- if there's real demand I can try to resurrect and vet the benchmarks that I had -- but generally we should assume that postMessage is currently a couple of orders of magnitude slower than a reasonable mechanism that communicates through shared memory using synchronous delivery (ie blocking, so worker-only). Another problem with postMessage performance is that there are large variations among browsers; when I tested last year postMessage in Safari (with a simple marshaled object) was about 4x faster than in Firefox, IIRC.

I actually think that unless the communication is particularly performance-sensitive postMessage will probably be fine for most uses, and it allows the use of run-to-completion, which is probably a big deal.

Communication through shared memory will be important within performance-sensitive libraries and to avoid large-scale data copying and so on, but I don't actually expect it to take over, especially since the main thread can't block.

lars-t-hansen commented 8 years ago

I now realize that there are two meanings of "dominant": number of communications and volume of data communicated. I sort of expect shared memory to dominate the latter, because the alternatives are usually less desirable (copying is expensive and transferring has poor usability IMO), but not the former, because postMessage is easier to reason about except within libraries.
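The three ways of moving data that are contrasted here (copying, transferring, sharing) can be illustrated in a single thread. This sketch uses the global `structuredClone` as a stand-in for the marshaling that postMessage performs; that is an assumption about the runtime (recent browsers, Node.js 17 or later), and no actual worker is involved.

```javascript
// 1. Copy: structured cloning duplicates the bytes.
const original = new Uint8Array([1, 2, 3]);
const copy = structuredClone(original);
copy[0] = 99; // the original is unaffected

// 2. Transfer: no copy is made, but the sender loses access to the buffer.
const buf = new ArrayBuffer(8);
const moved = structuredClone(buf, { transfer: [buf] });
// buf is now detached (byteLength 0); moved holds the 8 bytes

// 3. Share: two views over one SharedArrayBuffer see each other's writes.
const sab = new SharedArrayBuffer(4);
const writer = new Int32Array(sab);
const reader = new Int32Array(sab);
Atomics.store(writer, 0, 42);
const seen = Atomics.load(reader, 0);
```

The usability cost of transfer shows up in step 2: any code still holding `buf` now has a zero-length buffer, whereas sharing leaves both sides with a working view.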

smaug---- commented 8 years ago

postMessage performance depends highly on what else is happening in the event loop. If there are lots of pending tasks, dispatching message event may need to wait for quite a while. This is why postMessage tends to be particularly slow during page loads for example.

stefanpenner commented 8 years ago

Rather than postMessage, shouldn't we be evaluating message channel performance? It was my understanding that it is geared more for this (although I suspect rapid message channel chatter still puts pressure on the event loop, but to what degree in modern browsers I do not know).

annevk commented 8 years ago

@stefanpenner MessageChannel still uses postMessage(). (Though what everyone means is "structured cloning" vs SAB.)

stefanpenner commented 8 years ago

Ah ok

erights commented 8 years ago

@lars-t-hansen

First, postMessage is awful. If we take this platform seriously at all, we should work towards a decent asynchronous messaging API. See

Second, the main reason for choosing a communications abstraction should be the possibility of writing correct programs at affordable effort. Except for specialized cases like games, this should trump performance. As someone once said "If the program doesn't need to work correctly, I can make it much faster. For example, 'halt' fails to meet any specification you'd like extremely fast."

Third, I definitely do not mean dominant by the number of bits transferred. Most of the mass of the earth is rock. It is not what matters most about us.

Fourth, although agent clusters are coupled to each other only asynchronously, an agent cluster is definitely not a vat -- it has internal concurrency. If an agent is not a vat, then I doubt this platform has any vats. I only want the term used if it is used with its original meaning.

erights commented 8 years ago

I apologize.

I am spending today going through the various SAB spec documents and associated material. Once we do "work towards a decent asynchronous messaging API", if the timing should work out, it would probably start out as a polyfill on SAB rather than a polyfill on postMessage. Further, a multitude of data-race-free communications abstractions, both Rust-like and Pony-like, will probably start out built on SAB. Not all of these will be asynchronous. For example, Erlang will probably compile to code that does a blocking receive.

SAB will efficiently support a multitude of race-free communications abstractions. The "possibility of writing correct programs at affordable effort" can no longer be assumed to lead "dominant usage" towards asynchrony and the communicating event-loop concurrency model. Thus, the term "vat" would simply be inappropriate for this and associated documents.

I am very sorry for the distraction, and for underestimating the value of SAB as a foundation for data-race-free communications abstractions.

lars-t-hansen commented 8 years ago

I apologize.

Gosh, no offense taken.

I am spending today going through the various SAB spec documents and associated material. Once we do "work towards a decent asynchronous messaging API", if the timing should work out, it would probably start out as a polyfill on SAB rather than a polyfill on postMessage.

That's great. (Also the rest of what you write.)

Thus, the term "vat" would simply be inappropriate for this and associated documents.

OK, noted.

Onward!

lars-t-hansen commented 8 years ago

In response to @domenic's earlier comment:

It is embedding-dependent whether ECMAScript code can directly access or observe an agent. If an agent has a value representation within an ECMAScript program it is as a host object value.

I would omit these and the similar sentences for agent clusters.

Done.

I think the section on inter-agent communication is a bit out of place. It's mostly talking about unspecified things that other specs could do, and what non-requirements those other specs should impose. Maybe it would make more sense as a series of NOTEs sprinkled throughout other relevant sections?

Done.

The notes on external suspension of agents and how this impacts shared/service workers is very interesting. I might inline this section into the "Agent clusters" section though since it seems mostly to be about the "atomic" nature of agent clusters.

Will address this when rewriting the section on agent clusters.

lars-t-hansen commented 8 years ago

@sicking @annevk @domenic

Possibly having a guaranteed forward-progress-worker could be an opt-in thing. Something like new Worker({ script: "...", dedicatedThread: true }); or new Worker({ uri: "...", dedicatedThread: true });

Anne / Domenic, where could we go with this? It is mostly outside the scope of ECMAScript and the Agents spec, but the idea is sweet. In the Agents spec, we already have the notion that an agent is or is not allowed to block; an agent created with dedicatedThread: false would not be allowed to block, so that's easy. In the HTML mapping we would note that browsers would typically require dedicatedThread: false on the main thread.

Trickier is the issue of the forward-progress guarantee. It's bad not to have that. If a thread that can block does block we want forward progress guarantees on other threads, even those that can't block but that would unblock the blocked thread.

It might seem that it would follow from not having a dedicated thread that there is not a forward-progress guarantee. But that is not necessarily so, since any agent with a shared thread cannot block. I think we could still provide a forward-progress guarantee for agents without a dedicated thread, essentially forcing the browser to start all workers and to run them fairly on whatever execution threads they have.

Anyway, if we want to take this anywhere we must make a choice. Either we fix this (with cross-browser agreement, of course) before SAB ships, so that it is possible for dedicatedThread to default to false... or it will simply have to default to true, which would be kind of a shame.

annevk commented 8 years ago

That would have to be introduced in https://github.com/whatwg/html. We would have to clarify that multiple event loops can share a single thread. Then we'd have to clarify that if you use this worker feature, the event loop allocated for the worker needs its own thread. And potentially some kind of synchronous failure mechanism if browsers are indeed capable of knowing synchronously whether they can allocate a new thread or not.

And I guess we'd need a suitable definition for thread.

lars-t-hansen commented 8 years ago

@domenic

There are some editorial things, mostly around the idea of "attributes", that probably need to be phrased differently in a more ES-ey way. I am not too concerned about that (and don't immediately have good suggestions on how to change them).

Should we just rephrase in terms similar to what ES7 does for Realms, ie as a record-like structure with named fields? All of the attributes apart from the "state" have simple primitive values, and the state could just be a string, if we wanted to make it concrete.
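To make the "record with named fields" idea concrete, here is a rough sketch. The field names `canBlock` and `state`, the state value `"running"`, and the `createAgentRecord` helper are all hypothetical; the thread only establishes that the attributes are simple primitives and that the state could be a string.

```javascript
// Hypothetical field names and values, chosen only for illustration.
function createAgentRecord({ canBlock }) {
  return {
    canBlock: Boolean(canBlock), // whether this agent is allowed to block
    state: "running",            // state represented as a plain string
  };
}

const mainAgent = createAgentRecord({ canBlock: false });
const workerAgent = createAgentRecord({ canBlock: true });
```

A record of this shape holds only the attributes; it deliberately leaves out the execution context stack and job queues discussed later in the thread.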

domenic commented 8 years ago

I apologize for not having had time to do a full review of this yet, but yeah, Realm records does seem like a good model to follow. You probably then want to change "surrounding agent" to "surrounding Agent Record" (or just "current Agent Record"?) and use similar language to how "current Realm Record" is defined.

lars-t-hansen commented 8 years ago

@annevk

Agent mapping: I think it would be clearer to state that an Agent maps to the "event loop" concept of HTML.

Done. Removed some chaff around that, too - cleaner now.

lars-t-hansen commented 8 years ago

Realm records does seem like a good model to follow. You probably then want to change "surrounding agent" to "surrounding Agent Record" (or just "current Agent Record"?) and use similar language to how "current Realm Record" is defined.

That last bit I'm not yet sure about. It may be right to do so. Right now we have a "surrounding agent" which is a vague thing, like the execution context. The agent had a set of attributes, but now that they are "fields" of a "record", the temptation you have succumbed to is to put the rest of the agent in that record too - which is to say, adding the running execution context, the execution context stack, and the job queues (this is tables 22-24, maybe more). Once that's in there it's right to make the change you suggest above; until we do, "surrounding agent" remains appropriate.

What's your take on that? (Does it even make sense to you?) We could keep that table reserved for just the attributes, in keeping with the style of the rest of the spec, but leave "agent" comfortably vague. Or we could do something bigger, if the payoff is worth the pain.

domenic commented 8 years ago

Well, execution contexts aren't all that vague: https://tc39.github.io/ecma262/#sec-execution-contexts. They're not records, but they have "components". I'm not 100% sure why those aren't records... Maybe @allenwb could clarify? Is it just the vague "code evaluation state" component that makes them un-Recordable? It seems like we have a fairly analogous situation here, where a Record would be convenient for most cases but some of the components are vague enough to give us pause...

annevk commented 8 years ago

So if an agent maps to an "event loop", each worker is an agent, whether it meets the forward progress guarantee or not. Is that problematic? Though perhaps we could solve this when we add "dedicated thread". Dedicated thread would be the feature that guarantees your own event loop, whereas otherwise workers may share an event loop (which is what at least Firefox seems to be doing at times). Is there an issue yet on "dedicated thread" workers?