Reviews: First round - Githubissues

@littledan @bterlson @pizlonator @waldemarhorwat @domenic @allenwb @erights

I'd like to get a first round of reviews of this proposal underway so that it can be completed well before the March TC39 meeting. (More rounds later, no doubt.) The reviewers are listed by area below. If you're summoned above you're on my list.

Items not yet ready for review are:

the memory model (section 7.2.1), I need one more week and will send out a notification
the agent model, because it has moved to its own spec, see link at the top of the spec text; agents will need to be reviewed separately, arguably this is a semi-separate proposal for ES2017

Items it may be premature to review are:

Web/HTML integration (section 8), because that's getting some separate attention from HTML people

Note that futexes may be replaced by a related idea called synchronics (again see a link at the top of the spec text), but for now futexes are still in the spec and review would be welcome.

There are two broad areas of feedback I'm looking for:

correctness, notation, naming, desirability, and overall fit with ES6, notably for TypedArray and SharedArrayBuffer, but really in general
open design issues, most are already linked from the spec text, search for "(Issue nn)"

Obviously all feedback is welcome, even if it does not fit those areas. Mark has already been busy.

In general, please pay some attention to the issue tracker, and file comments / concerns as issues there. The spec text will be updated every so often.

Formal reviewers at large:

Dan Ehrenberg, Google
Brian Terlson (or a delegate), Microsoft
Filip Pizlo, Apple

Formal reviwer for the memory model:

Waldemar Horwat, Google

Formal reviewer for the agents subspec:

Domenic Denicola, Google (though he's helping with the design too)

Informal/volunteer reviwers of the overall API and ES integration issues:

Allen Wirfs-Brock
Mark Miller, Google

The spec draft has been updated on 2016-03-03.

Hi everyone,

I’ve read over the spec. I like it a lot! I am happy with the spec even in its current form, so all of these comments can be viewed as optional.

Section 7.1:

The use of integer status codes in Atomics (Atomics.OK/NOTEQUAL/TIMEDOUT) is not canonical. For example, Promises use strings as their status codes. I guess this might be annoying to change for asm.js code, but OTOH, asm.js code could wrap calls to Atomics to convert strings to ints if need be. I don’t strongly feel one way or another, but it might be good to add a note about why this uses ints.

Section 7.2.1:

I’m slightly uncomfortable with this sentence: "A write involved in a data race stores a poison value that can read back as anything whatsoever.” I would prefer if we constrained this to “… can read back as any value previously stored to that location”. Would that be difficult to spec?

Section 7.3:

It would be awesome if Atomics could work directly on the bytes of a SharedArrayBuffer or DataView. Do they?

Section 7.3.7:

Slightly prefer that we remove this. It is a profitable optimization only in certain pathological cases. Also, you can avoid thundering herd without this. Removing it means that there is one less thing that people will want to argue over. We implemented a futex-like abstraction in WebKit (see http://trac.webkit.org/browser/trunk/Source/WTF/wtf/ParkingLot.h). We avoid thundering herd by just strengthening the contract of parkConditionally() to include a beforeSleep hook. I’m not advocating that the spec does this, but it is interesting that the requeue operator is not the only way of avoiding thundering herd; as such I sort of like the idea of designing this once we know more about how people implement this and how it’s used in the wild. Linux locks worked just fine in practice even without requeue. :-)

Section 7.3.8:

I prever if we guarantee that int32 is always lock-free.

-Filip

On Mar 3, 2016, at 11:37 AM, Lars T Hansen notifications@github.com wrote:

The spec draft has been updated on 2016-03-03.

— Reply to this email directly or view it on GitHub https://github.com/tc39/ecmascript_sharedmem/issues/77#issuecomment-191929058.

The use of integer status codes in Atomics (Atomics.OK/NOTEQUAL/TIMEDOUT) is not canonical.

Yeah, this should use string enums like all other JS APIs, and like modern web APIs.

@pizlonator, thanks for the feedback!

Section 7.1: - The use of integer status codes in Atomics (Atomics.OK/NOTEQUAL/TIMEDOUT) is not canonical. ...

It was explicitly made so for asm.js. It is true that asm.js can call out to do futex operations (and indeed the asm.js compiler in Firefox currently requires that anyway, as does the asm.js companion spec), but IMO it was not desirable to force that to be the case, hence the design. Lots of people don't like it the way it is now, I'm not overly fond of it myself, it's a compromise.

All that said: If/when we incorporate synchronics, as I think we should (Issue #12, and I'd love more feedback on that spec too), then futexes will disappear and these status codes will disappear with them.

Section 7.2.1: - I’m slightly uncomfortable with this sentence: "A write involved in a data race stores a poison value that can read back as anything whatsoever.” I would prefer if we constrained this to “… can read back as any value previously stored to that location”. Would that be difficult to spec?

That's a good question, Mark Miller made much the same point. Discussion ongoing in Issue #71.

Section 7.3: - It would be awesome if Atomics could work directly on the bytes of a SharedArrayBuffer or DataView. Do they?

No. There was an explicit goal to be interoperable with TypedArray to support asm.js code generation, and the current design more or less fell out of that, mimicing the ArrayBuffer/TypedArray pair. Atomics don't work on DataView because DataView was considered (by me) way too general to support atomics in any meaningful way, and the use cases were extremely unclear. Do you have some use cases? (Fortunately we can add such ops later if we want.)

(I have noticed that in the case where I'm emulating C-like structures on top of shared memory it would be useful to go directly to the buffer with size-specific accesses rather than managing a set of views, but I'm not yet sure that's an important enough use case to warrant adding more operations.)

Section 7.3.7: - Slightly prefer that we remove this.

(That's futexWaitOrRequeue) Yes, Issue #11 pretty much expresses that concern too. If we remove futexes it will disappear for sure, but I think it should probably be removed regardless.

Section 7.3.8: - I prever if we guarantee that int32 is always lock-free.

Noted. I also think this is the right decision.

Section 7.1: - The use of integer status codes in Atomics (Atomics.OK/NOTEQUAL/TIMEDOUT) is not canonical. ...

It was explicitly made so for asm.js

This is issue #69, actually.

Memory model updated with language from Issue #22 (definition of access range affected by race), Issue #51 (prohibition on rematerialization if it reveals a data race), Issue #71 (constraints on which values that can come out of a data race).

Issue #71 continues to be debated and the language may change again, but it will be a tweak, not a large change.

Issue #48 probably falls out of other semantics, but still thinking about that.

@waldemarhorwat, the memory model is probably complete enough to warrant attention.

Is there a way to see the integrated proposal as an ES6 document with change bars? I'm having a hard time with the mechanics of reading the proposal as only a delta, and it's very difficult to tell if something more should be included but is missing.

ES6 24.1.1.5 and 24.1.1.6: As written, these require all accesses to be sequentially consistent at a byte level, allowing only for word tearing. There's nothing in them that allows for other possibilities, so the memory model is trivial.

6.3.1.1: Why can't byteLength be zero?

7.2.1:

Atomic accesses that are viable are performed in a single total order (the synchronization order) consistent with each agent's program order. An atomic access is not viable if its access range intersects with but is not equal to the access range of another atomic access from another agent that could be adjacent to it in the total order. (None of the accesses that intersect in that way are viable.)

This definition makes it impossible to support more than one size of atomics (say, both 8-bit and 32-bit) when compiling C++ to a big asm.js-style array that simulates the C++ heap. A memory location could be used for a C++ heap-allocated object with one atomic size, freed from the simulated C++ heap, and then reallocated to another unrelated C++ heap-allocated object with a different atomic size. Even though the lifetimes of the two C++ objects don't overlap and wouldn't race in C++, this definition makes the atomic accesses involved non-viable.

What do "the sending of a SharedArrayBuffer, the receipt of a SharedArrayBuffer" mean?

A data access read R only sees a write W that happens before R, where there is no write V such that V happens before R and W happens before V."

This doesn't make sense if there are unsynchronized writes to the same location. It also doesn't make sense if there are two such W's that both satisfy the condition but neither happens before the other.

What does it mean for threads involved in a data race to have synchronized?

Two memory accesses whose access ranges intersect constitute a data race if neither happens before the other, at least one of them is not an atomic access, and at least one of them is a write.

If the execution of an agent cluster has no data races, then the agent cluster behaves according to one of its sequentially consistent executions.

The race affects the union of the access ranges involved in the race, for the duration of the race. The race ends only when all the threads involved in the race have synchronized.

This doesn't seem right. Let's say threads A and B both write to location X, unsynchronized. Then A and B synchronize with each other, ending the race. Then thread C reads location X, with that read being ordered after both A's and B's write. That leads to a contradiction because C will see both values but the data race is over.

The shared memory will always honor a write, but if there is a race the memory can arbitrarily select bits from the operands of the racing writes and the previous memory contents at the racy written-to locations.

But that presumes that racing writes can be put into some total order viewed consistently by all threads, which need not be the case.

Another example:

Location X starts with the value 0.

Thread A does a sequence of four atomic writes to location X, writing 1, 2, 3, 4 to it.

Thread B reads X using a non-atomic read.

Thread C reads X using an atomic read.

None of the threads are synchronized.

The rules imply that C could obtain the value 7 because of the data race between A and B. That would make reasoning about any synchronization between pairs of threads suspect because an unrelated thread could mess it up just by reading.

Overall, while the intent seems ok, the devil is in the details, which are ill-specified.

The values at racy locations are not stable until the race has ended.

I don't know what that means. Didn't the previous point state that a value can only be changed due to a write?

The third rule precludes a so-called rematerialization optimization, where a read from shared memory can be performed multiple times in such a way that several different values are observed for what is a single value in the semantics.

Great! This is necessary to keep the rest of the language safe from quantum values.

The design seems to only produce naturally aligned atomic accesses (good), but I can't tell if there are any loopholes, perhaps by messing with offsets. Is there any way for the user to attempt to make unaligned atomic accesses?

@waldemarhorwat, thanks for the feedback.

Is there a way to see the integrated proposal as an ES6 document with change bars? I'm having a hard time with the mechanics of reading the proposal as only a delta, and it's very difficult to tell if something more should be included but is missing.

Filed as Issue #92.

6.3.1.1: Why can't byteLength be zero?

Duplication of a bug in ES6. The current ES draft has corrected this, will do so here too.

Atomic accesses that are viable are performed in a single total order (the synchronization order) consistent with each agent's program order. An atomic access is not viable if its access range intersects with but is not equal to the access range of another atomic access from another agent that could be adjacent to it in the total order. ...

This definition makes it impossible to support more than one size of atomics (say, both 8-bit and 32-bit) when compiling C++ to a big asm.js-style array that simulates the C++ heap. [...]

I don't think that's true in the absolute sense you state it - non-viability requires possible adjacency in the total order - but it does indeed highlight the known problem that comes with using memory both for atomic and non-atomic accesses, see also Issues #13 and #88. It's a known problem for C's memory model too. I don't have a fix for this at the moment.

What do "the sending of a SharedArrayBuffer, the receipt of a SharedArrayBuffer" mean?

In a browser setting it would mean sending the SAB to a worker via postMessage, typically, and receiving it in that worker.

A data access read R only sees a write W that happens before R, where there is no write V such that V happens before R and W happens before V."

This doesn't make sense if there are unsynchronized writes to the same location. It also doesn't make sense if there are two such W's that both satisfy the condition but neither happens before the other.

I'm not overly fond of my own wording here but I've so far failed to come up with something much better. The intent of "sees" is really "must see", but even that doesn't feel right. Of course if there are racy writes to the location "before" the read the values written will be read, but "before" is not happens-before in that case and the notion of "before" is really ill-defined. The intent of the spec is that the prose on data races should cover the case where the read does not see the write.

What does it mean for threads involved in a data race to have synchronized?

We should probably talk about that - the whole section on the "duration" of a data race could use some discussion - but in the context the threads would each have to have an event s.t. the race happened before that even in each thread. I think it would be enough for one thread to atomically write a shared location (that is not part of the race) and for all the other threads to atomically read it.

The race affects the union of the access ranges involved in the race, for the duration of the race. The race ends only when all the threads involved in the race have synchronized.

This doesn't seem right. Let's say threads A and B both write to location X, unsynchronized. Then A and B synchronize with each other, ending the race. Then thread C reads location X, with that read being ordered after both A's and B's write. That leads to a contradiction because C will see both values but the data race is over.

How is the read in C "ordered after" the synchronization?

The shared memory will always honor a write, but if there is a race the memory can arbitrarily select bits from the operands of the racing writes and the previous memory contents at the racy written-to locations.

But that presumes that racing writes can be put into some total order viewed consistently by all threads, which need not be the case.

Right, that's the discussion that is raging over in Issue #71. At the memory, the racing writes will be put into some total order viewed consistently by all threads, all hardware we're likely to care about is coherent in this sense. I have some lingering questions about how write-to-read forwarding might affect that though. It's good of you to point out that my prose here depends on it, since that opens the door to strengthening the guarantee or to abandoning it, depending on how ambitious we are with the memory model.

Overall, while the intent seems ok, the devil is in the details, which are ill-specified.

OK, will try again :)

The values at racy locations are not stable until the race has ended.

I don't know what that means. Didn't the previous point state that a value can only be changed due to a write?

Yes, but in the absence of any kind of coordinated time it's not easy to say when that write "happens", the earliest moment when we can say something about that is after the threads have synchronized (I think).

What I've been trying to do is to ensure that we can get from a racy state back into a non-racy state, and the synchronization point is when we switch states, but that means that we're in a racy state until then, and there's no reason to assume the values aren't affected by writes arriving late.

(Rematerialization is verboten)

Great! This is necessary to keep the rest of the language safe from quantum values.

Yes. There are some other optimizations that are forbidden too, I don't know if I have a grip on them all yet but I'll be discussing them at the meeting. A JIT that can distinguish shared from unshared memory by type monitoring can pull out all the stops for unshared memory but must be conservative in shared memory, and must assume shared memory if it doesn't know. (Only for TypedArray memory of course.)

The design seems to only produce naturally aligned atomic accesses (good), but I can't tell if there are any loopholes, perhaps by messing with offsets. Is there any way for the user to attempt to make unaligned atomic accesses?

No. I avoided atomics on DataView on purpose (indeed DataView on shared memory was a late addition and is not implemented currently in Firefox, at least) and the rules for how TypedArrays are constructed ensure proper alignment.

Anyway, to sum up, I don't think anyone believes the memory model is done. @jfbastien has made a call for a formalization (issue #88) and there's also the issue of how/if we align the memory model of JS with that of wasm (issue #59). As you and JF have both pointed out, release/acquire is appealing but complicates the memory model. Relaxed even more so. The hardware interpretation of races that @pizlonator calls for in #71 may or may not be a complication. Etc.

And of course I forgot:

ES6 24.1.1.5 and 24.1.1.6: As written, these require all accesses to be sequentially consistent at a byte level, allowing only for word tearing. There's nothing in them that allows for other possibilities, so the memory model is trivial.

Right, that's what the section "With atomic access to shared memory" in the spec is supposed to address, it's perhaps believable as it stands but not phrased in the best way thinkable. It might be better to upgrade GetValueFromBuffer and PutValueInBuffer to tackle the problem head-on.

On Mon, Mar 28, 2016 at 8:52 PM, Lars T Hansen notifications@github.com wrote:

@waldemarhorwat https://github.com/waldemarhorwat, thanks for the feedback.

Is there a way to see the integrated proposal as an ES6 document with change bars? I'm having a hard time with the mechanics of reading the proposal as only a delta, and it's very difficult to tell if something more should be included but is missing.

Filed as Issue #92 https://github.com/tc39/ecmascript_sharedmem/issues/92.

6.3.1.1: Why can't byteLength be zero?

Duplication of a bug in ES6. The current ES draft has corrected this, will do so here too.

Atomic accesses that are viable are performed in a single total order (the synchronization order) consistent with each agent's program order. An atomic access is not viable if its access range intersects with but is not equal to the access range of another atomic access from another agent that could be adjacent to it in the total order. ...

This definition makes it impossible to support more than one size of atomics (say, both 8-bit and 32-bit) when compiling C++ to a big asm.js-style array that simulates the C++ heap. [...]

I don't think that's true in the absolute sense you state it - non-viability requires possible adjacency in the total order - but it does indeed highlight the known problem that comes with using memory both for atomic and non-atomic accesses, see also Issues #13 https://github.com/tc39/ecmascript_sharedmem/issues/13 and #88 https://github.com/tc39/ecmascript_sharedmem/issues/88. It's a known problem for C's memory model too. I don't have a fix for this at the moment.

C++ mostly gets away with it by having notions of object lifetime, although there are some bugs and surprises lurking there.

The race affects the union of the access ranges involved in the race, for

the duration of the race. The race ends only when all the threads involved in the race have synchronized.

This doesn't seem right. Let's say threads A and B both write to location X, unsynchronized. Then A and B synchronize with each other, ending the race. Then thread C reads location X, with that read being ordered after both A's and B's write. That leads to a contradiction because C will see both values but the data race is over.

How is the read in C "ordered after" the synchronization?

It isn't. C is individually ordered after A's write and ordered after B's write. Per the definition of a race, C is not racing.

The shared memory will always honor a write, but if there is a race the memory can arbitrarily select bits from the operands of the racing writes and the previous memory contents at the racy written-to locations.

But that presumes that racing writes can be put into some total order viewed consistently by all threads, which need not be the case.

Right, that's the discussion that is raging over in Issue #71 https://github.com/tc39/ecmascript_sharedmem/issues/71. At the memory, the racing writes will be put into some total order viewed consistently by all threads, all hardware we're likely to care about is coherent in this sense. I have some lingering questions about how write-to-read forwarding might affect that though. It's good of you to point out that my prose here depends on it, since that opens the door to strengthening the guarantee or to abandoning it, depending on how ambitious we are with the memory model.

No, issue #71 is different. What I'm saying is that in practice you don't get a consistent total order as viewed by all threads. Example:

16-bit word at an odd address X starts with the value 0x0000.

Thread A writes 0x0102 to X.

Threads B, C, D, and E read X and obtain values 0x0000, 0x0002, 0x0100, and 0x0102 respectively.

That's a contradiction because there is only one write to X but four values seen there. There is no consistent total order of memory operations in this scenario. This can happen if the reading threads split the 16-bit word read into two byte reads and happen to read the bytes in opposite order from each other.

I think these were the open issues, all from @waldemarhorwat:

ES6 24.1.1.5 and 24.1.1.6: As written, these require all accesses to be sequentially consistent at a byte level, allowing only for word tearing. There's nothing in them that allows for other possibilities, so the memory model is trivial.

(That's GetValueFromBuffer and SetValueInBuffer) These have been rewritten with the new memory model, all shared memory access is now diverted to two new primitives (both from these two functions, and also from a couple of other places in the spec that access shared memory)

Viability ...

The notion of viability has been rewritten as part of the larger rewrite of the memory model. That rewrite also cleaned up (I hope) the notion of which reads "see" which writes, which addresses several of your other concerns.

What I'm saying is that in practice you don't get a consistent total order as viewed by all threads.

I think this has been fixed. The byte-level interleaving that is now legal (when special cases about non-interleaving do not apply) allows the reads to select the bytes they return from all the available data and there are no requirements that all racing readers select the same bytes.

I'm going to close this one and open another one for the second round of reviews.

tc39 / proposal-ecmascript-sharedmem

Reviews: First round #77