tc39 / ecma262

Status, process, and documents for ECMA-262
https://tc39.es/ecma262/
Other
15.02k stars 1.28k forks source link

A.p.join on cyclic arrays does not reflect web reality #289

Open rossberg opened 8 years ago

rossberg commented 8 years ago

Browsers have special measures for dealing with cyclic arrays in the A.p.join operator (and by extension, A.p.toString). Chrome, IE, Firefox, & Safari all behave as follows:

a = [1,,2]
a[1] = a
a.toString() === "1,,2"

That is, they detect the cycle and replace it with the empty string. This is not reflected by the spec, which would require toString to diverge.

Should browsers change, or the spec?

allenwb commented 8 years ago

I believe this was discussed during ES5 development and at the time Brendan took the position that this was a browser specific hack that should be in the spec (sorry, Brendan, if my recollection is wrong).

I suspect that at this point in time TC39 would more likely take the position that we need to specify web reality. Either in the main body of the spec or in Annex B.

I good first step would be for somebody to write some spec language that describes what browsers (or at least one browser) actually does.

bterlson commented 8 years ago

It is annoying to spec because it requires plumbing a seen set through ToString :/

bterlson commented 8 years ago

Actually that way will not work as the seen set has to be plumbed through a toString invocation which is user-visible. I'm not sure how to do this.

bterlson commented 8 years ago

Chakra's implementation is to essentially keep a seen list in the current Realm. I suppose that would work here as well.

anba commented 8 years ago

https://gist.github.com/anba/4a154bd7143bf2bab3ef, except the differences between JSC+SM compared to Chakra+V8 when the object is added to the set.

bterlson commented 8 years ago

@anba looks solid at first glance. What is the JSC+SM difference?

anba commented 8 years ago

JavaScriptCore and SpiderMonkey check for cycles right after calling ToObject(this): https://github.com/WebKit/webkit/blob/7920183734b8510146db5aded800cbd67eee1d84/Source/JavaScriptCore/runtime/ArrayPrototype.cpp#L487 https://github.com/mozilla/gecko-dev/blob/137b0dc6d16ce7065a40079d4b8027ec138d4987/js/src/jsarray.cpp#L1099

Chakra checks for cycles after calling ToString(separator) and additionally performs extra proxy checks: https://github.com/Microsoft/ChakraCore/blob/d8aa1dd25637e749dab720e5c5732535cb913d79/lib/Runtime/Library/JavascriptArray.cpp#L4100

V8 checks for cycles after calling ToString(separator) and additionally performs extra steps for single-element arrays: https://github.com/v8/v8/blob/25532be593bd80c9e7b3fac4ee50491159902e93/src/js/array.js#L183 https://github.com/v8/v8/blob/25532be593bd80c9e7b3fac4ee50491159902e93/src/js/array.js#L458

(Similar differences are present for Array.prototype.toLocaleString.)

bterlson commented 8 years ago

@allenwb raised some issues in the gist that I will respond to here :)

Regarding how realms behave, essentially the realm that join belongs to keeps track of any array objects its seen, whether they are from another realm or the same realm. This means that you will actually join the same array twice if you can manage to get that array to be joined by join functions from different realms. This is actually interoperable behavior!

var a = [1,,2];
var $child = $.createRealm();
$child.evalInNewScript('var b = [3,,4]');
var b = $child.global.b;
a[1] = b;
b[1] = a;
print($child.global.Array.prototype.join.apply(a));

spidermonkey

1,3,1,,2,4,2

chakra

1,3,1,,2,4,2

node

1,3,1,,2,4,2

Also, absolutely code can get invoked in the middle, and this is expected. When executing Array.prototype.join, and an element has a custom toString or is a proxy or whatever, and that function causes Array.prototype.join to be applied to an array instance that is already being joined further up the stack, an empty string is returned. This is also expected and interoperable.

var a = [1,,2];
a[1] = {
  toString() {
    return '"' + a.toString() + '"';
  }
}

print(a.toString());

d8

1,"",2

chakra

1,"",2

spidermonkey

1,"",2

node

1,"",2

allenwb commented 8 years ago

@bterison (oops, I actually intended to respond here rather than to the gist)

WRT the xrealm behavior: Did you try it with three references to the x-realm array? I would think that each inner x-realm join call would exit with an empty cycle list in its own realm as it has no way of knowing that it is being called by a toString/join from another realm and hence needs to preserve the list.

WRT the second issue (reentrency) the problem is that the a proxy or whatever might might start a completely unrelated toString/join that is completely unrelated to an toString/join that is pending on the stack. If the unrelated toString/join coincidentally contains a reference to sometime in the global cycle list it is poisoned by the pending operation

I think we could do something more rational in each of these cases and I double that there is user code that depends upon the current behaviors in cases like these.

bterlson commented 8 years ago

Did you try it with three references to the x-realm array?

Can you give me an example? I'm not sure I follow.

the problem is that the a proxy or whatever might might start a completely unrelated toString/join that is completely unrelated to an toString/join that is pending on the stack.

I can buy that this is a problem in the sense that it seems bad, but I'm not sure of a way out of it without introducing observable extra parameters to toString. It is also how implementations behave today. Seems reasonable to spec this as a starting point at least? (Assuming it works for the case you suggest above).

rossberg commented 8 years ago

One set per realm is equivalent to associating each (instance of the) join method with its own seen list, as a sort of private state of the function object. I think that makes sense, as far as sense goes for this kind of thing. Maybe it can even be specced as a special [[property]] the join function creates and maintains on itself?

allenwb commented 8 years ago

It's primarily the reentrancy issue that I'm concern about. I'm pretty sure that this could be fixed at the specification level (either by adding a an optional second argument to join for the cycle set or by specifying join in terms of an abstract operation that recurs with an cycle set argument. (and using IsArray to determine whether to recur on the abstract operator or to just toString).

In either case, this has been unspecified for a very long time. I don't see any need to rush to a problematic solution that ignores the reentrancy issue and just copies a poor implementation decision of the past. I suspect we can do better while still maintaining adequate backwards compatibility.

claudepache commented 8 years ago

@allenwb I have hard time to find a plausible scenario where a reentrant call to A.p.join with a same this argument would not highly risk to provoke an infinite cycle, unless some function involved in the cycle takes an explicit measure to prevent that.

bterlson commented 8 years ago

@allenwb I would normally agree, although this was something we implemented because we found the web depended on it. I would support standardizing what is currently required to run the web and we can discuss improvements (eg. to handle any reentrancy problems). What's the downside with this approach?

ljharb commented 5 years ago

This has been open a long time; I've made an attempt in #1518 to address it. Please review, and I'm more than happy to adjust to all feedback.

cpcallen commented 5 years ago

Three observations. With respect to web reality:

On the topic of reentrancy, from our experience implementing an eccentric ES dialect supporting multiple (cooperatively multitasked) threads, running code from multiple novice but potentially-mutually-hostile programmers:

Finally, rather tangentially:

devsnek commented 5 years ago

decided to make the [[Seen]] set per-thread rather than global

I think this is a given as the spec assumes (or requires?) that values aren't usable between threads

and plan to add a cycle check to [Error.prototype.toString] as well

was this planned to be brought to committee first? web reality is annoying to deal with retroactively, and I'd argue that throwing makes more sense than anything else.

erights commented 5 years ago

Are we sure that this Seen Set cannot be used as a global communications channel?

ljharb commented 5 years ago

@erights in the current spec, “that an array is cyclic” is observable by the stack overflow caused by an infinite loop, or by observing repeated calls to your cyclic object’s toString. In the web, it’s observable by knowing where a cycle is, and confirming that join produces a string with a gap where your cyclic object is expected. I’m not sure what would be newly communicated in either case. Could you elaborate on your concerns?

erights commented 5 years ago

I'm not worried about observing that an array is cyclic. I could easily write an algorithm to do this in user code anyway. I am also fine with the outcome, that toString completes with a string with a gap, rather than crashing.

I also do not know how to produce this sensible outcome without a Seen Set somewhere. This concern is: by its nature, the Seen Set is mutable. We are trying to keep hidden mutable state out of the primordials, so that transitively freezing the primordials results in an transitively immutable primordial state (assuming Date.now and Math.random have already been repaired), so that subgraphs which are otherwise isolated but share transitively primordials still cannot communicate.

I suspect the Seen Set as used here does not enable such communications. But it would be good to verify this one way or the other. Anytime we introduce hidden global (or per-realm) mutable state, we should worry about this.

erights commented 5 years ago

Because this recurs through user code, which might initiate a toString on an unrelated array, a shared Seen Set would be wrong. I am not sure if this is the reentrancy issue @allenwb raises at https://github.com/tc39/ecma262/issues/289#issuecomment-172637042

ljharb commented 5 years ago

@erights in #1518, it's not actually a Set, it's a spec List, that's not directly observable or reachable from user code. It's also not plumbed through toString, so there's no risk of that happening.

erights commented 5 years ago

@ljharb I understand that the collection itself is not reified and made available to user code as an object. But it certainly affects observable behavior of user code. If there's one per realm, then for independent toStrings on joins that are proceeding interleaved (which, yes, would need bizarre coding patterns), both would condition their behavior using the same hidden seen collection. I do not see how to become confident that they do not observably interact with each other.

waldemarhorwat commented 5 years ago

I have some of the same concerns as @erights.

allenwb commented 5 years ago

@erights yes, reentrancy via toString is one if my concerns. But reentrancy could also occur via Proxies. Perhaps a solution would be to abandon cycle detection if the "array" or any element is a Proxy or if a non-instrinsic toString us encountered

erights commented 5 years ago

Hi @allenwb Do proxies raise any concern that a user-defined toString does not raise?

allenwb commented 5 years ago

I don't believe Proxies raise any extra concerns, just another path that need to be considered.

I've thought about this some more and I'm pretty sure it is possible to specify the algorithm that avoids any reentrancy issues.

One way to do it is for the A.p.join algorithm to do an a conditional inlined specialization of the recursive call to join via toString. the specialization occurs for cases where the toString is the A.p.toString intrinsic. This could turn the recursive algorithm (using a global circularity detector data structure) into an iterative algorithm using an internal (to invocations of A.p.join) circularity detector data structure. Reentrancy is not a problem because each explicit call to A.p.join (outside of the toString recursion) starts with a fresh circularity detector data structure.

I haven't worked through the details but it feels like it should work for data structures built out of normal JS Arrays objects. It doesn't necessarily work for data structures that interweave Arrays and non-Array objects with custom toString methods. I'm not sure what the web-reality algorithm does in such cases (or if there is a single web-reality algorithm). I strongly suspect that breaking the web concerns don't actually show-up for these latter data structures.

So, I'm back to where I was in https://github.com/tc39/ecma262/issues/289#issuecomment-172637042 . We can specify this, at least well enough to avoid meaning web breakage, without using a global data structure.

erights commented 5 years ago

Hi Allen, I love the direction, but I'm not sure I understand how such an algorithm would be written. Care to give it a try? Rough pseudo-code would be informative enough. Thanks.

allenwb commented 5 years ago

Roughly

The Array join method takes one argument, separator, and performs the following steps:

Let O be ? ToObject(this value).
Let len be ? ToLength(? Get(O, "length")).  //just for backwards compat if exception
If separator is undefined, let sep be the single-element String ",".
Else, let sep be ? ToString(separator).
Return ArrayJoin(O,  separator, new empty List)

Abstraction operation ArrayJoin(O, Sep, Seen)

If Seen contains O, return "".
Add O to Seen.
Let k be 0.
Let R be the empty String.
Let len be ? ToLength(? Get(O, "length")).
Repeat, while k < len
  If k>0, set R to the string-concatenation of R and sep.
  Let element be ? Get(O, ! ToString(k)).
  If element is in Seen or element is undefined or null, let next be the empty String; 
  else,
     Let m be ?GetV(O, "toString").
     if SameIntrinsicAnyRealm(m, %ArrayProto_toString%) is true,
          let next be ?ArrayJoin(element, Sep, Seen)
      else let next be ? ToString(element)
  Set R to the string-concatenation of R and next.
  Increase k by 1.
Return R.
ljharb commented 5 years ago

Thanks, I'll take a shot at updating #1518 with those spec steps.

cpcallen commented 5 years ago

I'm not sure what the web-reality algorithm does in such cases (or if there is a single web-reality algorithm).

Not that. Consider:

const arr = [0, {toString: () => '(' + arr + ')'}, 2];
String(arr);

which returns '0,(),2' in V8 and jsc at least, but which would result in infinite recursion under the proposed algorithm.

It seems to me that the preconditions to being bitten by this—defining toString methods, and relying on the recursion checks in A.p.join—are all too easily satisfied.

ljharb commented 5 years ago

Wouldn’t that already be the case with the current spec?

allenwb commented 5 years ago

Is @cpcallen's pattern one that has actually been observed on the web? Perhaps, @bterlson knows because of https://github.com/tc39/ecma262/issues/289#issuecomment-172959727 .

Presumably, the goal here is not to eliminate all possible non-terminating behavior starting from A.p.join because that would be the halting problem. EG,

[0, {toString() {while(true);}}, 2].join()

That said, I'm becoming less concerned about an single agent global circularity set shared by all A.p.join intrinsics. Run-to-completion guarantees that once a call to A.p.join starts it will either complete without preemption or not terminate at all. That means that a single initially empty circularity set should be safe. On entry A.p.join checks if the set is empty, if it is that join invocation is the initiator of a new traversal and hence has the responsibility of ensuring that the set is cleared before terminating (probably via the equivalent of a finally). Reentrant calls to join, independent of the traversal such as from Proxy handlers, would still be swept up into the active circularity set, but that sounds like it is already the implemented behavior.

A single global set, rather than a per realm set seems correct because the data structure being traversed can contain circular references among objects from multiple realms but there is only a single job that runs to completion process those x-realm object references.

ljharb commented 5 years ago

@allenwb in the steps above, what is SameIntrinsicAnyRealm?

allenwb commented 5 years ago

A hypothetical abstract operation that in this case tests whether its first argument is the %ArrayProto_toString% intrinsic of any existing realm of the currently executing ECMAScript Agent.

cpcallen commented 4 years ago

I realise I had neglected to respond to @devsnek. Apologies.

decided to make the [[Seen]] set per-thread rather than global

I think this is a given as the spec assumes (or requires?) that values aren't usable between threads

I was unclear. My use of "thread" here was in respect to one of our highly non-standard extensions, and has nothing to do with threads as introduced with Agents in ES2017. Our threads share a single realm; they are only relevant to this discussion insofar as they offer some (dubiously relevant) insight into the difficulty of distinguishing "completely unrelated" toString calls.

and plan to add a cycle check to Error.prototype.toString as well

was this planned to be brought to committee first? web reality is annoying to deal with retroactively, and I'd argue that throwing makes more sense than anything else.

No: our engine is not used in a browser so is unlikely to influence web reality. I mentioned it here to gauge if there was interest in such a proposal, but I note you are the only who has even referenced the idea.