whatwg / html

HTML Standard
https://html.spec.whatwg.org/multipage/
Other
8.06k stars 2.65k forks source link

Fix HTMLAllCollection #775

Closed annevk closed 8 years ago

annevk commented 8 years ago

https://twitter.com/JustRogDigiTec/status/704539722362724352

<3

Tentatively assigning to @domenic since I like him so much.

domenic commented 8 years ago

@DigiTec can you clarify? Trying to piece this together from 140 character tweets is proving a bit difficult :). If I understand correctly:

Blink's IDL has lots of FIXMEs for HTMLAllCollection: https://code.google.com/p/chromium/codesearch#chromium/src/third_party/WebKit/Source/core/html/HTMLAllCollection.idl&sq=package:chromium

domenic commented 8 years ago

@So, this is not currently compatible cross browser, with no clear consensus. I am not sure we should change the current spec.

Test case: https://jsbin.com/kaxuqi/edit?html,output

Firefox Chrome Edge Safari
document.all() error ok error ok
document.all.item() error ok ok ok
document.all.namedItem() error error error ok
document.all(0) null HTMLHtmlElement HTMLHtmlElement HTMLHtmlElement
document.all["0"] HTMLHtmlElement HTMLHtmlElement HTMLHtmlElement HTMLHtmlElement
document.all("0") HTMLSpanElement HTMLHtmlElement HTMLHtmlElement HTMLHtmlElement
document.all.item(0) HTMLHtmlElement HTMLHtmlElement HTMLHtmlElement HTMLHtmlElement
document.all.item("0") HTMLSpanElement HTMLHtmlElement HTMLSpanElement HTMLHtmlElement
document.all("foo") HTMLCollection NodeList HTMLCollection NodeList
document.all.item("foo") HTMLCollection NodeList HTMLCollection NodeList
document.all.namedItem("foo") HTMLCollection NodeList HTMLDivElement NodeList
document.all.namedItem("0") HTMLSpanElement HTMLSpanElement HTMLSpanElement HTMLSpanElement
document.all.foo HTMLCollection NodeList HTMLCollection NodeList

We generally have a 2/2 browser split on these questions, with the exception that item() should have its argument become optional. So at this point that is the only change I would make; we would keep HTMLCollection as the return type.

@DigiTec, did this come up as a result of any compat work? E.g. is there evidence that Chrome's behavior is relied on by websites? That would help break the 2/2 split tie.

DigiTec commented 8 years ago

@domenic Just catching up on this. We apparently work offset shifts ;-)

We are working through this issue https://github.com/whatwg/html/issues/210 and we have bandwidth to "get it right" so that is why these questions are coming up. Having a 2/2 browser split is not ideal. Especially if 2 of the larger market shares are split.

all() and all.item() are due to legacaller on the item getter. So the optional on that parameter would be required for Chrome/Edge/Safari to agree (and therefore Mobile and Desktop would have a majority sharehold and FireFox could be quick to fix).

Most of this request is about fixing the IDL itself. The original IDL did not compile for us since it specified two conflicting (in our parser) implementations of item that couldn't be resolved. So we prefer an option that makes this a single line.

legacycaller getter (HTMLCollection or Element)? item((unsigned long or DOMString) nameOrIndex);

The current spec looks like this getter Element? item(unsigned long index); (HTMLCollection or Element)? item(DOMString name); legacycaller getter (HTMLCollection or Element)? namedItem(DOMString name);

Since the only legacycaller is namedItem it means that all can only accept strings item is an overloaded method which we could resolve as written above, but it is only to change the return type from being or'ed to being more specific. The simpler webidl I proposed seems better. We'd remove legacycaller and getter on namedItem because it is already covered by item.

Compat usually breaks when you lock things down, not when you open them up. For Edge we'd have to

If we choose not to change the spec in this way to match the Chrome/Safari behavior more closely then we would like, if possible, to get a committed bug to correct the behavior for Chrome/Safari to avoid future interop issues.

Did this result from compat work? No, it resulted from Interop work found while tearing HTMLAllCollection from HTMLCollection. Note, this may result in some HTMLCollection feedback as well.

nox commented 8 years ago
legacycaller getter (HTMLCollection or Element)? item((unsigned long or DOMString) nameOrIndex);

I'm pretty sure this is not spec-compliant and that both Gecko and Servo choke on it.

What about an anonymous legacycaller?

getter Element? item(unsigned long index);
(HTMLCollection or Element)? item(DOMString name);
getter (HTMLCollection or Element)? namedItem(DOMString name);
legacycaller (HTMLCollection or Element)? ((unsigned long or DOMString) nameOrIndex);
DigiTec commented 8 years ago

@nox Do you mean not spec compliant per WebIDL standards? It did pass our parser and even went all the way through TypeScript d.ts creation through our XML format and nothing flagged along the way.

What restrictions would disallow the one liner from working? I couldn't find anything during a spec read.

nox commented 8 years ago

http://heycam.github.io/webidl/#idl-indexed-properties

Indexed property getters must be declared to take a single unsigned long argument. Indexed property setters must be declared to take two arguments, where the first is an unsigned long.

getter type identifier(unsigned long identifier);
setter type identifier(unsigned long identifier, type identifier);

http://heycam.github.io/webidl/#idl-named-properties

Named property getters and deleters must be declared to take a single DOMString argument. Named property setters must be declared to take two arguments, where the first is a DOMString.

getter type identifier(DOMString identifier);
setter type identifier(DOMString identifier, type identifier);
deleter type identifier(DOMString identifier);

getter type (DOMString identifier);
setter type (DOMString identifier, type identifier);
deleter type (DOMString identifier);
domenic commented 8 years ago

OK, cool. So it sounds like you're not too concerned about NodeList vs. HTMLCollection? Then we can focus on the other issues.

I've updated https://jsbin.com/kaxuqi/edit?html,output to include document.all(0) and document.all.item(0) so we can figure out whether browsers allow indices (would return HTMLHtmlElement) or convert to strings (would return null since there's no element with id "0"). The table above is updated too. It looks like there is a majority that does allow indices (Firefox being the exception). So we should fix that.

I agree we also want to allow document.all.item() (i.e. no arguments) since that's a majority opinion.

To achieve these fixes I think @nox's solution is going to be the best one. It seems pretty clean to me.

nox commented 8 years ago

So:

getter Element? item(unsigned long index);
(HTMLCollection or Element)? item(optional DOMString name);
getter (HTMLCollection or Element)? namedItem(DOMString name);
legacycaller (HTMLCollection or Element)? ((unsigned long or DOMString) nameOrIndex);

With an optional argument in the non-getter item operation.

domenic commented 8 years ago

@DigiTec how does that sound to you? Happy to make the change today if you agree.

DigiTec commented 8 years ago

The getter/setter restrictions seem odd to me from the spec, but I'll pass the rest through our parser later today and if I run into any issues will let you know. I'm guessing the rationale is for code auto-gen and overload resolution to work properly the restrictions need to be in place.

@domenic NodeList "seems" wrong since you are disallowing authors from poly-filling APIs properly and the spec clearly states HTMLCollection and has for a very long time. Would that be an area Chrome would be willing to take a bug? The HTMLAllCollection WPT tests will need to encapsulate this behavior and I think the right test for return type should be HTMLCollection and thus Chrome would fail. Is that okay?

NodeList doesn't have namedItem and a bunch of other stuff. There seems to be a lot of bugs here though across browsers and maybe even within the spec.

per HTMLCollection semantics this should work and it does work in Edge/FireFox console.log(document.all.item("foo").namedItem("foo"))

domenic commented 8 years ago

Yeah, Chrome should be happy to take a bug and fail the relevant WPTs. Maybe @foolip can confirm since I believe he was the one who added the FIXME comments to the IDL.

DigiTec commented 8 years ago
// Allows item(0) and [0] to return element or null`
getter Element? item(unsigned long index);
// Allows item("foo") to work and return collection, element or null
(HTMLCollection or Element)? item(optional DOMString name);
// Allows namedItem(str) and [str] to return collection, elment or null
getter (HTMLCollection or Element)? namedItem(DOMString name);
// Had to add optional here for all() to work
legacycaller (HTMLCollection or Element)? (/*added*/optional (unsigned long or DOMString) nameOrIndex);

I think we need anonymous getter's and not item mapping. Why? Because [num] (getter syntax) returns undefined while item(num) returns null. So these can't be overloads of one another.

console.log("non-existent", document.all(5000), document.all.item(5000), document.all[5000])

Probably same for namedItem since that is even more weird (it doesn't ever return null in Chrome, always undefined).

Test cases here (2 more added to the top list) https://jsfiddle.net/cjjhj2ms/

QQ: Is it possible that we can't fully describe the right behavior in the WebIDL? That the undefined vs null behavior cannot be captured in the current specification of the language?

nox commented 8 years ago

Please use backticks in your message because it is quite hard to read as such with Markdown not helping.

nox commented 8 years ago

I think we need anonymous getter's and not item mapping. Why? Because [num] (getter syntax) returns undefined while item(num) returns null. So these can't be overloads of one another.

Not a problem. The undefined is returned by the fact that the passed num was outside the set of supported property indices:

If an indexed property getter was specified using an operation with an identifier, then the value returned when indexing the object with a given supported property index is the value that would be returned by invoking the operation, passing the index as its only argument.

See how the operation isn't used at all if the index wasn't supported to begin with.

The platform already relies on this in CSSStyleDeclaration: the operation associated with the getter returns an empty string, while using the getter syntax it returns undefined.

var div = document.createElement("div");
console.log(div.style["foo"]);
console.log(div.style.item("foo"));
DigiTec commented 8 years ago

Adding @travisleithead as well.

If the odd behaviors of undefined are necessary then we likely need to define all parts of this API separately (no collapsing).

Two anonymous legacycaller's one for the numeric and one for string Two anonymous getters one for numeric and one for string item and namedItem as their own thing

I'm not even sure if that is entirely possible.

We also believe that to return undefined for some set of scenarios we need to or in void to some of the return value lists.

(void or HTMLCollection or Element) // Used when we need undefined
(HTMLCollection or Element)? // Used when we need null

I also just saw @nox respond with some new information. Let me drop this comment as is and then reflect on his updates. I think they do make some thing simpler, but still the fact that Chrome has dramatically different null vs undefined behavior in similar conditions for string vs num would likely still need something calling it out for THAT condition. That might at least mean that we need different syntax for DOMString vs unsigned long indexing.

nox commented 8 years ago

Returning undefined is mandated by the specification of getters, while returning null is mandated by the specification of the associated operation, nothing WebIDL cannot currently handled, so I don't think they should be split.

There is nothing forbidding either Chrome or Edge to write their IDL definitions differently as long as the observable effects are the same though.

DigiTec commented 8 years ago

The problem is this test:

console.log("non-existent num", document.all(5000), document.all.item(5000), document.all[5000])
console.log("non-existent str", document.all("whammy"), document.all.item("whammy"), document.all["whammy"])

This returns

null, null, undefined
undefined, undefined, undefined

Which means when using DOMString on item that the value is not (HTMLCollection or Element)? it is instead (void or HTMLCollection or Element). So this is separate from getter syntax altogether. So while the last value in the list can be defined by the getter prose, it doesn't handle the return type for the non getter case.

Also, the legacycaller acts like the method, while the getter acts differently. I'm not sure if the spec clarifies that. That legacycaller is actually more like the method and less like the getter. I know that in EdgeHTML we map legacycaller -> GetOwnProperty (edit: Which I should clarify is the getter). So this can lead to pretty big interop discrepancies.

nox commented 8 years ago

So your first line (null, null, undefined) is just the usual business I described.

As for the second line… Now that's a problem. We can't put void in an union so we can't do what you suggested. :(

DigiTec commented 8 years ago

Ugh, that was the only thing @travisleithead and I could come up with :-(

nox commented 8 years ago
getter Element? item(unsigned long index);
any item(optional DOMString name);
getter (HTMLCollection or Element)? namedItem(DOMString name);
legacycaller any ((unsigned long or DOMString) nameOrIndex);
nox commented 8 years ago

I think this cover all the cases, but we have to use any.

domenic commented 8 years ago

It's null, null, undefined in Firefox if that helps.

nox commented 8 years ago

Oh! I didn't check. That combination is just OK too with the current WebIDL rules and my previous snippet without any. What does Edge do? Oh you just said it just returns undefined all three times.

DigiTec commented 8 years ago

Right now edge does

undefined, null, undefined // per my notes about legacycaller mapping to GetOwnProperty 
undefined, null, undefined // for the same reason

We auto-gen caller entry points and so GetOwnProperty was the only thing we could easily map to and it "works" in the majority of cases. Also legacycaller is now pretty much only on HTMLAllCollection. We dropped support for it in all other cases I believe with the following caveats:

I think other browsers also support it for HTMLObjectElement and HTMLEmbedElement for legacy reasons that we no longer have.

Willing to custom generate the CallerEntryPoint instead of auto-generation if it really is true that legacycaller is ONLY on the HTMLAllCollection now.

nox commented 8 years ago

I see that document.all() returns undefined too, so I think using any and a separate legacycaller is the way to go:

getter Element? item(unsigned long index);
(HTMLCollection or Element)? item(optional DOMString name);
getter (HTMLCollection or Element)? namedItem(DOMString name);
legacycaller any (optional (unsigned long or DOMString) nameOrIndex);
DigiTec commented 8 years ago

We can mark it any, but then we need to clarify the expected values in the specification. I also have to come up with a TypeScript solution that constrains the types to only those expected to be returned. any is too tolerant and doesn't allow for tool chains to provide any intellisense. As noted earlier though, this is now ONLY HTMLAllCollection.

With this change would we take the stance that all 3 browsers are compliant even though they have different return values? Or would we lock down the WPT tests to one of the behaviors? There are some clear behaviors we CAN lock down (such as Edge not returning an HTMLCollection in the namedItem case when it should be, which I have fixed locally). Tests for null vs undefined is really not a huge deal I guess since basic truthy/falsey still works (though strict comparisons would fail) and we could choose to accept both values.

Of all the proposals I'm liking this one the best so far. Its close enough and we can wrangle the rest. Here is what the Edge webIDL would look like (we don't yet have overload resolution) and I had to add support for anonymous legacycaller since this is the first time we've had this.

any item(optional DOMString name);
[msOverload] getter Element? item(unsigned long index);
getter (HTMLCollection or Element)? namedItem(DOMString name);
legacycaller any (optional (unsigned long or DOMString) nameOrIndex);
domenic commented 8 years ago

With this change would we take the stance that all 3 browsers are compliant even though they have different return values? Or would we lock down the WPT tests to one of the behaviors?

No, the spec would require an unambiguous answer defined in prose. Return types in Web IDL are pretty useless anyway; they don't impose any normative requirements that the prose does not already. (That is, their only function from a spec perspective is generating "spec segfaults" when the prose tells you to return something other than the Web IDL return type.)

What about just going with

getter Element? item(unsigned long index);
(HTMLCollection or Element)? item(optional DOMString name);
getter (HTMLCollection or Element)? namedItem(DOMString name);
legacycaller (HTMLCollection or Element)? ((unsigned long or DOMString) nameOrIndex);

and saying that nobody is allowed to return undefined (except via the getters)? That might be against the majority (haven't run full tests), but it seems most sane, and the only edge-case difference is null vs. undefined.

nox commented 8 years ago

@domenic To be fair I would rather us do that too.

DigiTec commented 8 years ago

Edge should be able to align to that, we simply change our mappings from calling the getter (GetOwnProperty) to the actual method through a custom thunk. From that point on legacycaller, though specified anonymously would simply behave as the union of both item methods since overloads happen within the method anyway. One silly question. Could this be simplified further to:

legacycaller (HTMLCollection or Element)? item(optional (unsigned long or DOMString) nameOrIndex);
getter Element? (unsigned long index);
getter (HTMLCollection or Element)? namedItem(DOMString name);
nox commented 8 years ago

I think that would work yes. Many WebIDL features in the same interface member though. :)

DigiTec commented 8 years ago

Yep ;-) Note I verified that produces the right code gen for me even without the anonymous member fixup. So I kind of like it. It also produces pretty darn good TypeScript definitions automatically without fixup.

travisleithead commented 8 years ago

This resolves a long-standing issue in our WebIDL bindings. Thanks @DigiTec, @nox, and @domenic for working through this.

foolip commented 8 years ago

Yep, I added those FIXMEs to Blink's HTMLAllCollection.idl, but I didn't spend any time investigating further. Thanks for filing this issue! I'll take a look at the PR.

foolip commented 8 years ago

While poking at this in Blink, I found that there's yet another legacycall form hidden in there, namely document.all("foo", n). This gets the _n_th element with name "foo". A test: http://software.hixie.ch/utilities/js/live-dom-viewer/saved/3932

It looks like this oddity is only supported by Blink and WebKit. It seems like a strange thing to add just for the fun of it, but I haven't dug into its history.

foolip commented 8 years ago

In trying out #780 in Blink, I found another curious case: document.all("0"). It turns out that Blink, Edge and WebKit all treat this as they would document.all["0"], i.e. the string gets interpreted as a number. In Blink this was implemented in custom bindings, and following the spec would cause the string "0" to be handled by the 'getting the "all"-named element(s)' algorithm instead.

While in the area, I also notice that document.all.namedItems("0") isn't quite interoperable, in Gecko that doesn't return an element with id=0, but all other engines do: http://software.hixie.ch/utilities/js/live-dom-viewer/saved/3935

domenic commented 8 years ago

OK, it seems like there is still work to do here. Updated https://github.com/whatwg/html/issues/775#issuecomment-190796607 with some results. Action items:

While in the area, I also notice that document.all.namedItems("0") isn't quite interoperable, in Gecko that doesn't return an element with id=0, but all other engines do:

In my Gecko your test case outputs the body element correctly.

DigiTec commented 8 years ago

It looks like you guys have found 2 things.

First, the string to numeric. Since you can't tell the difference (ToInteger and ToString are both too tolerant in JavaScript) you have to convert to string in some cases, then parse as integer and take the integer. All browsers seemed to do this properly. I tested a lot of variations of this and everyone was mostly complicit so I didn't even bother to bring it up. That was my mistake. I should have brought it up. There is nothing in the webIDL we can do to describe this behavior.

For (index, subIndex) I think Edge does support something like this, but only for .item(), for the legacycaller, since we forward to GetOwnProperty we lose the index. We were planning on removing support for this odd overload. We didn't even realize that other browsers had any support for it.

Note we support this behavior on HTMLCollection as well, which again is really, really wrong and we are hoping to remove this behavior there as well when we do our upgrades.

For namedItems("0"), AFAICT, no browsers ever try to do a numeric conversion and we think that is correct. So item("0") and all("0") are the oddities and those are covered by my first point at the top of the comment. That all of us seem to take the string, see if it parses as a numeric and then change it into an index.

domenic commented 8 years ago

I'm doing some testing and it looks like strings with numeric values greater than 2^32 - 1 are always set to 0 in Chrome, and in Edge they return undefined even if there is an element with an ID equal to that value. I'll spec the Chrome behavior... this is so edge-casey....

domenic commented 8 years ago

No, wait, that wasn't correct, my tests were just a bit wrong.

It seems like the "convert string to number" algorithm is unlike most in JavaScript. For example "0x0" gets the element with id "0x0", instead of the element with index 0.

domenic commented 8 years ago

I guess we want #rules-for-parsing-non-negative-integers

domenic commented 8 years ago

Pushed https://github.com/whatwg/html/commit/d3a76b366c99538562a207d64b468ed31690613e to handle the numeric indices. Note that the behavior doesn't match Chrome exactly; e.g. Chrome returns undefined for "+0", instead of giving the 0th element on the page like Edge does. But it seemed better to do this than to invent a new number-parsing algorithm.

zcorpan commented 8 years ago

Note that "0x0" parses to 0 by #rules-for-parsing-non-negative-integers

domenic commented 8 years ago

That doesn't seem true? Only ASCII digits are collected. Could you walk me through it?

DigiTec commented 8 years ago

Data points for Edge. We use two parsing algorithms depending on the code path.

For GetOwnProperty the "value" or "name" is always a string name and so we always do numeric parsing. We generally use some equivalent of atol which allows, space, pos/neg sign, digits. On overflow we clamp based on the sign. big values get clamped to LONG_MAX and small values to LONG_MIN.

Strangely, for item() our code still uses a legacy marshaling behavior and thus we rely on the JavaScript engine to give us "known" numeric types first as a number (no parsing), but generally we fall back to string and then try to convert the string to a number using legacy COM string to number conversion.

We should certainly complete the spec here for the edge cases, but I'm doubtful I would change those behaviors on our side at this time. I'd simply rely on them being friendly enough for normal cases.

zcorpan commented 8 years ago

Sure.

https://html.spec.whatwg.org/multipage/infrastructure.html#rules-for-parsing-non-negative-integers -> https://html.spec.whatwg.org/multipage/infrastructure.html#rules-for-parsing-integers -> Nothing interesting until step 8:

Collect a sequence of characters that are ASCII digits, ...

https://html.spec.whatwg.org/multipage/infrastructure.html#collect-a-sequence-of-characters

While position doesn't point past the end of input and the character at position is one of the characters, append that character to the end of result and advance position to the next character in input.

This collects the first 0 and doesn't collect the x, and returns the 0. No error so far.

Now back to https://html.spec.whatwg.org/multipage/infrastructure.html#rules-for-parsing-integers

and interpret the resulting sequence as a base-ten integer. Let value be that integer.

Interpret 0 as a base-ten integer... and that's what's returned; not an error.

DigiTec commented 8 years ago

Does any browser really ignore the trailing characters? Isn't that the implication of that wording?

0x0 parses to 0 because x0 is ignored in the sequence? That means 10foobarbaz parses as 10. Or 20px parses as 20. I must be understanding this incorrectly. Edge will definitely fail parsing and return an error and then depending on the algorithm the error will get eaten or thrown. For this case, it would get eaten since item("10foobarbaz") would fall back to the DOMString version, which is what you would expect.

zcorpan commented 8 years ago

You understand correctly. This is how e.g. <table width="10foobarbaz"> works. But it's possible document.all doesn't/shouldn't use these rules, or use a "validation" step first (like "if X is a valid non-negative integer...")?

zcorpan commented 8 years ago

A case that might be interesting to test also is " 0" (i.e., with a leading space); both HTML's and JS's integer parsers skip leading spaces. Is it a name or an index?

zcorpan commented 8 years ago

Sorry, table width uses https://html.spec.whatwg.org/multipage/infrastructure.html#rules-for-parsing-dimension-values but I think the same thing applies there.

https://html.spec.whatwg.org/#the-select-element:rules-for-parsing-non-negative-integers is an example that uses rules for parsing non-negative integers.

http://software.hixie.ch/utilities/js/live-dom-viewer/saved/3937

domenic commented 8 years ago

Ah thanks. I thought you were talking about hex in general, not about 0 in particular. That makes sense.

A validation step is exactly what I need. "valid non-negative integer" will disallow leading + and spaces, and won't allow trailing garbage. I originally thought that parsing + rejecting an error would be the same as validating + parsing, but that is clearly not true upon re-reading. Revising to use "valid non-negative integer" for now.