Clarify that new Document creates a document of type "html", not "xml"

rniwa commented 8 years ago

The current DOM spec says "The Document() constructor, when invoked, must return a new document whose origin is the origin of current global object’s associated Document." and there's an informal note saying "Unlike createDocument(), this constructor does not return an XMLDocument object, but a document (Document object)."

However, a document's type is "xml" by default. So I'm confused as to what kind of document we're creating here.

I think what we intend to say here is that we want to create a document whose type is "html".

rniwa commented 8 years ago

@annevk @cdumez @smaug----

cdumez commented 8 years ago

My understanding is that new Document() creates a Document object and its type will be "xml" indeed.
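A document's type is internal state, not an exposed property, but it can be observed through behavior the DOM spec ties to it: createElement() lowercases its argument only in HTML documents. The sketch below relies on that rule. `documentTypeOf` is a hypothetical helper, not a standard API. `contentType` is a standard Document attribute, and per the spec it keeps its default of "application/xml" for a document made with `new Document()`.

```javascript
// Hypothetical helper (not a standard API): infer a document's internal
// type from behavior the DOM spec ties to it. createElement() lowercases
// the given name only when the document is an HTML document.
function documentTypeOf(doc) {
  return doc.createElement("PROBE").localName === "probe" ? "html" : "xml";
}

// Only run the comparison where a DOM is available, e.g. in a browser.
if (typeof Document !== "undefined" && typeof document !== "undefined") {
  const plain = new Document();
  const html = document.implementation.createHTMLDocument("");
  // Per the DOM spec, this should log: xml application/xml
  console.log(documentTypeOf(plain), plain.contentType);
  // Per the DOM spec, this should log: html text/html
  console.log(documentTypeOf(html), html.contentType);
}
```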

rniwa commented 8 years ago

My understanding is that we use Document for HTML documents since the DOM spec merged HTMLDocument into Document.

cdumez commented 8 years ago

That's true if by "we" you mean the specification. All major browsers have an HTMLDocument type.

domenic commented 8 years ago

All that follows is about specs; implementations have not quite converged.

Almost all documents are Documents. This includes both XML and HTML documents.

However, there is a method, document.implementation.createDocument(), which returns an XMLDocument, because sometimes people used the load() method of the return value of createDocument() for Ajax-ish purposes.
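To see which interface a particular engine actually uses for each way of creating a document, one can inspect the prototype of the returned object. This is a rough sketch. `interfaceNameOf` is an illustrative name, and the results are deliberately not asserted because engines disagree here, as noted above.

```javascript
// Illustrative helper: the name of the interface whose prototype an object
// was created with (e.g. "Document", "XMLDocument", "HTMLDocument").
function interfaceNameOf(obj) {
  return Object.getPrototypeOf(obj).constructor.name;
}

// Only run where a DOM is available, e.g. in a browser.
if (typeof document !== "undefined") {
  const impl = document.implementation;
  console.table({
    "new Document()": interfaceNameOf(new Document()),
    // Per the DOM spec, createDocument() returns an XMLDocument.
    'createDocument(null, "", null)': interfaceNameOf(impl.createDocument(null, "", null)),
    'createHTMLDocument("")': interfaceNameOf(impl.createHTMLDocument("")),
  });
}
```

If an engine makes XMLDocument an alias of Document, the createDocument() row reports "Document" rather than "XMLDocument".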

In https://github.com/whatwg/html/pull/1478 we removed XMLDocument.prototype.load since it was only implemented in Gecko, making XMLDocument an empty interface. This means we could probably remove XMLDocument from the specs entirely and make all documents simply Documents; that discussion is #278.

A further complication: as of 2011, Gecko needed XMLDocument and its load method for web compat on Gecko-only code paths. https://github.com/whatwg/html/issues/1530 tracks adding it back in Gecko compatibility mode, since Gecko has said it prefers that to experimenting with removing it.

ArkadiuszMichalski commented 8 years ago

Someone already asked about this earlier in https://github.com/whatwg/dom/issues/137. Why can't this constructor take an additional argument to decide which kind of document (internally "xml" or "html") we want to create? Right now the default is "xml", so we have to use the longer document.implementation.createHTMLDocument(), which already comes with predefined content.

rniwa commented 8 years ago

Well, it's strange for Document to create its subclass HTMLDocument based on its argument, since you could simply do new HTMLDocument instead.

annevk commented 8 years ago

We use Document for both HTML and "XML" documents. XMLDocument exists mostly because of load() (which only Firefox has at this point, I think). E.g., XMLHttpRequest always returns a Document from responseXML, which is sometimes flagged as "xml" and sometimes as "html".

foolip commented 8 years ago

Looks like XMLHttpRequest.prototype.responseXML returns an XMLDocument in both Gecko and WebKit even though they support the Document constructor in this test: https://software.hixie.ch/utilities/js/live-dom-viewer/saved/4441

annevk commented 8 years ago

Interesting, does any implementation even support HTML responses for XMLHttpRequest's responseXML? Using that test of yours it seems they don't.

foolip commented 8 years ago

From Blink's source I see that one can get an HTMLDocument if responseType is "document". https://software.hixie.ch/utilities/js/live-dom-viewer/saved/4443 seems to work in Chrome, Firefox and Safari, but Edge gives an "Unspecified error".
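The responseType route foolip mentions can be sketched as below. `loadAsDocument` and its `Xhr` parameter are hypothetical names introduced for illustration; the parameter exists only so the function can be exercised without a browser. `open()`, `responseType`, `onload`, `response` and `send()` are standard XMLHttpRequest members. Per the XHR spec, an HTML response is parsed into a document only when responseType is "document"; with the default empty responseType, responseXML is null for HTML responses, which is consistent with the earlier live DOM viewer test.

```javascript
// Hypothetical helper: request a URL and hand the parsed document to a callback.
// Xhr defaults to the browser's XMLHttpRequest; it is injectable for testing.
function loadAsDocument(url, onDocument, Xhr = globalThis.XMLHttpRequest) {
  const xhr = new Xhr();
  xhr.open("GET", url);
  // Ask for a document so that text/html responses are parsed as HTML too.
  xhr.responseType = "document";
  // xhr.response is a Document, or null if the response could not be parsed as one.
  xhr.onload = () => onDocument(xhr.response);
  xhr.send();
}

// Only run where XMLHttpRequest exists, e.g. in a browser.
if (typeof XMLHttpRequest !== "undefined") {
  loadAsDocument(location.href, (doc) => console.log(doc && doc.contentType));
}
```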

In the end, are there currently any APIs other than the Document constructor that can return a plain Document, or are they all HTMLDocument or XMLDocument? I suspect the latter.

annevk commented 8 years ago

Not sure, I suspect you are correct.

domenic commented 8 years ago

Since this has cropped up on blink-dev again, and @foolip and I have somewhat divergent opinions, let me outline what I think is the correct path forward in specs and implementations:

- new Document() continues to return a Document (not an HTMLDocument or XMLDocument) whose type is "xml".
- Implementations continue to move all members of HTMLDocument into Document. My understanding is that almost everything has been moved to Document in at least one browser, so this should be web-compatible.
- We now have a situation where Document contains everything interesting, XMLDocument contains load() in Gecko and is empty everywhere else, and HTMLDocument is empty everywhere. The path forward could go a few ways depending on web compat:
  - If nobody wants to try any further simplification, we're done. We resurrect HTMLDocument in the specs as an empty interface, and make sure all the appropriate places return it instead of Document, like implementations do. (But the Document constructor stays unchanged.)
  - If people are up for trying a bit more simplification, we alias HTMLDocument to Document like the current specs do, and hope for the best.
  - We could even go further: non-Gecko browsers could alias XMLDocument to Document. Gecko could check whether the web has evolved since 2011, when that was not Gecko-compatible, or it could stay the course. If it's still not Gecko-compatible, we encode that in the spec as part of Gecko compatibility mode.
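Whether an engine has taken the "alias" or the "separate empty interface" route can be checked from script by comparing interface objects. The sketch below assumes that an alias means the global name refers to the very same function object as Document (which is how the specs express an alias); `classifyInterface` is a hypothetical helper, not a standard API.

```javascript
// Hypothetical helper: classify how a legacy document interface relates to
// Document on a given global object (normally `window`).
function classifyInterface(globalObj, name) {
  const iface = globalObj[name];
  const base = globalObj.Document;
  if (typeof iface !== "function") return "missing";
  // An alias is the same interface object under a second name.
  if (iface === base) return "alias";
  // A separate subinterface has Document.prototype on its prototype chain.
  if (typeof base === "function" && iface.prototype instanceof base) return "subinterface";
  return "unrelated";
}

// Only run in a browser-like environment.
if (typeof window !== "undefined") {
  for (const name of ["HTMLDocument", "XMLDocument"]) {
    console.log(name, classifyInterface(window, name));
  }
}
```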

cdumez commented 8 years ago

For the record, here is my opinion as well:

- new Document() continues to return a Document (not an HTMLDocument or XMLDocument) whose type is "xml".
- Update the DOM spec so that XMLDocument becomes an alias to Document (WebKit / Blink used to do this until they introduced the XMLDocument type to align with the spec. However, given that XMLDocument brings nothing on non-Gecko browsers, I'd love to go back to it being an alias).
- Bring back HTMLDocument because unlike SVGDocument / XMLDocument, it has a decent amount of API that is only meaningful for HTML documents (e.g. document.write(), document.open()), or legacy API that I don't really want to expose to more document types (e.g. document.all(), document.bgColor).
- Add a constructor to HTMLDocument.

If a major browser besides Edge actually manages/decides to move everything from HTMLDocument to Document, then I could be convinced otherwise. However, it has been years and it has not happened. I personally do not think the "benefits" of merging HTMLDocument into Document are worth the effort / risks involved.

foolip commented 8 years ago

Update the DOM spec so that XMLDocument becomes an alias to Document (WebKit / Blink used to do this until they introduced the XMLDocument type to align with the spec. However, given that XMLDocument brings nothing on non-Gecko browsers, I'd love to go back to it being an alias).

Oh, how did I miss this? It looks like it was none other than @cdumez who added XMLDocument to Blink and WebKit, and recently too: https://bugs.chromium.org/p/chromium/issues/detail?id=238372 https://bugs.webkit.org/show_bug.cgi?id=153378

Given that, it seems very likely that it can be made an alias of Document again in non-Gecko engines, but if Gecko can't follow, we'll be stuck in a weird place.

If not for the risk for Gecko, everything in https://github.com/whatwg/dom/issues/308#issuecomment-247636495 SGTM, including keeping a few things on HTMLDocument that would always throw on Document.

@bzbarsky, how do you view the chances that making XMLDocument an alias of Document in Gecko would be web compatible today? What was the original issue?

domenic commented 8 years ago

To save @bzbarsky some sighing, the original issue was https://www.w3.org/Bugs/Public/show_bug.cgi?id=14037. See also https://github.com/whatwg/html/pull/1478#issuecomment-231225499. In 2011 there was code that UA-sniffed for non-"applewebkit" browsers and then used XMLDocument.prototype.load in such places.

bzbarsky commented 8 years ago

What was the original issue?

Original issue for what?

We don't so much want to put a load method on all documents, because that has compat risks that don't seem worth having, right? Is the question why we need a load method on XMLDocument? Something else?

I feel like this is the 4th or 5th time I've had this conversation, and each time no one (including me) can find the previous instances because we keep switching bug systems and because GitHub's setup sucks so much for searching (e.g. the document discussion is scattered across issues in multiple repos, and possibly pull requests too).

bzbarsky commented 8 years ago

Clearly my comment crossed with Domenic's. ;) But case in point: His link to my github comment is to a pull request, not issue, and in the HTML repo, not this one. Searchability, what's that?

cdumez commented 8 years ago

I understand that Gecko needs XMLDocument and XMLDocument.prototype.load. However, now that we dropped XMLDocument.prototype.load from the HTML specification, it seems odd to keep XMLDocument as an interface in the DOM specification.

The situation, for years, was that Firefox had XMLDocument / XMLDocument.prototype.load and WebKit / Blink had XMLDocument as an alias to Document. It is unfortunate that Firefox needs XMLDocument / XMLDocument.prototype.load for backward compatibility. However, other browsers do not have XMLDocument.prototype.load (nor intend to add it), and they really do not need XMLDocument as a separate type AFAIK. This is why I am arguing for the DOM spec to be changed so that XMLDocument is an alias to Document.

Anyway, I do not have strong feelings. I just feel it would be a cleaner situation for WebKit / Blink.

bzbarsky commented 8 years ago

Sure. @foolip was asking about Gecko making them aliases, though.

foolip commented 8 years ago

Thanks @domenic, I've taken a look at those issues and virginamerica.com from 2011. The problem (from Sarissa 0.9.6.1) was XMLDocument.prototype.onreadystatechange = null and the fix was [LenientThis].

This was in a non-IE code path; "applewebkit"-sniffing actually wasn't involved here.

Note that Sarissa by itself doesn't require the existence of XMLDocument.prototype.load; it just wraps it as @bzbarsky described. It only matters if some other script calls xmlDoc.load(), and virginamerica.com didn't AFAICT. It also doesn't seem to matter for Sarissa whether XMLDocument is an alias of Document or a separate interface.

The question then remains: does Gecko need XMLDocument.prototype.load (and async) for compat? If it does, then it must be in some Gecko-only code path. Researching this with HTTP Archive would be very hard; any chance of use counters here?

foolip commented 8 years ago

I've tried to summarize everything I could find about document interfaces here: https://gist.github.com/foolip/103963a1ae8598d2baedd296f4a1bf4c

Since the discussion is spread out, I arbitrarily suggest discussing the larger issue in https://github.com/whatwg/dom/issues/221

foolip commented 8 years ago

I think this issue ought to be closed, because the Document constructor as already implemented returns an "xml" document, so leaving that alone seems good. One of two things can then happen:

- HTMLDocument is revived and gets its own constructor.
- Everything is successfully folded in Document, and its constructor is given options to pick between "xml" and "html", with "xml" as the default.

rniwa commented 8 years ago

Everything is successfully folded in Document, and its constructor is given options to pick between "xml" and "html", with "xml" as the default.

I don't think this will happen. It's a compatibility nightmare for what appears to be the most marginal gain on whatever people hoped to get out of it.

foolip commented 8 years ago

I also don't think it will happen or would be a good investment of time, just saying that closing this issue doesn't prevent it.

annevk commented 8 years ago

Given that we seem to have reached consensus in #221, let's close this in favor of that.

whatwg / dom #308