Closed nlwillia closed 8 years ago
Here's a relevant thread from way back: https://lists.w3.org/Archives/Public/public-whatwg-archive/2010Dec/0308.html. Specifically, that link is to a comment from @Hixie saying that legacy code such as <p/>Here is a paragraph</p> exists in the wild that, sadly, makes this a web-breaking change.
It could possibly be done with some kind of directive/opt-in. :speak_no_evil:
That's a good point. Custom elements should have less of a legacy problem (or at least merit their own discussion in 2016), but there are cases like <div custom-angular-directive="" /> too which would benefit from consistent behavior. It'd be nice if standards mode alone were sufficient to weed out an acceptable percentage of legacy pages, but any support, even if opt-in, would be helpful.
Fortunately for you, the spec already includes such an opt-in! If your server sends the header Content-Type: application/xhtml+xml, you can use the <self-closing-tag/> syntax with any element.
Yes...but that would (unless I'm missing something) require the whole document to be XML which is a steep prerequisite.
XML Parsing Error: syntax error
Location: http://127.0.0.1:8080/test.xhtml Line Number 1, Column 1:
<!doctype html>
^
Case-insensitivity, tolerant parsing, simple doctype and other benefits are what web authors expect from modern HTML. Web components mean we're going to be spending more time using tags the spec never imagined that are semantically void but can't currently benefit from the same abbreviated syntax.
I don't know if @Hixie looked specifically for the combination of code where the start tag contains a hyphen and a trailing slash. Perhaps that is unique enough that we can introduce it as a thing, for custom elements.
I think it would be more consistent to require special sigils in the name to signify what kind of element it is, as per https://github.com/w3c/webcomponents/issues/113. So for example if an element was named with a dash at the end (<my-void->) it would be void automatically (no need for />). Maybe it would work to say that the element's name is my-void/ so you could type <my-void/>??
But in general the point I'm trying to make is that I think it makes more sense to use special sigils in the name to opt in to special parsing behavior, instead of trying to come up with suffixes that only work on custom elements. The end result is somewhat similar, but I think having the processing model be based on name-lookup (<br> and <hr> and ... and <my-void-> are void) instead of ad-hoc rules ("is <br> or <hr> or ..., or has a dash and has a / before the closing >") is nicer.
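To make the name-lookup idea concrete, here is a minimal sketch of what such a check could look like. Everything in it is hypothetical: the trailing-dash sigil is only the suggestion floated above, isVoidByName is a made-up helper, and the void list is written from memory of the HTML standard and should be checked against it.

```javascript
// Void element names, assumed from the HTML standard (verify before relying on it).
const HTML_VOID = new Set([
  "area", "base", "br", "col", "embed", "hr", "img", "input",
  "link", "meta", "param", "source", "track", "wbr",
]);

// Hypothetical rule: an element is void if its name is on the built-in list
// or ends with the proposed trailing-dash sigil. Only the name is consulted,
// never how the tag happens to be written.
function isVoidByName(tagName) {
  const name = tagName.toLowerCase();
  return HTML_VOID.has(name) || name.endsWith("-");
}

console.log(isVoidByName("hr"));       // true
console.log(isVoidByName("my-void-")); // true
console.log(isVoidByName("my-el"));    // false
```

The point of the sketch is that the decision depends only on the name, so a tool can answer it without looking at the slash or at any script on the page.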
Maybe it would work to say that the element's name is my-void/ so you could type <my-void/>?
This wouldn't work, because the element might have other attributes, e.g. <my-void/ id="void3">. You could write it <my-void id="void3" />, but that brings us back to the start.
Note that SVG/MathML in HTML already support />. Also HTML parsers already need to take note of the trailing slash if they report parse errors. I'm not sure I agree with @domenic's reasoning. It seems like a new rule people need to learn. I think /> is something many people just expect to work, and HTML is surprising in that it (mostly) doesn't.
We can't make it work in general because web compat, but I don't see anything particularly wrong with making it work for tags with dashes. Any new native element won't have dashes so this rule seems like it's reasonably consistent and easy to reason about (SVG/MathML/custom elements support self-closing, HTML elements not).
Also self-closing is a lot more flexible compared to void. If you start using an element as void and later want to let it have content, with void you would have to change the element's name, and any JS/CSS/etc would also need to change, or be written to support both names from the start, just in case. With self-closing supported you can switch at will without problems.
Another way of stating my position is that I think the model we have so far in HTML is void (not self-closing), and void-ness is a property of the element, not of how it's used (<hr> is always void; <video> is always not-void).
Strict void-ness is nice for the parser since it knows whether to expect an end tag, but custom tags will be much more diverse than the current void subset and could have reason to be either/or. (XML leaves it to the domain and doesn't even distinguish between <tag></tag> and <tag />.) Encumbering the format of tag names for the sake of void-ness may not be a good trade-off.
@domenic but as pointed out by @zcorpan that rule is already violated for SVG and MathML. Why not violate it for custom elements?
There's tons of custom elements on the web (we used to call them "invalid elements that happen to have hyphens in their names"). I would be shocked if changing their parsing model even slightly, let alone this much, was backwards compatible. But if we collect data on this we can find out.
Going off of domenic's statement that voidness should be a property of an element, and Hixie's note that changing how custom elements are parsed could have unforeseen negative results, why not extend the Document.registerElement() method so that voidness can be specified as a property? Elements would be non-void by default so as to not break existing code, and HTML parsing wouldn't have to be modified either. This should add the functionality we want while being entirely backwards compatible.
@Yay295 we do not want script execution to change the way a document is parsed. That would make it much harder for simple tools to extract data out of HTML.
Reading through the custom elements spec draft, it sounds like the intent is for unresolved elements to have a basic level of support in the document whether their definition (whatever format that takes) is loaded yet or not. That seems like it would make configuring void-ness at that level challenging since the parser might start with default HTML syntax but later discover a rule that made the element void and have to shift the tree.
A document (or template) level <meta> option (which is what came to mind from @andyearnshaw's comment) would be a bit of a blunt instrument, but if you think about this as a code style decision rather than an element-by-element property then authors could make the decision themselves when setting up a unit of content.
Re @Hixie: I researched the httparchive:har.chrome_jan_15_2016_requests dataset of 470k pages, looking only at top-level pages matching REGEXP_EXTRACT(content, r'<([a-zA-Z][a-zA-Z0-9]*-[a-zA-Z0-9]*)(?:\/+>|\s[^>]*\/>)'). 47 pages match. Of those, 0 would regress and 1 would have improved rendering by switching to supporting self-closing. I conclude that it would be reasonably safe to change. Analysis below. (I put the improved one first.)
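For anyone who wants to try the pattern on their own markup, here is a minimal sketch that applies the same regular expression in JavaScript. It is not the BigQuery query itself: firstSelfClosedHyphenTag is a made-up helper that only mirrors what REGEXP_EXTRACT does with a single capture group (return the first captured tag name, or nothing when there is no match).

```javascript
// Same pattern as in the query above, written as a JavaScript regex literal:
// a tag name containing a hyphen, followed by "/>" either immediately or
// after some attribute text.
const SELF_CLOSED_HYPHEN_TAG =
  /<([a-zA-Z][a-zA-Z0-9]*-[a-zA-Z0-9]*)(?:\/+>|\s[^>]*\/>)/;

// Hypothetical helper: returns the first matching tag name, or null.
function firstSelfClosedHyphenTag(content) {
  const match = SELF_CLOSED_HYPHEN_TAG.exec(content);
  return match ? match[1] : null;
}

console.log(
  firstSelfClosedHyphenTag(
    '<div class="" id="blockBelowLeftNav"><below-leftnav /></div>'
  )
); // below-leftnav
```

Because this is a plain text match, it also fires inside comments and attribute values, which is why so many of the hits listed below turn out to be harmless.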
http://www.sunil-android.blogspot.in/ uses-permission
<pre class='brush:js;'>
<uses-permission android:name="android.permission.INTERNET" />
<uses-permission android:name="android.permission.ACCESS_NETWORK_STATE" />
</pre>
This then uses JS to reserialize the DOM and syntax highlight it. But it currently renders the "wrong" thing:
1 <uses-permission android:name="android.permission.INTERNET">
2 <uses-permission android:name="android.permission.ACCESS_NETWORK_STATE">
3
4 </uses-permission></uses-permission>
Supporting self-closed here would improve this page.
http://www.uregina.ca/ below-leftnav
<div class="" id="blockBelowLeftNav"><below-leftnav /></div>
In a div.
http://www.sebts.edu/ system-region
<div class="content" id="blogs">
<system-region name="blog-feed" />
</div>
<div class="content" id="podcasts">
<system-region name="podcasts-feed" />
</div>
In a comment.
http://www.bancadelladriatico.it/ content-type
Page has changed?
http://www.miamioh.edu/ system-region
<!-- removed, this code now placed in combo news and events, this will be removed <system-region name="EVENTS"/> -->
In a comment.
http://www.brandeis.edu/ system-region
<!-- <nav aria-label="main" id="mainNav"><system-region name="mainNav"/></nav> -->
In a nav, in a comment.
http://www.about.com/ globe-environment
<!--
<globe-environment environment="prod" application="globe" dataCenter="ny" serverName="nyglobe5" />
<globe-server version="1.64.7" vendor="" title="Globe Server" />
<globe-resources version="1.64.7" loadStartTime="1456173386709" loadTimeTaken="3764" />
-->
In a comment.
http://www.bostonapartments.com/ id-publisher
<id-publisher publisher-typ="individual" email="webmaster@bostonapartments.com" street-address="40 Brock Street" location="US/MA/Boston/02135-2502" />
<id-info description="Boston Apartments Online Rental and Sales Magazine" location="US/Massachusetts/Boston" street-address="" language="EN" keywords="Boston Apartments Rentals Real Estate apartments in boston, boston apartment, boston apartment search, boston rentals, rentals in boston, boston realty, Massachusetts apartments, Residential Massachusetts Broker Realtor Agent Apt Room Tenant Landlord Classifieds Roommates Furnish Property Management agency Sales Houses Condominiums Condos Agencies Short term temporary housing house" subject="business/consumer, leisure/travel, business/real-estate" />
<id-system crawl=YES />
In head, so these currently wrap the entire page. Removing these tags makes the page still render the same.
http://www.rayovallecano.es/ dynamic-element
<root> <dynamic-element name="codigo" type="text_area" index-type="" repeatable="false"/> </root>
In another element.
http://www.menu.com.do/ cl-transformation
<cl-image public-id="{{ business.public_image_id ? business.public_image_id : 'logo-na' }}" format="jpg">
<cl-transformation transformation="new_businesses_teaser_logo" />
</cl-image>
In another custom element.
http://www.jmu.edu/ xhtml-block
<xhtml-block/>
Has content after, but adding an end tag makes the page render the same.
http://www.yoinfluyo.com/ m-eta
<m-eta name="alexaVerifyID" content="VkmI1LxIi78sTkL4NXfXKWtPKI4"/>
Probably deliberate "comment out". In head. Changing to meta still renders the same.
<noscript><a
target=_top href="http://top.mail.ru/jump?from=819005"><FONT size="1">[AD]</FONT><AD- ounter?js=na;id=819005;t=56"
border=0 height=31 width=88
alt="Рейтинг@Mail.ru"/></a></noscript>
In an <a> in <noscript>.
http://www.bikinibutt.com/ site-control
Page changed?
http://www.caript.it/ content-type
<content-type name="slideshow" namespace="collector"/>
In a comment.
http://www.fastcult.ru/ lj-embed
<div class="j-w-article-text"><lj-embed id="3141" /><br /><img src="http://ic.pics.livejournal.com/fastcult/50338851/5818844/5818844_original.png" alt="" title="">
Has an image after it. Closing the tag doesn't change the page rendering.
http://www.peteava.ro/ site-control
<!--<cross-domain-policy>
<site-control permitted-cross-domain-policies="all"/>
<allow-access-from domain="*" />
<allow-http-request-headers-from domain="*" headers="*"/>
</cross-domain-policy>-->
In a comment.
http://www.fiu.edu/ system-region
<!--
<hr class="whiteArrows"/>
<system-region name="Life at FIU" />
-->
In a comment.
http://www.mailendo.com/ moo-modal
Page changed?
http://www.uwinnipeg.ca/ system-region
<!-- <div class="box" id="indigenousInclusion"><system-region name="CUSTOM BUTTON 1"/></div>
<div class="box" id="wesmenUpdates"><system-region name="CUSTOM BUTTON 2"/></div>-->
In a div in a comment.
http://www.haarshop.nl/ angucomplete-alt
<div>
<i class="fa fa-search"></i>
<angucomplete-alt id="search-mobile"
...
input-class="form-control form-control-small"/>
</div>
In a div.
http://www.commoncurriculum.com/ font-face
<svg version="1.1" id="Layer_1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" x="0px" y="0px"
width="200px" height="200px" viewBox="0 0 200 200" enable-background="new 0 0 200 200" xml:space="preserve">
<font horiz-adv-x="1000">
<!-- Copyright 2005 by Just Another Foundry. All rights reserved. -->
<!-- Copyright: Copyright 2014 Adobe System Incorporated. All rights reserved. -->
<font-face font-family="Facit-Bold" units-per-em="1000" underline-position="-300" underline-thickness="150"/>
Inline SVG.
http://www.snc.edu/ caption-under
<p style="font-size:.688em; padding:5px 15px 0px 15px;"><caption-under/></p>
In a p.
http://www.electronicbeats.net/ soundcloud-player
<!-- <soundcloud-player/> -->
In a comment.
http://www.cqsq.com/ pw-end
<pw-end/>
Near the end of the page.
http://www.cordobacf.com/ dynamic-element
<root>
<dynamic-element name="principal" type="text" index-type="" repeatable="false">
<dynamic-element name="link" type="text" index-type="" repeatable="false"></dynamic-element>
<dynamic-element name="image" type="document_library" index-type="" repeatable="false"/>
</dynamic-element>
<dynamic-element name="secundario" type="text" index-type="" repeatable="true">
<dynamic-element name="link" type="text" index-type="" repeatable="false"></dynamic-element>
<dynamic-element name="image" type="document_library" index-type="" repeatable="false"/>
</dynamic-element>
</root>
(Prettified here.) OK this one is interesting, because it nests dynamic-element and uses /> for some inner ones. If self-closing is supported, the tags are balanced. Changing this to use explicit close tags does not change the rendering of the page.
http://www.rajasthantourism.gov.in/ defanged-meta
Page changed?
http://www.cpp.edu/ system-region
<!-- Original FEATURES SECTIONS
<div class="features">
<div class="row">
<system-region name="UNIVERSITY NEWS"/>
<system-region name="UPCOMING EVENTS"/>
<div class="span4">
<system-region name="FEATURED SITES"/>
<system-region name="LOCATION DIRECTIONS"/>
</div>
</div>
</div>
-->
In a comment.
http://www.rvusa.com/ angucomplete-alt
<div class="form-group">
<label for="BrandModel">Make or Model:</label><br />
<angucomplete-alt id="ex1"
...
input-class="form-control input-padding" />
</div>
In a div.
http://www.swidnik.pl/ l-ink
<!--[if IE 6]><l-ink rel="stylesheet" href="/templates/swidnik10/css/template.ie6.css" type="text/css" media="screen" /><![endif]-->
<!--[if IE 7]><l-ink rel="stylesheet" href="/templates/swidnik10/css/template.ie7.css" type="text/css" media="screen" /><![endif]-->
In conditional comments. l-ink is probably deliberate "comment out".
http://www.globalsafelist.com/ COOKIE-INCLUDE
<META>
<POLICY-REFERENCES>
<POLICY-REF about="/w3c/p3p.xml">
<INCLUDE></INCLUDE>
<COOKIE-INCLUDE/>
</POLICY-REF>
</POLICY-REFERENCES>
</META>
In another element.
http://www.snapgame.io/ ng-include
<ng-include src="mainUrl"/>
At the end of the page.
http://www.dominos.co.il/ bad-code
Page changed?
http://www.pointpark.edu/ cmp-navigation
<!--<cmp-navigation name="Nav Top" />-->
In a comment.
http://www.carisbo.it/ content-type
Page changed?
http://www.dinheirovivo.pt/ clickly-check
<clickly-check token='660da9bb'/>
Near the end of the page.
http://www.power2.ir/ m-eta
<m-eta name="google-site-verification" content="aYkd6dxnNGdj6cDx-ikCx1YLw96YViseYPiIc1crhTg" />
Probably deliberate "comment out".
<noscript><a
target=_top href="http://top.mail.ru/jump?from=1091089"><FONT size=1>[AD]</FONT><AD- .top.list.ru/counter?js=na;id=1091089;t=211"
border=0 height=31 width=88
alt="Рейтинг@Mail.ru"/></a></noscript>
In an <a> in <noscript>.
http://www.pokerstars.gr/ rg-account
<div class="col-xs-8 col-sm-3 col-md-4 text-right formCell">
<rg-account class="ng-app:ram" ng-app="ram"/>
</div>
In a div.
http://www.bcgov.net/ system-region
<!--<div id="alert"><a href="http://www.bcgov.net/departments/Real-Property-Services/Treasurer/index.php">If you received two motor vehicle bills within a short period of time, click here.</a></div>
<div id="alert" align="center"><system-region name="GLOBALALET" /></div>-->
In a div in a comment.
http://www.readersdigest.ca/ uses-sdk
<uses-sdk android:minSdkVersion="3" android:targetSdkVersion="8" />
In head. Adding a close tag does not change the rendering.
Page changed?
http://www.royalsundaram.in/ br-
<input type="radio" class="rdactive" name="Health-Insurance" id="<b>Lifeline</b>---A-holistic-indemnity-plan-that-offers<br-/>cover-from-Rs.-2-Lacs-to-Rs.-1.5-Crore" value="javascript:_gaq.push(['pageTracker._trackEvent','Homepage','Lifeline ','Calculate Premium']); | /health-insurance/life-line/calculate-premium.aspx$^*javascript:_gaq.push(['pageTracker._trackEvent','Homepage',' Lifeline ','Know More']); | /health-insurance/life-line.aspx" title="<b>Lifeline</b> - A holistic indemnity plan that offers<br />cover from Rs. 2 Lacs to Rs. 1.5 Crore" onclick="hmeradiochange('Health-Insurance');" checked="checked" /><label for="<b>Lifeline</b>---A-holistic-indemnity-plan-that-offers<br-/>cover-from-Rs.-2-Lacs-to-Rs.-1.5-Crore">
Inside an attribute value...
http://www.gazzettaufficiale.it/ put-attribute
<div class="crediti">
<put-attribute name="content" value="/WEB-INF/views/page/statiche/gucertificata.jspf"/>
<div class="loghi_crediti">
<img src="/resources/img/logo_mef.png" alt="mef" />
<img src="/resources/img/logo_ipzs.png" alt="Istituto Poligrafico e Zecca dello Stato"/>
</div>
<div class="clear"></div>
</div>
Adding a close tag doesn't change the rendering.
http://www.locutorkiko.com.br/ meta-data
<application android:label="@string/djleomt">
<meta-data android:name="com.facebook.sdk.ApplicationId" android:value="@string/djleomt"/>
</application>
In another element.
http://www.macomb.edu/ system-region
<!--<ul class="social">
<system-region name="SOCIAL LINKS" />
<system-region name="SEARCH" />
<system-region name="QUICKLINKS" />
</ul>-->
In ul in a comment.
http://www.bancacrfirenze.it/ content-type
Page changed?
The <script> tag, when used with a src attribute, bears mentioning as well. That's always been an inconsistency compared to external styles, which can be injected with a void <link> tag.
@nlwillia you can mention it but we can't change script.
@zcorpan I'd be very wary of analyses that only look at home pages (they tend to have better markup than deeply nested pages, because they're more important), but if you can get a browser to do it, let's try it and see how it goes.
A voice from Angular: Angular uses a special element called <ng-content> to do content projection. Right now, users have to write <ng-content></ng-content> which is awkward as it will never have children. Allowing custom elements to be self-closing would be really great for this!
This seems like an improper close. The comment you link talks about seeing if we end up with "workarounds in libraries" to show whether or not this ends up being necessary, but you can't work around this issue in libraries.
You could if you parse the HTML yourself. The problem is that implementers do not want to change the parser. The only option that seems somewhat palatable is having a flag at the start of the document to opt into a simplified tree builder that only has a couple of insertion modes rather than several dozen.
AFAIK the XHTML parser already supports that, but I agree with others: <custom elements /> especially are super awkward to write right now, so all my HTML template-literal based libraries need to do this transformation via RegExp, which has proven to be solid to date, but ... you know, it's a RegExp, not an option in the parser.
As an example, this kind of layout works without issues, yet I'd love to see this available on the Web:
html`<span class="decoration" />`
The obvious footgun that has been around forever is also the fact that when a developer explicitly closes a tag, they would never expect it to contain adjacent nodes:
<!-- you write this -->
<div class="lazy" />
<div>not lazy</div>
<!-- you obtain -->
<div class="lazy">
<div>not lazy</div>
</div>
and whatever library populates .lazy later on will likely destroy the not lazy content in doing so.
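For readers wondering what the RegExp workaround mentioned above might look like, here is a rough sketch. It is not the code from any particular library: expandSelfClosing and the pattern are made up for illustration, the void list is written from memory of the HTML standard, and only bare or quoted attributes are handled.

```javascript
// Void elements never take an end tag, so they are left untouched.
// (List assumed from the HTML standard; verify before relying on it.)
const VOID_ELEMENTS = new Set([
  "area", "base", "br", "col", "embed", "hr", "img", "input",
  "link", "meta", "param", "source", "track", "wbr",
]);

// A tag name, zero or more attributes (bare, or with a quoted value),
// optional whitespace, then "/>". Unquoted attribute values are deliberately
// not supported, since a trailing slash there is part of the value in HTML.
const SELF_CLOSED =
  /<([a-zA-Z][^\s\/>]*)((?:\s+[^\s"'<>\/=]+(?:\s*=\s*(?:"[^"]*"|'[^']*'))?)*)\s*\/>/g;

// Hypothetical pre-processing step: rewrite <name ... /> as <name ...></name>
// for every non-void element before the markup reaches the HTML parser.
function expandSelfClosing(markup) {
  return markup.replace(SELF_CLOSED, (match, name, attrs) =>
    VOID_ELEMENTS.has(name.toLowerCase())
      ? match
      : `<${name}${attrs}></${name}>`
  );
}

console.log(expandSelfClosing('<div class="lazy" />\n<div>not lazy</div>'));
```

With this applied first, the two div elements in the example above end up as siblings. Being a text-level rewrite, it will also touch matches inside comments or script content, so it is only reasonable for template strings the author controls.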
Please don't close this issue, thanks.
Proposal: The current HTML standard allows the end tag to be omitted when an element is one of a specific list of void elements. The start tag of a void element may be written as <name /> in which case the slash is effectively ignored. The void element list is useful since it helps a parser identify when not to expect an end tag, but conflating self-closing syntax with void (and foreign) elements is not. An HTML parser should recognize self-closing syntax on any element and interpret it as equivalent to <name></name>.
Motivation: The ability to use self-closing tags more broadly is not strictly necessary in HTML itself since, in general, elements written without content (ex: <p></p>) have no semantic purpose. However, custom elements are being introduced today by frameworks like Angular, Aurelia and others as well as web components that are semantically void, but which the base parser will have no knowledge of. This requires the element to be written as <name></name> which is more verbose and ambiguous. <name /> in contrast is brief, expressive and assertive about the fact that the element is being used declaratively and has no content. Unless there is a mechanism to configure parsing by extending the void element list itself (which would be more consistent since it would allow just <name>, but probably more challenging), then just expanding self-closed tags seems like the simplest way to create a practical foundation for custom elements (which are an important part of HTML's evolution) to have greater economy and expressiveness.
Implications: Lexically, this would be a modest change, but handling by browsers that were not updated to track it would vary. It's not uncommon to see confusion from new custom element developers when they self-close a declarative custom element out of habit in a framework where the native HTML parser is used and adjacent tags or content are eaten as a result. It's likely that even with standardization it would be some time before native support was widespread enough to assume. In the interim, pre-processing could be applied to template content to expand self-closed custom tags. The argument can be made that the option of custom parsing obviates the need for the standard to change, but frameworks like to track the standard and are reluctant to introduce such measures independently despite the benefits.
Background: This is my own opinion as a developer. I'm not associated with a browser maker or major company or framework. I've researched this issue, but I haven't found where it was clearly ruled out or there was much hope of it being fixable downstream. It may be more appropriate as a web component discussion, but the general impression I get is that people are looking to the HTML standard first. I'm fine with HTML not being XML, but in this case, I think that the self-closing syntax is a worthwhile improvement for consideration.