whatwg / html

HTML Standard
https://html.spec.whatwg.org/multipage/

Add `sitetitle` and `pagetitle` Elements #2468

Open patrickdark opened 7 years ago

patrickdark commented 7 years ago

Summary: There's currently no obvious and unambiguous method to specify the site title and page title on a page and thereby provide that information, likewise unambiguously, to extraction tools. Having sitetitle and pagetitle elements would address this.

I also just find it bizarre that most webpages seem to specify a site title, but we're in 2017 and there's still no obvious element for that purpose.

Details: In title elements, you will typically see:

In the page body, there's an obvious mechanism for specifying the page title, but not the site title unless the site title is also the page title. This leaves inconsistent strategies like:

sitetitle and pagetitle elements would also remove the discrepancy between the markup formats used for page titles in the title element and the h1 element. The former requires plaintext and the latter allows phrasing content such as abbr. If the server side lets you specify the page title once for use in both places, this difference means either restricting page titles to plaintext or post-processing the title to strip it of all markup elements.

Aside from those benefits, this would also allow consistent browser UI for tabs and bookmarking. Currently, pages specify arbitrary separators such as " @ "; " - "; ": "; " at "; " on "; " | "; " « "; " —" between site titles and page titles, which leads to inconsistent UI.

If the site title and page title could be specified unambiguously, the UI could simply concatenate the two using a uniform separator. It could also choose to display only the site title or only the page title, or to display them in a specific order, depending on user settings.
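
To make the idea concrete, here is a minimal sketch of how a UI could assemble a display title once the two titles arrive as separate values. Everything named here (TitlePrefs, format_title, the default separator) is hypothetical and only illustrates the concatenation described above; it is not an existing browser API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TitlePrefs:
    """Hypothetical user settings for how a UI renders titles."""
    separator: str = ": "    # one uniform separator chosen by the UI, not the page
    site_first: bool = True  # "Site Title: Page Title" ordering
    show_site: bool = True
    show_page: bool = True


def format_title(site_title: str, page_title: str,
                 prefs: TitlePrefs = TitlePrefs()) -> str:
    """Join the two titles according to the user's preferences."""
    parts = []
    if prefs.show_site and site_title:
        parts.append(site_title)
    if prefs.show_page and page_title:
        parts.append(page_title)
    if not prefs.site_first:
        parts.reverse()
    return prefs.separator.join(parts)
```

For example, a tab strip might call format_title with show_site=False to save space, while a bookmarking dialog keeps both titles with the site title first.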

Both of these elements would be display: block with the same content model and allowed in both head and body with the first occurrence being the preferred title. pagetitle would be treated as equivalent to h1 for the purposes of hgroup. Both elements would allow an ignorable h1 nested within for backward compatibility. The pagetitle and sitetitle elements should be nestable with the pagetitle being assumed to be the sitetitle if the former is missing from the document. `sitetitle

I'm not sure this proposal is complete, though, since it doesn't accommodate the breadcrumb or notification cases (e.g., at The Guardian and Facebook, respectively). There'd need to be something like <breadcrumbtitles><crumb/><crumb/></breadcrumbtitles> to allow uniform separators there, though the Facebook case (preceding titles with notification counts like "(1) ") could be dealt with by allowing pagetitle elements to be affected by the CSS ::before pseudo-element.

domenic commented 7 years ago

Hi @patrickdark, thanks for your interest. Have you read https://wiki.whatwg.org/wiki/FAQ#Is_there_a_process_for_adding_new_features_to_a_specification.3F ? In particular it's very important to focus on problems being solved, and on their user benefit, not on potential solutions. Let's take a step back and forget about pagetitle and sitetitle, their content models, their styling, etc. What actual problem are you having trying to code your websites that you cannot solve using HTML today? Better yet, what are users trying to accomplish on the web that they cannot today?

My suspicion is that in terms of actual user-facing issues, this is already taken care of via the title/h1 elements, the hgroup element, and the application-name meta keyword. But we'll see.

patrickdark commented 7 years ago

@domenic I've read it now. I thought this was clear enough from my proposal. However, per item 2 at the cited link: This solution was originally inspired by three issues:

(1) I, as an author, had a PHP input function that accepted a single title for use in both a title and h1 element and was having to do regular expression filtering on that single title to have it accepted by both the title and h1 elements. This can be done today by writing a code-stripping PHP utility function, though I don't think it's nice to require executing expensive regular expressions on every page load or else having to construct a way to cache titles in two formats.

Previously, I had performed this task with XSLT with the same issue. My solution at that time was to create a custom, rich-text (i.e., with markup) title element in a non-standardized XML namespace and perform XSLT transformations based on that.
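
For illustration, the markup stripping described in (1) might look like the sketch below. It is written in Python rather than PHP and uses the standard library's HTML parser instead of regular expressions, so it is a stand-in for the approach and not the original code. The names plain_title and title_element are hypothetical.

```python
import html
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects text nodes and ignores every tag."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def plain_title(rich_title: str) -> str:
    """Reduce a title containing phrasing markup to plaintext."""
    parser = _TextOnly()
    parser.feed(rich_title)
    parser.close()
    # Collapse runs of whitespace left behind by removed elements.
    return " ".join("".join(parser.chunks).split())


def title_element(rich_title: str) -> str:
    """Build a title element from the same rich title used for the h1."""
    return "<title>" + html.escape(plain_title(rich_title), quote=False) + "</title>"
```

The rich title can then be written into the h1 as-is, while title_element produces the plaintext variant. The work still happens on every render unless the result is cached, which is the cost the comment objects to.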

(2) I, as an author and person who instructs others on the use of HTML, don't have a clear idea what the correct markup for a site title should be and have observed that other authors, based on inspection of their website's code, don't seem to have a clear idea on that subject either.

If you're ever going to solve problems like the broken outlining model, then you need people to write correct code and they're going to be less inclined to do that if they can't figure out how to do so. (I can easily see an author specifying their site title as the foremost h1 element, for example, since this is consistent with the visual structure of many websites.)

(3) I, as a user and browser extension author who has considered writing an extension to address this issue, find it annoying that I routinely have to reverse page titles when bookmarking pages through the browser UI. I use the format "Site Title: Page Title", whereas the typical format is "Page Title - Site Title": most authors seem to recognize that the page title must appear first for browser tab UI reasons, even though this defies linguistic and sorting convention. This process should be automatable, but isn't, since the browser has no way to unambiguously determine the site title and page title during the bookmarking process. (I'm conceptualizing a future, hypothetical bookmarking dialog that automatically splits the title into two fields, page title and site title, and a bookmarks API that can perform operations based on those separate fields.)

Likewise, I find it annoying to have site titles in tab titles even though I recognize the necessity of those site titles being there; it would be nice to have a way to programmatically disable site titles in the browser UI without requiring users to compile a list of such titles to disable. This naturally requires that site titles and page titles be specified unambiguously (i.e., separately) and obtained through automation based on the page's own metadata.
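
A small sketch of why the automation in (3) is unreliable today: splitting a combined title on the separators listed earlier in this thread usually yields several plausible results, and nothing in the string says which half is the site title. The function name candidate_splits and the sample title are made up for illustration.

```python
# Separators quoted earlier in this thread.
SEPARATORS = (" @ ", " - ", ": ", " at ", " on ", " | ", " « ", " —")


def candidate_splits(title: str) -> list[tuple[str, str]]:
    """Return every (left, right) split of the title at any known separator."""
    candidates = []
    for sep in SEPARATORS:
        start = 0
        while (idx := title.find(sep, start)) != -1:
            left = title[:idx].strip()
            right = title[idx + len(sep):].strip()
            if left and right:
                candidates.append((left, right))
            start = idx + 1
    return candidates


if __name__ == "__main__":
    for left, right in candidate_splits("Review: Life on Mars - Example Times"):
        print(repr(left), "|", repr(right))
```

For the sample title this produces three candidate splits (at " - ", at ": ", and at " on "), so an extension would have to guess. Separate site and page titles in the page's own metadata would remove the guesswork.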

Edit: I rewrote the last two paragraphs since the logic wasn't quite correct.