speced / respec

A tool for creating technical documents and web standards
https://respec.org/

Remove heading role #370

Closed LJWatson closed 8 years ago

LJWatson commented 9 years ago

Heading elements have role="heading" applied. It's redundant (the role is implicit), and it's breaking the heading structure with Jaws (bug filed with FS too).

halindrome commented 9 years ago

While it is redundant, it is required so that RDFa processors can create appropriate triples about the headers (there are implicit roles for RDFa). I am not really inclined to break semantic processing of specs in order to work around a bug in some other tool.

stevefaulkner commented 9 years ago

While it is redundant, it is required so that RDFa processors can create appropriate triples about the headers (there are implicit roles for RDFa). I am not really inclined to break semantic processing of specs in order to work around a bug in some other tool.

So we have a user of what respec produces being negatively affected for the sake of RDFa processors? Sounds like this has the priority of constituencies back to front.

darobin commented 9 years ago

@halindrome If RDFa processors can't tell that a heading is a heading just by knowing that it's a heading element, that strikes me as clearly a bug in RDFa processors. Also, as @stevefaulkner points out, users matter more than metadata extraction tools!

halindrome commented 9 years ago

RDFa processors know nothing about elements. The (a) in the name stands for 'attributes'. I agree the end users take priority, but I am sure you would agree that we should not dumb down our specifications in order to work around bugs in user agents. For example, we now leave HTML5 elements in the specs despite the fact that many (most) end users do not have HTML5-aware user agents.

Before we remove valuable semantics, I would prefer to explore workarounds. @LjWatson can you confirm that if there were an aria-level attribute on the element as well, JAWS would handle it correctly?

darobin commented 9 years ago

I don't think that it's about dumbing down UAs. If RDFa does something useful with ARIA roles then it should take implied semantics into account. If an RDFa processor understands role=heading but does not infer it on <h2> without role, I contend that it is pretty broken.

LJWatson commented 9 years ago

@Shane are headings the only elements the processor uses/is interested in?


halindrome commented 9 years ago

Not at all. The Role Attribute specification says that triples are created for all values of role that satisfy some requirements. The processor is not interested in ANY elements. It is interested in attributes.
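To illustrate the "attributes, not elements" point, here is a minimal Python sketch. It is a toy scanner, not a conforming RDFa or Role Attribute processor: the names `RoleScanner` and `scan_roles` are made up for this example, and it only records role tokens rather than building real triples. The point it shows is that an h2 without a role attribute produces nothing, while a div with one does.

```python
from html.parser import HTMLParser


class RoleScanner(HTMLParser):
    """Toy attribute-driven scanner: records role tokens, never looks at tag names."""

    def __init__(self):
        super().__init__()
        self.found = []  # (element id or None, role token) pairs

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # Only the role attribute matters here; the tag name is ignored on purpose.
        for token in (attributes.get("role") or "").split():
            self.found.append((attributes.get("id"), token))


def scan_roles(markup):
    """Return the (id, role) pairs found in a fragment of markup."""
    scanner = RoleScanner()
    scanner.feed(markup)
    scanner.close()
    return scanner.found


sample = (
    '<h2 id="intro" role="heading">Introduction</h2>'
    '<h2 id="scope">Scope</h2>'
    '<div id="note-1" role="heading">Note</div>'
)
print(scan_roles(sample))  # [('intro', 'heading'), ('note-1', 'heading')]
```

The second h2 is a heading as far as HTML is concerned, but a consumer that works this way never sees it.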

gkellogg commented 9 years ago

Just to echo Shane’s assertions, RDFa (and, indeed, HTML Microdata) really just operate on attributes and are unaware of HTML semantics. This allows RDFa to be used on XML and SVG with great interoperability. To say that a spec is broken because it doesn’t understand HTML(5) semantics is, IMO, a bit hyperbolic. Certainly, a hypothetical processor could infer semantics from HTML elements and attributes. For example, @alt on an <img> in addition to @src, but this would be a requirement for some new specification, which isn’t on anyone’s plate AFAIK.

(The sole exception to this is the <time> element, where both understand @datetime and element content, but only when @property is also present).

(Also, note that @role is not really part of RDFa per-se, but leverages the RDFa specifications to provide similar behavior).

A workaround might be to use @property="xhv:role" @resource="xhv:heading", which should produce the same triples, but this certainly isn't as elegant as the @role attribute.

halindrome commented 9 years ago

Thanks for chiming in @gkellogg! I agree that your use of property would work. Unfortunately, it would not work on, for example, notes and issues. These also have "headings" and get a role attribute applied to them but no H* element (it is on a div). That role attribute has the happy side effect of providing information for ATs AND RDFa. Yay for standards! I am not quite sure what would happen if there was property, resource, and role on an element. It might result in multiple identical triples. Which would be fine, but weird.

darobin commented 9 years ago

I'm sorry but there are two problems here.

The first, which is outside the scope of what ReSpec can fix but is worth pointing out nevertheless, is that if RDFa uses roles but does not understand implicit semantics then I fail to see how it's not broken. It's like saying "We don't care about elements, only attributes, so we only recognise href attributes when they contain absolute URLs. We don't understand that a may use relative references or that base may change the base against which to reference — we only care about attributes." It's not up to ReSpec to fix RDFa, but working around spec bugs in RDFa isn't a priority that trumps that many things.

Which ties in to the second point: accessibility trumps semantics. Even if RDFa were right in ignoring available semantic information (which strikes me as odd to start with) and even if Jaws were radically wrong in its interpretation, people who read specs are more important than applications that read specs.

More fundamentally, what worries me here is that if there were a bug in ReSpec that caused Chrome or Firefox to break the visual rendering of specs it would get fixed immediately no matter which other feature that required pruning. What's the thinking that could possibly justify treating this case any differently?

halindrome commented 9 years ago

Let's set aside what the Role Attribute spec (which is related to but NOT the same as the role attribute in HTML5) does with regard to RDFa. As you correctly point out it is not something relevant to ReSpec.

Of course accessibility trumps semantics. But if there is a way to instrument the source of a document such that both goals are achieved, surely that is more desirable than a solution where only one is achieved? It is why I asked if adding aria-level would solve the problem. We should wait for a response before making a decision.

And yes, you are correct that if rendering were broken for a significant portion of the market it would get fixed immediately. That's not the case here. Rendering, visual or otherwise, is not broken. Jaws is incorrectly interpreting the heading structure of the document. It is doing this in EVERY RESPEC DOCUMENT THAT HAS BEEN PUBLISHED IN THE LAST COUPLE OF YEARS. Clearly no one has ever noticed, and we have a lot of people reading ReSpec generated documents.

Again, I am just trying to get all the data before we make a decision. I don't want to throw out the semantic baby with the bath water.

stevefaulkner commented 9 years ago

EVERY RESPEC DOCUMENT THAT HAS BEEN PUBLISHED IN THE LAST COUPLE OF YEARS. Clearly no one has ever noticed, and we have a lot of people reading ReSpec generated documents.

This statement is false: someone, i.e. @LjWatson, has noticed, and it has negatively affected their understanding of the document structure, hence the raising of the issue.

halindrome commented 9 years ago

I meant statistically.

sideshowbarker commented 9 years ago

A workaround might be to use @property="xhv:role" @resource="xhv:heading", which should produce the same triples, but this certainly isn't as elegant as the @role attribute.

It doesn't seem like the use of the role attribute can be considered very elegant at all in the case of using it with HTML h1-h6 elements if in practice that degrades user experience for AT users and amounts to just adding redundant per-element markup that's unnecessarily re-expressing existing already-universally-known HTML native semantics.

These also have "headings" and get a role attribute applied to them but no H* element (it is on a div). That role attribute has the happy side effect of providing information for ATs AND RDFa.

That's a totally different case and I'd think that shouldn't be conflated with the case of adding role=heading markup to HTML h1-h6 elements. All AT already understands HTML h1-h6 to be headings and they are otherwise also pretty much universally understood to be headings.

RDFa processors know nothing about elements.

Is that completely true? Unless I'm misunderstanding earlier comments here, it seems like the HTML <time> element is an exception to that. If so, it's not clear why RDFa processors couldn't have similar exceptions for HTML h2-h6 elements. Or for other HTML elements too.

Just to echo Shane’s assertions, RDFa (and, indeed, HTML Microdata) really just operate on attributes and are unaware of HTML semantics.

That's not completely true about Microdata, right? As far as I understand at least, the Microdata spec in fact does specify quite a bit of HTML-semantics-aware processing—at least to the degree of defining how to determine property values for particular HTML elements:

https://html.spec.whatwg.org/multipage/microdata.html#values

That part of the Microdata spec defines how processors determine property values for the HTML meta, audio, embed, iframe, img, source, track, video, a, area, link, object, data, meter, and time elements.
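As a rough illustration of that element-aware step, here is a Python sketch of the per-element lookup as I read the linked section. It is simplified and not a Microdata implementation: the names `VALUE_ATTRIBUTE` and `property_value` are invented for this example, URL resolution is skipped, and the case where the element is itself an item (itemscope) is left out.

```python
# Which attribute supplies the property value for each special-cased element.
VALUE_ATTRIBUTE = {
    "meta": "content",
    "audio": "src", "embed": "src", "iframe": "src", "img": "src",
    "source": "src", "track": "src", "video": "src",
    "a": "href", "area": "href", "link": "href",
    "object": "data",
    "data": "value", "meter": "value",
}


def property_value(tag, attrs, text_content):
    """Pick a property value based on the element name (simplified sketch)."""
    tag = tag.lower()
    if tag == "time":
        # time prefers its datetime attribute and falls back to its text.
        return attrs.get("datetime", text_content)
    if tag in VALUE_ATTRIBUTE:
        # A missing attribute yields the empty string in this sketch.
        return attrs.get(VALUE_ATTRIBUTE[tag], "")
    # Every other element contributes its text content.
    return text_content


print(property_value("img", {"src": "logo.png", "alt": "Logo"}, ""))
print(property_value("time", {"datetime": "2014-12-10"}, "last Wednesday"))
print(property_value("h2", {}, "Introduction"))
```

Note that none of this fires unless the element carries an itemprop in the first place, which matches the later point that values are only extracted when explicitly asked for.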

To say that a spec is broken because it doesn’t understand HTML(5) semantics is, IMO, a bit hyperbolic. Certainly, a hypothetical processor could infer semantics from HTML elements and attributes.

I don't think such processors are hypothetical. That's what other classes of processors already do for HTML documents. It's not like HTML is some obscure vocabulary with arbitrary unknown semantics…

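(The sole exception to this is the <time> element, where both understand @datetime and element content, but only when @property is also present).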

So RDFa tools in fact implement an exception to the "RDFa processors know nothing about elements" rule to recognize the HTML <time> element as representing a time?

If so then I'd think a possible solution is to add similar exceptions for HTML h1-h6 elements—and maybe for other HTML elements too, as the Microdata spec seems to do. Are there technical reasons why doing that for h1-h6 wouldn't be possible while it's instead possible for the <time> case? Or some other subtlety I'm missing here?

Certainly, a hypothetical processor could infer semantics from HTML elements and attributes… but this would be a requirement for some new specification, which isn’t on anyone’s plate AFAIK.

I'd think it ought to be on somebody's plate—somebody who wants to make RDFa more useful and more elegant in practice for HTML authors, and at least for the case of the HTML h1-h6 elements.

Otherwise, lacking that, it seems like the problem and costs are being unfairly shifted to Web document authors and developers—by expecting them to spend extra time putting additional markup in their HTML docs to provide redundant semantic/structural data that's already expressed clearly by the HTML itself.

(Also, note that @role is not really part of RDFa per-se, but leverages the RDFa specifications to provide similar behavior).

If the behavior of RDFa tools with regard to the role attribute isn't defined in the RDFa specs themselves, then I'd think that, however and wherever it is actually defined (in the spec for the role attribute?), you could define handling for HTML h1-h6 elements in a similar way. And other HTML elements too.

Wouldn't this be an appropriate refinement to add in an updated version of the HTML+RDFa 1.1 spec?

halindrome commented 9 years ago

Some points:

1) The interpretation of @role vis-à-vis RDFa is defined in the Role Attribute specification. It is an 'extension' to the RDFa specification, much as the longdesc recommendation is an extension to HTML5.

2) While in theory we could re-open the role attribute spec and add intelligence for all the HTML elements and their implied roles, that would really only serve to generate thousands of garbage triples. By requiring the use of explicit attribute declarations, the Role Attribute Recommendation allows content authors to control the triples that are generated to be just the ones that are semantically meaningful.

And while this is an interesting discussion, it is somewhat orthogonal to the original point, which is that an implementation is performing suboptimally when presented with perfectly legal content. That's a bug. Let's see if there is a workaround that will make that implementation perform better. If there is, and it isn't ridiculous, great. If there is not, also great. Remove the semantic markup.


gkellogg commented 9 years ago

On Dec 10, 2014, at 7:10 PM, Michael[tm] Smith notifications@github.com wrote:

A workaround might be to use @property="xhv:role" @resource="xhv:heading", which should produce the same triples, but this certainly isn't as elegant as the @role attribute.

It doesn't seem like the use of the role attribute can be considered very elegant at all in the case of using it with HTML h1-h6 elements if in practice that degrades user experience for AT users and amounts to just adding redundant per-element markup that's unnecessarily re-expressing existing already-universally-known HTML native semantics.

These also have "headings" and get a role attribute applied to them but no H* element (it is on a div). That role attribute has the happy side effect of providing information for ATs AND RDFa.

That's a totally different case and I'd think that shouldn't be conflated with the case of adding role=heading markup to HTML h1-h6 elements. All AT already understands HTML h1-h6 to be headings and they are otherwise also pretty much universally understood to be headings.

RDFa processors know nothing about elements.

Is that completely true? Unless I'm misunderstanding earlier comments here, it seems like the HTML <time> element is an exception to that. If so, it's not clear why RDFa processors couldn't have similar exceptions for HTML h2-h6 elements. Or for other HTML elements too.

I did mention the time element, but this is restricted to the content model. Triples aren’t generated unless there’s an @property attribute (@itemprop for microdata).

Just to echo Shane’s assertions, RDFa (and, indeed, HTML Microdata) really just operate on attributes and are unaware of HTML semantics.

That's not completely true about Microdata, right? As far as I understand at least, the Microdata spec in fact does specify quite a bit of HTML-semantics-aware processing—at least to the degree of defining how to determine property values for particular HTML elements:

https://html.spec.whatwg.org/multipage/microdata.html#values

That part of the Microdata spec defines how processors determine property values for the HTML meta, audio, embed, iframe, img, source, track, video, a, area, link, object, data, meter, and time elements.

Sure, it uses HTML DOM functions for getting element content. RDFa ends up doing the same thing, but using different processing rules. When there’s a @property, @rel, or @rev attribute on an element, the rules for extracting the content are based on attributes specific to those elements. Microdata is similar, but calls on DOM functions to get these values. It ends up being the same thing (with minor differences at the extremes of URL processing).

The point is that processors don’t generate triples unless it is specifically called for, via @property or @itemprop.

To say that a spec is broken because it doesn’t understand HTML(5) semantics is, IMO, a bit hyperbolic. Certainly, a hypothetical processor could infer semantics from HTML elements and attributes.

I don't think such processors are hypothetical. That's what other classes of processors already do for HTML documents. It's not like HTML is some obscure vocabulary with arbitrary unknown semantics…

Certainly, there are many processors that handle HTML documents, just not that generate RDF. Other than RDFa and Microdata, GRDDL may still be around, but it’s pretty obsolete right now. In particular, there are no standards for generating triples from semantic elements in HTML other than ReSpec itself (at least that I’m aware of). Indeed, most people aren’t looking to extract generic structure from HTML documents in general, but want to embed domain knowledge about the subject of the document, such as Contact and Event information for SEO purposes, or more complicated information.

The RDFa information added to ReSpec documents really is a case about being able to semantically describe the document itself, so it is a special case. Certainly something could be written to do this based on HTML semantic elements, but it makes sense to me to use existing standards to accomplish this.

(The sole exception to this is the <time> element, where both understand @datetime and element content, but only when @property is also present).

So RDFa tools in fact implement an exception to the "RDFa processors know nothing about elements" rule to recognize the HTML <time> element as representing a time?

If so then I'd think a possible solution is to add similar exceptions for HTML h1-h6 elements—and maybe for other HTML elements too, as the Microdata spec seems to do. Are there technical reasons why doing that for h1-h6 wouldn't be possible while it's instead possible for the <time> case? Or some other subtlety I'm missing here?

Both RDFa and microdata parsers will extract element content, with some element-specific semantics, when called upon due to an @property or @itemprop attribute. Either the value of an attribute, or the element’s contents. In the case of the time element, and the @datetime attribute, this is an accommodation to remove the need to explicitly datatype that element. In the case of microdata, this is the only way to get typed literals out (other than recent changes for data and meter elements). If there were other elements with particular content models, these could certainly be considered for some future version of RDFa, but the need is not as great due to the availability of the @datatype attribute.

As Shane pointed out, always generating RDF output for h1-h6 elements, or other elements, would not serve the needs of the general community who want to be more specific in RDF that can be extracted from a page.

Certainly, a hypothetical processor could infer semantics from HTML elements and attributes… but this would be a requirement for some new specification, which isn’t on anyone’s plate AFAIK.

I'd think it ought to be on somebody's plate—somebody who wants to make RDFa more useful and more elegant in practice for HTML authors, and at least for the case of the HTML h1-h6 elements.

Otherwise, lacking that, it seems like the problem and costs are being unfairly shifted to Web document authors and developers—by expecting them to spend extra time putting additional markup in their HTML docs to provide redundant semantic/structural data that's already expressed clearly by the HTML itself.

(Also, note that @role is not really part of RDFa per-se, but leverages the RDFa specifications to provide similar behavior).

If the behavior of RDFa tools with regard to the role attribute isn't defined in the RDFa spec themselves, then I'd think that, however and wherever it is actually defined (in the spec for the role attribute?), you could define handling for HTML h1-h6 elements in a similar way. And other HTML elements too.

Wouldn't this be an appropriate refinement to add in an updated version of the HTML+RDFa 1.1 spec?

For the reasons cited above, I don’t think so, but there could be other ways of adding structural information about W3C specs, or others wanting to extract semantic information from HTML documents. I think the work Shane’s done in ReSpec is a good start, but adding the information in a script tag holding JSON-LD could be another way to accomplish something similar. Perhaps the best way forward is just to use the @property and @resource attributes as I suggested before.

Gregg

P.S., at this point follow-ups maybe should go to semantic-web@w3.org, or some similar list more interested in the particulars of extracting semantic information from HTML documents.


sideshowbarker commented 9 years ago

Just as far as dealing with the case of respec behavior for h1-h6 elements: In the interest of finding some agreement on this issue, and given these statements:

A workaround might be to use @property="xhv:role" @resource="xhv:heading" [with h1-h6 elements], which should produce the same triples

and

I agree that your use of property would work [with h1-h6 elements].

…I'd like to ask (A) if it would be technically possible to resolve this by changing respec's current behavior for h1-h6 elements such that instead of adding role=heading to h1-h6 elements, respec instead adds property="xhv:role" + resource="xhv:heading" to them, and (B) if everybody listening here could live with such a resolution.
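To make (A) concrete, here is a small Python sketch of the attribute swap being proposed. It is only an illustration and not ReSpec code: the function name `rewrite_heading_attrs` is hypothetical, attributes are modelled as a plain dict, and it assumes the role attribute holds the single token "heading".

```python
HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


def rewrite_heading_attrs(tag, attrs):
    """Return a copy of attrs with role=heading swapped for property/resource on h1-h6."""
    updated = dict(attrs)
    if tag.lower() in HEADING_TAGS and updated.get("role") == "heading":
        # Drop the redundant role so the native heading semantics are untouched.
        del updated["role"]
        # Carry the same statement via the attributes suggested in this thread.
        updated["property"] = "xhv:role"
        updated["resource"] = "xhv:heading"
    return updated


print(rewrite_heading_attrs("h2", {"id": "intro", "role": "heading"}))
print(rewrite_heading_attrs("div", {"class": "note-title", "role": "heading"}))
```

The div case is deliberately left alone, since the note and issue titles mentioned earlier have no native heading semantics and still need the role.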

LJWatson commented 9 years ago

@Halindrome: "And yes, you are correct that if rendering were broken for a significant portion of the market it would get fixed immediately."

Reducing this to a statistical numbers game is unhelpful. A processor can be reconfigured, a person can't.

@Halindrome: "That's not the case here. Rendering, visual or otherwise, is not broken. Jaws is incorrectly interpreting the heading structure of the document."

Jaws is not mis-interpreting the heading structure. It's being informed that the headings have no structure. The role effectively turns the headings into <h>, which is not a valid element in HTML AFAIK.
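To make that reading concrete, here is a toy Python model of a consumer that lets an explicit role="heading" replace the native heading semantics, so that the level can then only come from aria-level. This is purely illustrative: it is not how Jaws is implemented and not the ARIA mapping rules, and the function name `reported_heading_level` is made up for this sketch.

```python
NATIVE_HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


def reported_heading_level(tag, attrs):
    """Heading level a consumer would report under the model described above."""
    tag = tag.lower()
    if attrs.get("role") == "heading":
        # The explicit role wins, so the level must be stated via aria-level.
        level = attrs.get("aria-level")
        return int(level) if level else None
    if tag in NATIVE_HEADINGS:
        # No explicit role: the level comes from the element name itself.
        return int(tag[1])
    return None


print(reported_heading_level("h2", {}))                                      # 2
print(reported_heading_level("h2", {"role": "heading"}))                     # None
print(reported_heading_level("h2", {"role": "heading", "aria-level": "2"}))  # 2
```

Under this model, both removing the role and adding a matching aria-level restore the level.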

@Halindrome: "It is doing this in EVERY RESPEC DOCUMENT THAT HAS BEEN PUBLISHED IN THE LAST COUPLE OF YEARS. Clearly no one has ever noticed, and we have a lot of people reading ReSpec generated documents."

It seems that commits in the last few months broke the semantics:

https://github.com/w3c/respec/pull/325

@Sideshowbarker: "…I'd like to ask (A) if it would be technically possible to resolve this by changing respec's current behavior for h1-h6 elements such that instead of adding role=heading to h1-h6 elements, respec instead adds property="xhv:role" + resource="xhv:heading" to them, and (B) if everybody listening here could live with such a resolution."

If it can be accomplished technically it sounds like a good solution. The processor gets its attribute, the semantics of the headings remain valid, and authors don't have to do any extra work.

halindrome commented 9 years ago

@LjWatson: "Jaws is not mis-interpreting the heading structure. It's being informed that the headings have no structure. The role effectively turns the headings into <h>, which is not a valid element in HTML AFAIK."

Interesting. I would actually like to explore this further. Maybe we can get it on the agenda in PF? I don't see how specifying a role of heading overrides the (other) inherent semantics of an H2 element, for example. Not really relevant in this discussion though.

@LjWatson: "It seems that commits in the last few months broke the semantics:

https://github.com/w3c/respec/pull/325"

Ahh - that change. Doh! So if I understand you correctly, re-introducing support for aria-level (using the correct level numbers in deference to you @stevefaulkner) would be one way of addressing this.

Another way that might work would be using @property and @resource. I am going to experiment with that on a local copy to ensure that the correct triples are generated. I will report back.

halindrome commented 9 years ago

Actually, I note that the HTML5 recommendation specifically states that the role value on an H* element should not be set to heading as it is the default. Not sure how I never noticed that. I am sort of surprised the documents we generate pass validation.

Anyway, still experimenting. Thanks for your patience.

darobin commented 9 years ago

It would seem that the Nu validator catches a bunch of issues (though indeed not that one): https://validator.nu/?doc=http%3A%2F%2Fwww.w3.org%2FTR%2Fvibration%2F&schema=http%3A%2F%2Fs.validator.nu%2Fhtml5-rdfalite.rnc+http%3A%2F%2Fs.validator.nu%2Fhtml5%2Fassertions.sch+http%3A%2F%2Fc.validator.nu%2Fall%2F

halindrome commented 9 years ago

Yeah - we don't use RDFa Lite. We use RDFa.


sideshowbarker commented 9 years ago

Actually, I note that the HTML5 recommendation specifically states that the role value on an H* element should not be set to heading as it is the default. Not sure how I never noticed that. I am sort of surprised the documents we generate pass validation.

That's my fault. The validator code currently just isn't up to date yet with that spec requirement. But I'm planning to update it soon to emit a message for this case (and, per-spec, for all other cases where the role attribute is used with elements that already have strong implicit semantics).

sideshowbarker commented 9 years ago

It would seem that the nu validators catches a bunch of issues (though indeed not that one) https://validator.nu/?doc=http%3A%2F%2Fwww.w3.org%2FTR%2Fvibration%2F&schema=http%3A%2F%2Fs.validator.nu%2Fhtml5-rdfalite.rnc+http%3A%2F%2Fs.validator.nu%2Fhtml5%2Fassertions.sch+http%3A%2F%2Fc.validator.nu%2Fall%2F

Right, those messages are all there intentionally and are going to remain there.

Yeah - we don't use RDFa Lite. We use RDFa.

The consequence of that choice is that all other (non-W3C) instances of checker tools that use the validator.nu code to check respec documents are by design going to emit error messages for any non-Lite RDFa attributes. And they're going to continue doing that.

And that's not a hypothetical concern, because there's a growing list of such checker tools; e.g., the grunt plugin for HTML checking https://github.com/jzaefferer/grunt-html and tools like https://github.com/svenkreiss/html5validator that facilitate integration of HTML checking with TravisCI, and tools like the Bootstrap project's Let Me Validate That For You https://github.com/cvrebert/lmvtfy/ and so on. I'm certain we'll see more coming along, because I'm trying to make it as easy as possible for others to integrate the code into their own applications, and provide a variety of useful services based on it.

But all those existing services and all the ones that will emerge are going to report errors for all non-Lite RDFa markup, because by design the default validator.nu code does not support non-Lite RDFa, and so the packaged Nu Markup Checker releases on which all those services rely do not support non-Lite RDFa.

The only reason I added any RDFa support at all to the validator.nu code to begin with was after getting enough reports from users about error messages caused by Facebook's open graph (or whatever it's called) stuff, and talking with Henri about how to deal with it and us finally deciding that it was at the point where we'd be helping users more by not reporting errors for those limited cases. But then schema.org's RDFa Lite support came along around that same time. So I bit the bullet and added RDFa Lite support.

If it'd been up to me, I'd have stopped there, because as far as checking of RDFa in HTML documents goes, support for RDFa Lite checking is all that I believe is rightly needed in practice. While I recognize there are differing opinions about the utility of full RDFa in HTML documents, and with genuine respect to others who believe full RDFa is a good thing, I happen to be one of the people who hold the opinion that it's not needed for most cases of HTML documents on the Web and should not be widely encouraged, because I really believe it's not actually helpful to the vast majority of Web authors and developers.

But despite all that, I've ended up essentially being forced against my wishes by others within the W3C to spend a significant amount of my time implementing and testing and maintaining support for full RDFa in the W3C validator anyway.

Elsewhere however I'm free to actually do what I believe is right, and there's nothing forcing me against my own judgment to spend time proliferating full RDFa support in other checkers.

So as far as checking of respec documents goes, that means all other validator.nu-based checking tools everywhere are going to continue to emit errors for any non-Lite RDFa markup the documents contain.

darobin commented 9 years ago

Indeed, I hadn't thought of that but it is going to prove problematic when W3C start deploying the automatic publisher (next month) that can ship drafts to TR on its own. It uses the Nu validator and will therefore choke here.

stevefaulkner commented 9 years ago

It uses the Nu validator and will therefore choke here.

When you say Nu validator, I am assuming you mean the W3C version, as Henri's version does not check for some conformance requirements that are W3C HTML5 specific.

darobin commented 9 years ago

Echidna (the automated publisher) will naturally use the W3C version since it's for TR validation. But it is still Nu, with just a few different options.

stevefaulkner commented 9 years ago

@darobin cool bananas, I asked because I noticed you cited the output from https://validator.nu/ earlier rather than http://validator.w3.org/nu/, so I wanted to double check.

darobin commented 9 years ago

That was just because my network got blocked from using the W3C validator due to a local viral infestation...

stevefaulkner commented 9 years ago

That was just because my network got blocked from using the W3C validator due to a local viral infestation...

blame tools for not toeing the party line... sure ;-)

halindrome commented 9 years ago

@sideshowbarker mentioned some interesting things about RDFa vs RDFa Lite above. They are not really relevant to this problem report, but I think that it would be valuable to get input from @gkellogg.

Two important things:

1) All RDFa processors are required to process all of RDFa. RDFa Lite is a subset of RDFa. Having a validator permit full RDFa syntax does not mean that authors are necessarily going to rely upon it any more than adding support for longdesc once it is a Recommendation will mean that it will be used. Authors will use whatever they want. Or in the case of RDFa, whatever schema.org shows them how to use. Copy. Paste. Done.

2) When introducing RDFa support into ReSpec, we consciously chose to use full RDFa because it was impossible to capture the semantics we wanted to capture with the limitations of RDFa Lite. However, @gkellogg may remember better than I.

Anyway, I have made a pull request for ReSpec that should close out this issue. I am just waiting for other collaborators to look it over.

msporny commented 9 years ago

A couple of high-level thoughts:

gkellogg commented 9 years ago

Ideally, the validator wouldn't say that HTML5 containing full RDFa is invalid, but might instead issue warnings about unprocessed RDFa attributes. As @halindrome and @msporny indicate, such markup shouldn't be considered invalid, but it is beyond the scope of the nu validator to do such validation.

Also, RDFa Lite is a publishing profile, not relevant to parsing the output. All conformant RDFa processors are required to parse the full-on RDFa. The main reason for RDFa Lite was to give publishers an easy route into RDFa, and obviously to address concerns about relative complexity vis-a-vis microdata. The use of RDFa in ReSpec is hardly trivial. However, it is important that published specs be valid HTML.

That said, almost all RDFa generated for ReSpec could be done using RDFa Lite. The exception is the use of @inlist for editors and authors. This is used to ensure that the generated RDF properly considers the ordering of editors and authors, which many consider important.
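To make the Lite versus full distinction concrete, here is a minimal sketch that lists attributes in a document that fall outside RDFa Lite (which consists of `vocab`, `typeof`, `property`, `resource` and `prefix`). The names are hypothetical and the set of flagged attributes is an assumption: it is deliberately partial, and skips `rel`, `href`, `src` and `content` because plain HTML also defines them. It does not reproduce what any validator actually does.

```python
from html.parser import HTMLParser

# Assumption: a deliberately partial list of full-RDFa attributes that are
# not part of RDFa Lite. rel, href, src and content are skipped because
# plain HTML also defines them.
NON_LITE_RDFA = {"about", "datatype", "inlist", "rev"}


class NonLiteRdfaFinder(HTMLParser):
    """Hypothetical helper: records (tag, attribute) pairs outside RDFa Lite."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        # Attribute names arrive lowercased; values are ignored here.
        names = {name for name, _value in attrs}
        for name in sorted(names & NON_LITE_RDFA):
            self.findings.append((tag, name))


def find_non_lite_rdfa(html):
    """Return (tag, attribute) pairs that use non-Lite RDFa attributes."""
    finder = NonLiteRdfaFinder()
    finder.feed(html)
    finder.close()
    return finder.findings
```

On markup that only uses `property` and `typeof`, the result is empty; an editor entry marked up with `inlist` would be reported, which matches the exception described above.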

marcoscaceres commented 9 years ago

@msporny, what do you mean exactly by:

I'm against removing full RDFa support in ReSpec because we need full RDFa support to do vocabulary documents like this in the Web Payments CGs and Credentials CGs:

So, I'm super interested to hear what are you extracting exactly and what for? (i.e., what actual real world stuff are you, or anyone, actually doing with that stuff?) When I click on the link above, I just get a text file with @base <https://web-payments.org/vocabs/commerce> .? What am I missing?

I really want to understand, and see actual evidence, that people are using this stuff in the real world. I'm always at a loss as to why this stuff is in the output when I've never seen anyone ask for or use this RDF stuff from specs.

Please someone enlighten me. "I want to believe" :tm:.

msporny commented 9 years ago

When I click on the link above, I just get a text file with

And this, folks, is why you should always check your links before posting them into a discussion. @marcoscaceres, you've found a bug in @gkellogg's extractor. It should work if you dump the contents of view-source:https://web-payments.org/vocabs/commerce into http://rdf.greggkellogg.net/distiller . Set "input format" to RDFa, and output format to turtle. What you see is the machine-readable description of the vocabulary document. I use the HTML5+RDFa output to generate the vocabularies in all the other RDF syntaxes (so I can edit one document and keep the rest of them in sync).

What does this have to do with ReSpec's RDFa? Almost nothing, except that I don't want to see stuff put into ReSpec that would strip the RDFa out or someone asserting that RDFa in a Respec document is useless and the validator should mark the document as erroneous because of its use of RDFa.

As for the usefulness of the chapter/author/document info RDFa markup, you don't really get "real world apps" until you start publishing data. OpenStreetMap was a pretty dumb project until all of a sudden it wasn't (it took years for the database to get into a state that was useful). I think the same may be true for W3C docs published w/ RDFa data. For example, I'd love to be able to do a quick analysis of how many total editors/authors there have been throughout W3C history (because I believe the number is laughably small compared to the number of people those specs have affected). Average size of a W3C spec, common titles, etc. All that stuff requires a custom parser to get at today. It'd just be nice if we started publishing some good metadata along w/ the specs W3C publishes. In 10 years, who knows, we may have a ton of data whose value isn't that great. Or, maybe, someone will create an app that makes all of it make so much more sense.

This was the position RDFa, Microformats, and Microdata were in pre-schema.org: why would anyone need such a technology? But now the SEO folks are all over it, and because of that Facebook, Google, Yandex, Yahoo!, etc. are able to build out their Knowledge Graphs, which has generated revenues in the range of tens if not hundreds of millions of dollars for each organization.

The problem w/ asking questions like "actual evidence of people using this data in the real world" is that sometimes the value of the data isn't useful until you have some critical mass of it, and then all of a sudden it's very useful.

I know that sounds like a cop-out, but that's the answer I think is appropriate for ReSpec. It doesn't cost us much to publish structured data, and it may be of great benefit in the future. Libraries have the same sort of problem - which book or magazine is the most important to make available to the public? You don't know, so you have to make them all available.

sideshowbarker commented 9 years ago

I opened up a new issue at https://github.com/w3c/respec/issues/374 for discussion about RDFa in respec, with a concrete proposal that I hope everybody could live with.

marcoscaceres commented 8 years ago

This was solved by making RDFa opt-in