saschanaz commented 6 years ago

XHTML is dead, probably no?

marcoscaceres commented 6 years ago

We should check if there are any instances of anyone using it. But yeah, xhtml is dead and I’ve no idea if RDFa makes sense outside an xhtml context?

halindrome commented 6 years ago

RDFa has nothing to do with XHTML. It is an extension to HTML5 and is part of the HTML5 standard. It is used throughout the internet. No reason to not support it.

marcoscaceres commented 6 years ago

I know we have this discussion once every two years, but I’m still not sure what “supported throughout the Internet” means? Would be great to see the spec metadata show up somewhere (e.g., in Google results or something tangible that we can see)? 👀

If something is using the data in a way that benefits the community, we should keep it. But, if not...

gkellogg commented 6 years ago

From the November 2017 Common Crawl: of the 26,271,491 domains crawled, 1,209,430 (4.6%) have RDFa on them. That is not as high as JSON-LD (2,685,738) or Microdata (3,743,822), but still quite substantial.
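As a quick check on those proportions, here is a small JavaScript sketch that recomputes each syntax's share of crawled domains from the counts quoted above. The variable and function names are illustrative only.

```javascript
// Domain counts quoted above from the November 2017 Common Crawl.
const domainsCrawled = 26271491;
const domainsBySyntax = {
  "RDFa": 1209430,
  "JSON-LD": 2685738,
  "Microdata": 3743822,
};

// Share of crawled domains, as a percentage rounded to one decimal place.
function sharePercent(count, total) {
  return Math.round((count / total) * 1000) / 10;
}

// Map each syntax name to its percentage share.
const shares = Object.fromEntries(
  Object.entries(domainsBySyntax).map(([syntax, count]) => [
    syntax,
    sharePercent(count, domainsCrawled),
  ])
);

console.log(shares);
```

By these counts RDFa is on roughly one crawled domain in twenty-two, JSON-LD on about one in ten.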

RDFa in ReSpec is a sunk cost and allows some aspect of our technical communications to be machine-understandable; we should be striving for more of this, not less.

For example, the JSON-LD 1.0 Spec generates the following triples:

@base <https://www.w3.org/TR/json-ld/> .
@prefix dc: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xhv: <http://www.w3.org/1999/xhtml/vocab#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<> a <bibo:Document>,
     <w3p:REC>;
   dc:title "JSON-LD 1.0"@en;
   <bibo:chapter> <#abstract>,
     <#sotd>,
     <#references>;
   <bibo:editor> ([
       a foaf:Person;
       foaf:homepage <http://manu.sporny.org/>;
       foaf:name "Manu Sporny"@en;
       foaf:workplaceHomepage <http://digitalbazaar.com/>
     ] [
       a foaf:Person;
       foaf:homepage <http://greggkellogg.net/>;
       foaf:name "Gregg Kellogg"@en;
       foaf:workplaceHomepage <http://kellogg-assoc.com/>
     ] [
       a foaf:Person;
       foaf:homepage <http://www.markus-lanthaler.com/>;
       foaf:name "Markus Lanthaler"@en;
       foaf:workplaceHomepage <http://www.tugraz.at/>
     ]);
   <bibo:subtitle> "A JSON-based Serialization for Linked Data"@en;
   dc:abstract """Abstract
  JSON is a useful data serialization and messaging format.
    This specification defines JSON-LD, a JSON-based format to serialize
    Linked Data. The syntax is designed to easily integrate into deployed
    systems that already use JSON, and provides a smooth upgrade path from
    JSON to JSON-LD.
    It is primarily intended to be a way to use Linked Data in Web-based
    programming environments, to build interoperable Web services, and to
    store Linked Data in JSON-based storage engines.
"""@en;
   dc:contributor [
     a foaf:Person;
     foaf:homepage <http://www.markus-lanthaler.com/>;
     foaf:name "Markus Lanthaler"@en;
     foaf:workplaceHomepage <http://www.tugraz.at/>
   ],  [
     a foaf:Person;
     foaf:homepage <http://neverspace.net/>;
     foaf:name "Niklas Lindström"@en
   ],  [
     a foaf:Person;
     foaf:homepage <http://digitalbazaar.com/>;
     foaf:name "Manu Sporny"@en;
     foaf:workplaceHomepage <http://digitalbazaar.com/>
   ],  [
     a foaf:Person;
     foaf:homepage <http://digitalbazaar.com/>;
     foaf:name "Dave Longley"@en;
     foaf:workplaceHomepage <http://digitalbazaar.com/>
   ],  [
     a foaf:Person;
     foaf:homepage <http://greggkellogg.net/>;
     foaf:name "Gregg Kellogg"@en;
     foaf:workplaceHomepage <http://kellogg-assoc.com/>
   ];
   dc:issued "2014-01-15T23:00:00.000Z"^^xsd:dateTime;
   dc:language "en"@en;
   dc:references <http://www.w3.org/DesignIssues/LinkedData.html>,
     <http://www.w3.org/TR/json-ld-api/>,
     <http://www.w3.org/TR/rdfa-core/>,
     <http://www.w3.org/TR/rdf11-mt/>,
     <http://www.w3.org/TR/2013/NOTE-microdata-20131029/>,
     <http://microformats.org>,
     <http://www.ietf.org/rfc/rfc2616.txt>,
     <http://www.w3.org/TR/2014/PER-rdf-schema-20140109/>,
     <http://www.w3.org/TR/rdf-schema/>,
     <http://www.w3.org/TR/2014/PR-rdf11-mt-20140109/>,
     <http://www.w3.org/TR/turtle/>,
     <http://www.ietf.org/rfc/rfc6839.txt>,
     <http://www.ietf.org/rfc/rfc6906.txt>,
     <http://www.ietf.org/rfc/rfc3986.txt>,
     <http://www.w3.org/TR/2014/PR-turtle-20140109/>;
   dc:replaces <http://www.w3.org/TR/2013/PR-json-ld-20131105/>;
   dc:requires <http://www.ietf.org/rfc/rfc4627.txt>,
     <http://www.w3.org/TR/rdf11-concepts/>,
     <http://www.ietf.org/rfc/rfc3987.txt>,
     <http://www.w3.org/TR/2014/PR-rdf11-concepts-20140109/>,
     <http://www.ietf.org/rfc/rfc5988.txt>,
     <http://www.ietf.org/rfc/rfc2119.txt>,
     <http://tools.ietf.org/html/bcp47>;
   <w3p:patentRules> <http://www.w3.org/Consortium/Patent-Policy-20040205/> .

<#conversion-of-native-data-types> xhv:role xhv:heading .

<#embedding-1> xhv:role xhv:heading .

<#h2_abstract> xhv:role xhv:heading .

<#h2_acknowledgements> xhv:role xhv:heading .

<#h2_advanced-concepts> xhv:role xhv:heading .

<#h2_basic-concepts> xhv:role xhv:heading .

<#h2_conformance> xhv:role xhv:heading .

<#h2_data-model> xhv:role xhv:heading .

<#h2_design-goals-and-rationale> xhv:role xhv:heading .

<#h2_iana-considerations> xhv:role xhv:heading .

<#h2_introduction> xhv:role xhv:heading .

<#h2_json-ld-grammar> xhv:role xhv:heading .

<#h2_references> xhv:role xhv:heading .

<#h2_relationship-to-other-linked-data-formats> xhv:role xhv:heading .

<#h2_relationship-to-rdf> xhv:role xhv:heading .

<#h2_sotd> xhv:role xhv:heading .

<#h2_terminology> xhv:role xhv:heading .

<#h2_toc> xhv:role xhv:heading .

<#h3_advanced-context-usage> xhv:role xhv:heading .

<#h3_aliasing-keywords> xhv:role xhv:heading .

<#h3_base-iri> xhv:role xhv:heading .

<#h3_compact-iris> xhv:role xhv:heading .

<#h3_compacted-document-form> xhv:role xhv:heading .

<#h3_context-definitions> xhv:role xhv:heading .

<#h3_data-indexing> xhv:role xhv:heading .

<#h3_data-model-overview> xhv:role xhv:heading .

<#h3_default-vocabulary> xhv:role xhv:heading .

<#h3_embedding> xhv:role xhv:heading .

<#h3_embedding-json-ld-in-html-documents> xhv:role xhv:heading .

<#h3_expanded-document-form> xhv:role xhv:heading .

<#h3_flattened-document-form> xhv:role xhv:heading .

<#h3_general-terminology> xhv:role xhv:heading .

<#h3_how-to-read-this-document> xhv:role xhv:heading .

<#h3_identifying-blank-nodes> xhv:role xhv:heading .

<#h3_index-maps> xhv:role xhv:heading .

<#h3_informative-references> xhv:role xhv:heading .

<#h3_interpreting-json-as-json-ld> xhv:role xhv:heading .

<#h3_iri-expansion-within-a-context> xhv:role xhv:heading .

<#h3_iris> xhv:role xhv:heading .

<#h3_language-maps> xhv:role xhv:heading .

<#h3_lists-and-sets> xhv:role xhv:heading .

<#h3_microdata> xhv:role xhv:heading .

<#h3_microformats> xhv:role xhv:heading .

<#h3_named-graphs> xhv:role xhv:heading .

<#h3_node-identifiers> xhv:role xhv:heading .

<#h3_node-objects> xhv:role xhv:heading .

<#h3_normative-references> xhv:role xhv:heading .

<#h3_rdfa> xhv:role xhv:heading .

<#h3_reverse-properties> xhv:role xhv:heading .

<#h3_serializing-deserializing-rdf> xhv:role xhv:heading .

<#h3_sets-and-lists> xhv:role xhv:heading .

<#h3_specifying-the-type> xhv:role xhv:heading .

<#h3_string-internationalization> xhv:role xhv:heading .

<#h3_syntax-tokens-and-keywords> xhv:role xhv:heading .

<#h3_terms> xhv:role xhv:heading .

<#h3_the-context> xhv:role xhv:heading .

<#h3_turtle> xhv:role xhv:heading .

<#h3_type-coercion> xhv:role xhv:heading .

<#h3_typed-values> xhv:role xhv:heading .

<#h3_value-objects> xhv:role xhv:heading .

<#h_note_1> xhv:role xhv:heading .

<#h_note_10> xhv:role xhv:heading .

<#h_note_2> xhv:role xhv:heading .

<#h_note_3> xhv:role xhv:heading .

<#h_note_4> xhv:role xhv:heading .

<#h_note_5> xhv:role xhv:heading .

<#h_note_6> xhv:role xhv:heading .

<#h_note_7> xhv:role xhv:heading .

<#h_note_8> xhv:role xhv:heading .

<#h_note_9> xhv:role xhv:heading .

<#lists> xhv:role xhv:heading .

<#prefix-definitions> xhv:role xhv:heading .

<#respecContents> xhv:role xhv:directory .

<#respecDocument> xhv:role xhv:document .

<#respecHeader> xhv:role xhv:contentinfo .

<#abstract> a <bibo:Chapter> .

<#informative-references> a <bibo:Chapter> .

<#normative-references> a <bibo:Chapter> .

<#references> a <bibo:Chapter>;
   <bibo:chapter> <#informative-references>,
     <#normative-references> .

<#sotd> a <bibo:Chapter> .

marcoscaceres commented 6 years ago

> RDFa in ReSpec is a sunk cost and allows some aspect of our technical communications to be machine understandable, we should be striving for more of this, not less.

Respectfully, I'm asking two simple questions:

1) Who is using the metadata? (e.g., "it's shown in Google search results! Look here: screenshot!")
2) How does that directly benefit ReSpec users?

If the answer to the above is "no one", and there is no direct benefit to our community, then we should remove the RDFa stuff.

halindrome commented 6 years ago

I honestly find this sort of conversation exhausting. There are two important arguments here:

1) We at the W3C need to "eat our own dogfood". That means instrumenting our specifications with metadata.
2) We know that search engines index metadata. See also http://schema.org

However, in deference to your more-or-less reasonable guideline that unused code is a burden, sure - let's get some real data. I will loop in the people from schema.org who deal with search engines and see what they have to say.

marcoscaceres commented 6 years ago

> However, in deference to your more-or-less reasonable guideline that unused code is a burden, sure - let's get some real data.

That would be amazing. Again, the question is around W3C specs - I've seen good examples of funky things done with schema.org metadata and Google search results - so want the same wins for ReSpec/W3C Specs.

Hi @danbri! Suggestions?

danbri commented 6 years ago

Google (Search, Shopping, News, etc.) has decent RDFa 1.1 capabilities (I believe everything except the relatively exotic property copying mechanism). All the "rich snippets" and similar structured data functionality should work just fine. Whether the specific vocabulary and data patterns you're publishing map to a current public-facing question is another matter. W3C specs are pretty niche if you are describing a lot of standardization-related detail. We do have some basic stuff around describing software apps, which I'm looking at lately, as well as dataset discovery and general patterns like "breadcrumbs". But anyway, we like and use RDFa, even if JSON-LD is currently somewhat more popular with many.

marcoscaceres commented 6 years ago

So, we won't remove RDFa things in ReSpec that create "rich snippets". And we should actively add things that show up as "rich snippets".

However, if we are ever refactoring code, and we find RDFa stuff that doesn't show up as a "rich snippet", we should remove it (as it adds no value). But we won't actively remove RDFa support.

Sound like a plan?

marcoscaceres commented 6 years ago

Ok, basically, it looks like the RDFa stuff might be broken because Google can't actually determine the type of any of the data (testing with WCAG 2.1 Guidelines spec):

https://search.google.com/structured-data/testing-tool/u/0/#url=https%3A%2F%2Fw3c.github.io%2Fwcag21%2Fguidelines%2F

It does pick up the hcards, but gives them no contextual meaning.

I wonder also if having a JSON-LD description at the top of the document might not be better? Having RDFa peppered throughout the document doesn't seem great.

marcoscaceres commented 6 years ago

Tried this again with the json-ld spec too, and similar story... it finds foaf, and hcard, but nothing else... and with no context (i.e., it doesn't seem to express/understand "these people authored this work" or anything "semantic").

@danbri, sorry to bother you again, any suggestions for improving this? A significant chunk of ReSpec code is dedicated to sprinkling RDFa throughout specifications, but it all seems to be going to waste 😢

marcoscaceres commented 6 years ago

Fixed typo above.

gkellogg commented 6 years ago

The RDFa generated is not “broken”, it just doesn’t generate RDF using schema.org properties and classes. Try it with the Structured Data Linter (http://linter.structured-data.org) and you will see quite a different result. Outputting the triples as JSON-LD wouldn’t produce any different result. IMO, when talking about a document, using document markup (RDFa) makes sense. When talking about what the document describes, JSON-LD makes sense.

I am a bit concerned if the sole measure of success is if Google will generate rich snippets when scanning the document, but we live in a different world now than we did when drawing on other well established vocabularies was considered good form.

Satisfying a new goal of being schema.org friendly means either adding or replacing classes and properties with reasonably similar schema.org terms.

IMHO, the most important things to capture are title, editors/authors, references and maybe summary information. The chapter headings, and their relationship with each other may also be useful. I could also see generating something for each definition.
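A rough JavaScript sketch of what such a class and property mapping could look like follows. Every pairing in the table is an illustrative guess at a "reasonably similar" schema.org term, not the mapping ReSpec adopted, and the triple shape and function name are assumptions.

```javascript
// Hypothetical mapping from terms in the Turtle output above to schema.org
// terms. Each right-hand side is an illustrative guess, not ReSpec's mapping.
const termMap = new Map([
  ["bibo:Document", "schema:TechArticle"],
  ["bibo:Chapter", "schema:Chapter"],
  ["dc:title", "schema:name"],
  ["dc:abstract", "schema:description"],
  ["bibo:editor", "schema:editor"],
  ["dc:contributor", "schema:contributor"],
  ["bibo:chapter", "schema:hasPart"],
  ["dc:references", "schema:citation"],
  ["foaf:Person", "schema:Person"],
  ["foaf:name", "schema:name"],
  ["foaf:homepage", "schema:url"],
]);

// Rewrite the predicate of each triple and, for rdf:type statements, the
// object as well. Terms without a mapping pass through unchanged.
function toSchemaOrg(triples) {
  const mapTerm = (term) => termMap.get(term) ?? term;
  return triples.map(({ subject, predicate, object }) => ({
    subject,
    predicate: mapTerm(predicate),
    object: predicate === "rdf:type" ? mapTerm(object) : object,
  }));
}

// A few statements in the style of the JSON-LD 1.0 output shown earlier.
const mapped = toSchemaOrg([
  { subject: "<>", predicate: "rdf:type", object: "bibo:Document" },
  { subject: "<>", predicate: "dc:title", object: "JSON-LD 1.0" },
  { subject: "<>", predicate: "w3p:patentRules", object: "<http://www.w3.org/Consortium/Patent-Policy-20040205/>" },
]);

console.log(mapped);
```

Terms with no obvious schema.org counterpart (here w3p:patentRules) are left alone, which matches the "adding or replacing" framing: unmapped vocabulary can stay alongside the new terms.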

halindrome commented 6 years ago

We could switch to the schema.org vocabulary pretty quickly actually. I had forgotten that Google's processor only really cares about that vocab. Myopic, but whatever.

gkellogg commented 6 years ago

I'll work on mapping classes and properties, show an experimental result and do a PR.

marcoscaceres commented 6 years ago

@gkellogg, apologies... I should not have used the word "broken" (term of art?)... I meant that it's not benefiting the standards community as much as I think it should, given the investment of time we (well, mostly you) have put into peppering if(conf.doRDFa){} throughout ReSpec.

I'm excited to see what we can take from schema.org - especially if it does something useful in Google Search results.

cc @msporny - as he gets excited about this stuff too and might have suggestions :)

Eventually, if we see big wins, we should consider just putting a JSON-LD block at the top of documents, as if(conf.doRDFa){} everywhere is not maintainable.
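To illustrate the alternative floated here, the following is a minimal JavaScript sketch, not ReSpec's actual code: one function builds a schema.org description from configuration and another serializes it as a single script element, so no template needs its own if(conf.doRDFa){} branch. The config shape (title, editors), the function names, and the choice of TechArticle are assumptions for illustration; the sample values come from the JSON-LD 1.0 output shown earlier.

```javascript
// Hypothetical config shape; field names are assumptions for illustration.
const conf = {
  title: "JSON-LD 1.0",
  editors: [
    { name: "Manu Sporny", url: "http://manu.sporny.org/" },
    { name: "Gregg Kellogg", url: "http://greggkellogg.net/" },
    { name: "Markus Lanthaler", url: "http://www.markus-lanthaler.com/" },
  ],
};

// Build one schema.org description from the config alone, independent of
// whatever templates render the human-readable page.
function buildJsonLd(conf) {
  return {
    "@context": "http://schema.org",
    "@type": "TechArticle",
    name: conf.title,
    editor: conf.editors.map(({ name, url }) => ({
      "@type": "Person",
      name,
      url,
    })),
  };
}

// Serialize the description as a script element for the document head.
// "<" is escaped so that data can never close the script element early.
function toScriptElement(data) {
  const json = JSON.stringify(data, null, 2).replace(/</g, "\\u003c");
  return `<script type="application/ld+json">\n${json}\n</script>`;
}

const scriptHtml = toScriptElement(buildJsonLd(conf));
console.log(scriptHtml);
```

Because everything is derived from the configuration in one place, turning the feature on or off becomes a single decision rather than a condition repeated in every template.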

saschanaz commented 6 years ago

> as if(conf.doRDFa){} everywhere is not maintainable.

A newbie question here: why do we add it conditionally? Performance?

marcoscaceres commented 6 years ago

> Performance?

No, a lot of people in the community don't want RDFa in their specs because they feel it bloats specs with no value to readers or editors (which is why I keep asking for proof of value and utility to our community).

This is why if we can get the RDFa stuff to show up in Google search results we can go back to the community and be like, "Check this awesomeness out! You know you wanted... see, this semantic web stuff IS useful." And if it turns out to be like super useful, with clear benefits, then we can get rid of the if statements and just include it by default (right now it's disabled by default).

It was only in the last few years that schema.org took off and Google started supporting RDFa, potentially making this metadata actually useful for something (rich snippets are ACTUALLY useful to average users... the rest, not so much).

It's still in question if anything useful will come of this, but I'm optimistic for the first time in, oh... 12 years?

msporny commented 6 years ago

> cc @msporny - as he gets excited about this stuff too and might have suggestions :)

I know this'll be heresy coming from the ex-Chair of the RDFa 1.1 WG and very active participant in the Microformats community (especially because I dumped years of my life into the efforts), but ultimately I think the RDFa, Microdata, and Microformats experiments have largely failed. The markup is too brittle to survive web designers that aren't deeply steeped in how to write this stuff, and most don't use tools to help them make sure they're doing the right thing, and the community isn't producing the tools that web developers need. The RDFa markup that continues to not break is stuff that's auto-generated (as predicted many years ago). Maybe this will change in the next decade, but the trend line doesn't look good for adoption (even though it's increasing, it's not at an acceptably exponential rate).

I'm still fairly bullish on JSON-LD in HTML documents because most web designers try to avoid mashing JSON data structures in page markup, and because it is easier for developers to read and easier to have a tool auto-generate and then cut/paste into your document.

To be clear, I'd still like document semantics expressed in ReSpec to make it easier for tools to scrape data from specs. For example, I've been wanting to figure out how to do analysis on who is producing specs and at what speed at W3C for years and until there is useful metadata for me to scrape, this is difficult to do. So +1 for easy to get at metadata and I think JSON-LD is striking the best balance for that purpose in HTML documents that we've been able to come up with so far.

> I wonder also if having a JSON-LD description at the top of the document might not be better?

I'd be supportive of this direction (but completely unable to do the work), so my input should be assigned very little weight.

marcoscaceres commented 6 years ago

Really appreciate your input @msporny! Thank you.

danbri commented 6 years ago

(Careering somewhat offtopic, but while we're here with interested parties.)

@msporny et al., what would you think of something for a JSON-LD future version, for the case where the JSON is embedded in HTML, that allowed property values (e.g. a lengthy abstract/description, or other textual content, or large blocks of markup) to be slurped out of the human-facing HTML content (via CSS selector, ID or xpath) rather than having it repeated all over again within the JSON-LD island? There would still be fragility/maintenance issues but it would take some pressure off the JSON, which is not really a good place to keep large chunks of markup...

msporny commented 6 years ago

> the case where the JSON is embedded in HTML, that allowed property values (e.g. a lengthy abstract/description, or other textual content, or large blocks of markup) to be slurped out of the human-facing HTML content (via CSS selector, ID or xpath) rather than having it repeated all over again within the JSON-LD island?

Very interesting. So, we're spinning up a JSON-LD 1.1 WG soon-ish, and this should be a topic of discussion there.

What I suggest is that you allocate a schema.org property that says "slurp in the data from HERE", and then specify something that a JSON-LD processor can optionally call via the JSON-LD API. I'm trying to make sure that we don't put a heavy implementation burden on JSON-LD processors. Honestly, this could just be a design pattern that is supported in higher-level libraries... perhaps not even in the JSON-LD API (I'd prefer it to not be there), but a JSON-LD-in-HTML library that knows how to transform JSON-LD with CSS selectors (or whatever) into post-processed objects containing the relevant HTML. We'd most likely implement this in jsonld.js and the other JSON-LD processors we maintain, but I'm not sure we should foist that burden on other JSON-LD processor developers.

This approach allows the JSON-LD at the top of the doc to stay skinny, but you get some of the benefits of DRY (which we've been failing at getting into widespread adoption for well over a decade now). Yes, the fragility still exists, but it's less fragile than Microformats, Microdata, and RDFa.

gkellogg commented 6 years ago

> what would you think of something for a JSON-LD future version, for the case where the JSON is embedded in HTML, that allowed property values (e.g. a lengthy abstract/description, or other textual content, or large blocks of markup) to be slurped out of the human-facing HTML content (via CSS selector, ID or xpath) rather than having it repeated all over again within the JSON-LD island?

Note that this should just work if the page contains RDFa using subjects that are also in the JSON-LD, so that the triples from the RDFa (or Microdata) would "supplement" that described in JSON-LD. However, my testing output using SDTT with RDFa has not been very successful. Still, IMO, that's the way it should work.

The only large bit of content to be slurped out is the abstract, and that might not be worth it. I'll summarize the results of my testing in another comment.
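The "supplement" idea can be sketched as a merge keyed on subject. The JavaScript below is only an illustration under simplifying assumptions: each source has already been parsed into flat node objects, every node carries an "@id", and the function name and example IRI are made up. It is not how SDTT or any particular processor works.

```javascript
// Merge node objects from several sources (for example embedded JSON-LD and
// an RDFa extraction) by their "@id". Values are collected into arrays so
// that nothing from either source is overwritten.
// Assumption: every node has an "@id"; blank nodes are not handled.
function mergeBySubject(...sources) {
  const merged = new Map();
  for (const nodes of sources) {
    for (const node of nodes) {
      const id = node["@id"];
      const target = merged.get(id) ?? { "@id": id };
      for (const [key, value] of Object.entries(node)) {
        if (key === "@id") continue;
        const existing = target[key] ?? [];
        const incoming = Array.isArray(value) ? value : [value];
        // Skip values the target already holds (compared by serialization).
        const seen = new Set(existing.map((v) => JSON.stringify(v)));
        target[key] = existing.concat(
          incoming.filter((v) => !seen.has(JSON.stringify(v)))
        );
      }
      merged.set(id, target);
    }
  }
  return [...merged.values()];
}

// Both sources describe the same subject, so the result is a single node.
const fromJsonLd = [{ "@id": "https://example.org/spec/", name: "My Spec" }];
const fromRdfa = [
  { "@id": "https://example.org/spec/", name: "My Spec", description: "Something about my doc." },
];
const graph = mergeBySubject(fromJsonLd, fromRdfa);

console.log(JSON.stringify(graph, null, 2));
```

With a merge like this, the JSON-LD island could stay short while longer values such as the abstract come from the markup that already contains them.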

gkellogg commented 6 years ago

I modified a local version of ReSpec to generate schema.org classes and properties rather than what's done now. I ran this on a copied version of the JSON-LD 1.1 spec (https://json-ld.org/spec/latest/json-ld/), modified to be a "WD", as "CG-DRAFT" does not get the RDFa treatment (even if doRDFa is set explicitly). When saved, this is generated: https://gist.github.com/gkellogg/3eed670b1df76f4750a7703bd0be87e1#file-json-ld-html

The Google Structured Data Testing Tool (https://search.google.com/structured-data/testing-tool/) does a really poor job of getting the RDFa out, which I found surprising, even though it uses the same form as schema.org examples (@danbri, perhaps you have something to say about this?). However, when transformed into the JSON-LD equivalent (https://gist.github.com/gkellogg/3eed670b1df76f4750a7703bd0be87e1#file-json-ld-jsonld), it does a pretty good job, although the SDTT doesn't really do a good job of handling generic JSON-LD, and note the context that must be applied to get values represented as strings.

Both of these could be thinner if we left out the schema:Chapter entries linked together with schema:hasPart, although I think it's cool to provide some deep content.

As the RDFa generated is perfectly reasonable, and other parts of the Google infrastructure may well parse it properly, the first step might be for me to do a PR with the results of my hacking to move to using schema.org properties and classes.

Of necessity, generating the JSON-LD directly would require a different approach, but shouldn't prove to be that difficult.

marcoscaceres commented 6 years ago

Hold up on the PR @gkellogg. We have a big PR landing that changes all the templates.

saschanaz commented 6 years ago

#1514 is now merged!

halindrome commented 6 years ago

I love this so much. Thanks for doing the important research, Gregg! Interesting that the Google tool is not doing as well as you expect. I look forward to learning more.

danbri commented 6 years ago

Interesting, Gregg. Yes - this is a known issue, we do some per-parser postprocessing that currently breaks subject-URI merging across syntaxes; I will take a look. Can this idea be extended to encompass CSS selectors and XPaths?

gkellogg commented 6 years ago

I think @msporny’s approach probably works best, if the RDFa merging approach is somehow not acceptable. Basically an inference rule that would allow a processor to treat some part of the graph resulting from the embedded JSON-LD as a pointer into the document. The Role class does something like this already, so maybe a subclass of Role that has properties to contain a fragment, css, or XPath into the contained document that becomes the value of a property of the Role.

Role already provides a reification-like indirection capability, and if implemented at Google the way the Linter treats it, should have the desired effect.

<html>
<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "abstract": {
    "@type": "SelectorRole",
    "cssSelector": "section#abstract"
  }
}
</script>
<section id="abstract">
<p>Something about my doc.</p>
</section>
</html>

This might result in the following, after processing:

{
  "@context": "http://schema.org",
  "abstract": {
    "@type": "SelectorRole",
    "cssSelector": "section#abstract",
    "abstract": {
      "@value": "<p>Something about my doc.</p>",
      "@type": "rdf:HTML"
    }
  }
}

Using XPath, you might use a text() modifier to get text vs markup content and avoid the rdf:HTML datatype, or another property could control this.
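For what it's worth, the processing step could be sketched in JavaScript roughly as follows. SelectorRole and cssSelector are the hypothetical terms from the example above, not existing schema.org vocabulary, and the function is an assumption about how a higher-level JSON-LD-in-HTML library might behave. The DOM lookup is injected as a function so the sketch runs outside a browser, and only a role used directly as a property value is handled.

```javascript
// Hypothetical post-processor for the "SelectorRole" pattern sketched above.
// `selectHtml` maps a CSS selector to the inner HTML of the first match, or
// null if nothing matches; in a browser it could wrap document.querySelector.
function resolveSelectorRoles(node, selectHtml) {
  if (Array.isArray(node)) {
    return node.map((item) => resolveSelectorRoles(item, selectHtml));
  }
  if (node === null || typeof node !== "object") return node;

  const result = {};
  for (const [key, value] of Object.entries(node)) {
    const resolved = resolveSelectorRoles(value, selectHtml);
    const isRole =
      resolved !== null &&
      typeof resolved === "object" &&
      resolved["@type"] === "SelectorRole" &&
      typeof resolved.cssSelector === "string";
    const html = isRole ? selectHtml(resolved.cssSelector) : null;
    // Attach the selected markup under the same property name as the role,
    // mirroring the processed example above. If nothing matched, the role
    // (or ordinary value) is kept as it was.
    result[key] =
      html === null
        ? resolved
        : { ...resolved, [key]: { "@value": html, "@type": "rdf:HTML" } };
  }
  return result;
}

// Stand-in for a DOM lookup so the sketch runs without a browser.
const fakeDom = new Map([
  ["section#abstract", "<p>Something about my doc.</p>"],
]);

const processed = resolveSelectorRoles(
  {
    "@context": "http://schema.org",
    abstract: { "@type": "SelectorRole", cssSelector: "section#abstract" },
  },
  (selector) => fakeDom.get(selector) ?? null
);

console.log(JSON.stringify(processed, null, 2));
```

In a browser, the injected lookup might be something like `(selector) => document.querySelector(selector)?.innerHTML ?? null`, keeping the DOM dependency out of the transformation itself.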

gkellogg commented 6 years ago

Ultimately, JSON-LD and RDFa support are not in conflict, but based on the perceived value of the JSON-LD SEO (#1517), the RDFa support is redundant and could be deprecated/removed. The JSON-LD implementation is much more modular and does not send tentacles out into other bits of code, so should be less sensitive to semantic drift.

marcoscaceres commented 6 years ago

For the reasons you mentioned, it would be great if we deprecated the RDFa in favor of the new JSON-LD module.

halindrome commented 6 years ago

Honestly, my only 2 comments on all of this are:

1) Gregg - you are a rock star and I can't thank you enough!
2) I am of course cool with this if it happens seamlessly for all users. In the interests of making everything better, I would also be cool with making the new JSON-LD insertion the default so everyone has active semantics in their specs!

gkellogg commented 6 years ago

I think we can close this now, somewhat sadly. Above I made the claim that RDFa is best for describing a document, and JSON-LD for describing what the document models, but looking at the implementation, the JSON-LD version is much more modular and maintainable, as almost all of the data comes out of the configuration, as managed by other modules along the way. The RDFa really looked like spaghetti code in comparison, and was much more fragile.

saschanaz commented 6 years ago

ReSpec now has JSON-LD support instead of RDFa thanks to @gkellogg. Much appreciated! 👍💕🎉

speced / respec

Should RDFa still be supported? #1503
