w3c / microdata

Moved to https://html.spec.whatwg.org/multipage/microdata.html

add JSON-LD conversion #76

Closed chaals closed 7 years ago

chaals commented 7 years ago

fix #29

Note this just uses the changes @lanthaler said were needed back in 2012, so it should be checked before merging.

chaals commented 7 years ago

@msporny, @halindrome @gkellogg can you take a quick look?

gkellogg commented 7 years ago

cc/ @niklasl

gkellogg commented 7 years ago

I have an implementation alongside my normal and RDFa-based Microdata parsers. It pretty much follows the algorithm, with the comments I've previously made.

The property_value is dumbed down from the one used in the native RDF parser, so it could relatively easily obtain datatypes for values and parse numbers.

It allocates an @id using a generated BNode identifier when it encounters a reference to another item that doesn't already have an @id, and allows this to be used when an item is already found in memory.
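A minimal sketch of that allocation step, assuming items are plain Python dicts; the names `ensure_id` and `reference_to` and the `_:bN` label scheme are hypothetical and not taken from the actual parser:

```python
import itertools

# Hypothetical module-level counter used to mint blank node labels.
_bnode_counter = itertools.count()

def ensure_id(item):
    """Return the item's @id, minting a blank node identifier if it has none."""
    if "@id" not in item:
        item["@id"] = f"_:b{next(_bnode_counter)}"
    return item["@id"]

def reference_to(item):
    """Build a node reference; repeated calls for the same item reuse its @id."""
    return {"@id": ensure_id(item)}
```

Because the identifier is stored on the item itself, a second reference to an item already in memory picks up the same `@id` instead of minting a new one.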

It creates an item-level @context containing @vocab when it finds a local vocabulary for an item. It probably wouldn't be difficult to avoid this when the parent of the item has the same vocabulary.
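One way the redundant contexts could be avoided is sketched below as a post-processing pass over the generated output. This is an assumption about how it might be done, not a description of the parser; `strip_redundant_contexts` is a hypothetical helper, and it only recognizes contexts that consist of a single `@vocab` entry:

```python
def strip_redundant_contexts(node, inherited_vocab=None):
    """Return a copy of node without @context entries that repeat the inherited @vocab."""
    if isinstance(node, list):
        return [strip_redundant_contexts(n, inherited_vocab) for n in node]
    if not isinstance(node, dict):
        return node  # plain literal, nothing to do

    ctx = node.get("@context")
    # A context is redundant only when it declares exactly the parent's vocabulary.
    redundant = inherited_vocab is not None and ctx == {"@vocab": inherited_vocab}

    # Work out which vocabulary the children of this node inherit.
    vocab = inherited_vocab
    if isinstance(ctx, dict) and "@vocab" in ctx:
        vocab = ctx["@vocab"]

    out = {}
    for key, value in node.items():
        if key == "@context":
            if not redundant:
                out[key] = value
            continue
        out[key] = strip_redundant_contexts(value, vocab)
    return out
```

This relies on an embedded `@context` also applying to nested node objects, so a child with the same vocabulary as its parent does not need to restate it.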

It always uses @graph, but this could be optimized in case there is only a single top-level item.
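The optimization mentioned here could look roughly like the following sketch; `wrap_items` is a hypothetical name, and it assumes each top-level item carries its own `@context`, as in the output described above:

```python
def wrap_items(top_level_items):
    """Wrap top-level items in @graph, except when there is exactly one."""
    if len(top_level_items) == 1:
        # A single node object can stand as the document root on its own.
        return top_level_items[0]
    return {"@graph": list(top_level_items)}
```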

It infers language and base-URL by introspection into the DOM.

Edit: it also trims whitespace around values, which at least makes the output look a bit better.

gkellogg commented 7 years ago

For consideration, here's my output for your example:

{
  "@graph": [
    {
      "@context": {"@vocab": "https://schema.org/"},
      "@type": ["https://schema.org/BlogPosting"],
      "headline": ["Progress report"],
      "url": [{"@id": "http://example.com?comments=0"}],
      "comment": [
        {
          "@context": {"@vocab": "https://schema.org/"},
          "@type": ["https://schema.org/Comment"],
          "url": [{"@id": "http://example.com#c1"}],
          "creator": [
            {
              "@context": {"@vocab": "https://schema.org/"},
              "@type": ["https://schema.org/Person"],
              "name": ["Greg"]
            }
          ],
          "dateCreated": ["2013-08-29"]
        }
      ],
      "datePublished": ["2013-08-29"]
    }
  ]
}

chaals commented 7 years ago

Quick reaction: My sincere thanks @gkellogg @msporny for the work and comments. The one that struck me even before asking was the vocab, but it looks like there is more to deal with. Given that microdata only has two datatypes (string and URL), I'm going to leave it that you can - and probably should in a grown-up format - infer that, for example, a datetime is actually a date, without requiring that in the microdata spec. I'm pushing to get this done ASAP, but it's not the only priority in the world, so I hope I will have sensible replies over the weekend.

danbri commented 7 years ago

On the context question, if the generated output were to always explicitly mark literals as ID or literal/text, would that minimize or remove the need for context declarations?

gkellogg commented 7 years ago

Yes, you can always generate expanded nodes/literals. However, if you know enough to use "@vocab": "http://schema.org", you can also take advantage of using it as a context, and assume all of its term definitions.
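To make the first option concrete, here is a rough sketch of context-free output in which every value is explicitly marked as a node reference or a literal. The helper names and the `(value, is_url)` input shape are assumptions for illustration only, and property names are simply concatenated onto the vocabulary URL:

```python
def expand_value(value, is_url):
    """Mark a microdata value explicitly as a node reference or a literal."""
    return {"@id": value} if is_url else {"@value": value}

def expand_item(vocab, types, properties):
    """Build a node with absolute IRIs as keys, so no @context is needed.

    properties maps a property name to a list of (value, is_url) pairs.
    """
    node = {"@type": [vocab + t for t in types]}
    for name, values in properties.items():
        node[vocab + name] = [expand_value(v, is_url) for v, is_url in values]
    return node
```

The trade-off is verbosity: every key becomes a full IRI and every value an object, whereas a context keeps the short names and lets term definitions supply the value types.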