chaals closed 7 years ago
@msporny, @halindrome, @gkellogg, can you take a quick look?
cc/ @niklasl
I have an implementation alongside my normal, RDFa-based Microdata parser. It pretty much follows the algorithm, with the comments I've previously made.
The `property_value` is dumbed-down from that used in the native RDF parser, so it could relatively easily obtain datatypes for values and parse numbers.
It allocates an `@id` using a generated BNode identifier when it encounters a reference to another item that doesn't already have an `@id`, and allows this to be used when an item is already found in memory.
It creates an item-level `@context` containing `@vocab` when it finds a local vocabulary for an item. It probably wouldn't be difficult to avoid this when the parent of the item has the same vocabulary.
It always uses `@graph`, but this could be optimized in case there is only a single top-level item.
It infers language and base-URL by introspection into the DOM.
Edit: it also trims whitespace around values, which at least makes the output look a bit better.
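To make the BNode allocation described above concrete, here is a minimal Python sketch. It is not the actual parser code: `BNodeAllocator`, `id_for`, and the `_:bN` label format are hypothetical names chosen for illustration, and items are assumed to be plain dictionaries.

```python
import itertools


class BNodeAllocator:
    """Hypothetical helper: gives an item a blank node @id only if it lacks one."""

    def __init__(self):
        self._counter = itertools.count()

    def id_for(self, item: dict) -> str:
        # An explicit @id (or one allocated earlier) is reused as-is,
        # so repeated references to the same in-memory item agree.
        if "@id" not in item:
            item["@id"] = f"_:b{next(self._counter)}"
        return item["@id"]


# Usage: two references to the same item resolve to the same identifier.
allocator = BNodeAllocator()
person = {"name": ["Greg"]}
reference_a = {"@id": allocator.id_for(person)}
reference_b = {"@id": allocator.id_for(person)}
```

Storing the identifier on the item itself is just one way to make it available "when an item is already found in memory"; a side table keyed by DOM element would work as well.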
For consideration, here's my output for your example:
{
  "@graph": [
    {
      "@context": {"@vocab": "https://schema.org/"},
      "@type": ["https://schema.org/BlogPosting"],
      "headline": ["Progress report"],
      "url": [{"@id": "http://example.com?comments=0"}],
      "comment": [
        {
          "@context": {"@vocab": "https://schema.org/"},
          "@type": ["https://schema.org/Comment"],
          "url": [{"@id": "http://example.com#c1"}],
          "creator": [
            {
              "@context": {"@vocab": "https://schema.org/"},
              "@type": ["https://schema.org/Person"],
              "name": ["Greg"]
            }
          ],
          "dateCreated": ["2013-08-29"]
        }
      ],
      "datePublished": ["2013-08-29"]
    }
  ]
}
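The output above repeats the same `@context` on every nested item and wraps a single item in `@graph`. A rough Python sketch of the two optimizations mentioned earlier might look like the following. The in-memory item shape (`vocab`, `type`, `properties` keys) and the function names `serialize` and `to_document` are assumptions made for this example, not part of any real parser.

```python
def serialize(item, inherited_vocab=None):
    """Serialize a hypothetical in-memory item, skipping redundant @context entries."""
    vocab = item.get("vocab") or inherited_vocab
    node = {}
    # Only declare @vocab when it differs from what the parent already established.
    if vocab is not None and vocab != inherited_vocab:
        node["@context"] = {"@vocab": vocab}
    if item.get("type"):
        node["@type"] = list(item["type"])
    for name, values in item.get("properties", {}).items():
        node[name] = [
            # Nested items recurse; strings and {"@id": ...} values pass through.
            serialize(v, vocab) if isinstance(v, dict) and "properties" in v else v
            for v in values
        ]
    return node


def to_document(top_level_items):
    """Wrap in @graph only when there is more than one top-level item."""
    nodes = [serialize(item) for item in top_level_items]
    return nodes[0] if len(nodes) == 1 else {"@graph": nodes}
```

With the blog post example as input, this would emit one `@context` on the `BlogPosting` node, none on the nested `Comment` and `Person` nodes, and no `@graph` wrapper.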
Quick reaction: My sincere thanks, @gkellogg and @msporny, for the work and comments. The one that struck me even before asking was the `vocab`, but it looks like there is more to deal with. Given that microdata only has two datatypes (string and URL), I'm going to leave it that you can - and probably should in a grown-up format - infer that, for example, a `datetime` is actually a date, without requiring that in the microdata spec. I'm pushing to get this done ASAP, but it's not the only priority in the world, so I hope I will have sensible replies over the weekend.
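As an illustration of the kind of inference a consumer could do on its own, here is a small Python sketch that upgrades a plain string to an `xsd:date` typed literal when it looks like a calendar date. The function name `infer_literal` and the choice to handle only `YYYY-MM-DD` are assumptions for this example; nothing here is required by the microdata spec.

```python
import re
from datetime import date

XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"
_DATE_PATTERN = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def infer_literal(text):
    """Return a typed JSON-LD value for YYYY-MM-DD strings, else the trimmed string."""
    value = text.strip()
    match = _DATE_PATTERN.fullmatch(value)
    if match is None:
        return value
    try:
        # Reject strings that have the right shape but are not real days.
        date(*map(int, match.groups()))
    except ValueError:
        return value
    return {"@value": value, "@type": XSD_DATE}
```

A value such as `"2013-08-29"` would become a typed literal, while `"Progress report"` stays a plain string.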
On the context question, if the generated output were to always explicitly mark literals as ID or literal/text, would that minimize or remove the need for context declarations?
Yes, you can always generate expanded nodes/literals. However, if you know enough to use `"@vocab": "http://schema.org"`, you can also take advantage of using it as a context, and assume all of its term definitions.
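For comparison, a context-free output in which every value is explicitly marked as an ID or a literal could be produced along these lines. This is only a sketch: `expand_node` and the `(value, is_url)` tuple input are hypothetical, and property names are made absolute by simple concatenation with the vocabulary URL.

```python
def expand_value(value, is_url):
    """Mark a value explicitly as a node reference or a literal."""
    return {"@id": value} if is_url else {"@value": value}


def expand_node(vocab, types, properties):
    """Build a node with absolute property IRIs, so no @context is needed."""
    node = {"@type": list(types)}
    for name, values in properties.items():
        # Each input value is a (text, is_url) pair in this sketch.
        node[vocab + name] = [expand_value(v, is_url) for v, is_url in values]
    return node


comment = expand_node(
    "https://schema.org/",
    ["https://schema.org/Comment"],
    {
        "url": [("http://example.com#c1", True)],
        "dateCreated": [("2013-08-29", False)],
    },
)
```

The trade-off is verbosity: every key carries the full vocabulary URL, which is what `@vocab` or a shared context avoids.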
fix #29
Note this just uses the changes @lanthaler said were needed back in 2012, so it should be checked before merging.