Open dantman opened 5 years ago
Could be interesting to add it to the RDFa parser? Or a separate parser? Presumably one could also parse inline Turtle as well.
I'd bet that the RDFa parser does not have a dependency on a JSON-LD parser. But rdflib does and already converts JSON-LD to the format we need.
My expectation is that the optimal thing to do is this: in the text/html condition, in addition to passing the HTML to the RDFa parser, parse it to a basic DOM with a parser we already have, scan it for <script> tags, and parse the contents of any type="application/ld+json" scripts with the JSON-LD parser. This may involve double-parsing the HTML. But if we want to avoid that, instead of making the rdfaparser parse non-RDFa we should just make it accept a pre-parsed DOM instead of only HTML strings.
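To make the scan step concrete, here is a rough sketch of pulling the JSON-LD blocks out of an HTML string. The function name extractJsonLdBlocks is hypothetical and is not part of rdflib. A regex stands in for the DOM walk only so the sketch runs without dependencies; a real implementation would presumably scan the already-parsed DOM and hand each block to the existing JSON-LD parser rather than calling JSON.parse.

```typescript
// Hypothetical helper, not part of rdflib's API.
// Finds <script type="application/ld+json"> blocks and parses each body as JSON.
// The regex is a stand-in for walking a parsed DOM.
function extractJsonLdBlocks(html: string): unknown[] {
  const scriptRe = /<script\b([^>]*)>([\s\S]*?)<\/script\s*>/gi;
  const typeRe = /\btype\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))/i;
  const blocks: unknown[] = [];
  let match: RegExpExecArray | null;
  while ((match = scriptRe.exec(html)) !== null) {
    // Read the type attribute, whether double-quoted, single-quoted, or bare.
    const typeMatch = typeRe.exec(match[1]);
    const type = (typeMatch?.[1] ?? typeMatch?.[2] ?? typeMatch?.[3] ?? "")
      .trim()
      .toLowerCase();
    if (type !== "application/ld+json") continue;
    try {
      blocks.push(JSON.parse(match[2]));
    } catch {
      // Skip a malformed block rather than failing the whole document.
    }
  }
  return blocks;
}
```

Each returned object would then be passed to the JSON-LD parser with the page URL as the base, alongside whatever the RDFa parser produces from the same document.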
This of course could be expanded to inline turtle or inline versions of any other format rdflib supports.
I don't think the RDFa parser is particularly useful for extracting and parsing JSON-LD, N3, Turtle, TriG, etc. in HTML documents using the script extension mechanism. So I would agree that this would require a second parse. Perhaps a flag can be used to turn it on/off.
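As a rough illustration of the flag idea, the sketch below filters already-extracted script blocks by media type and does nothing unless the caller opts in. Everything here is an assumption: the option name parseDataBlocks, the DataBlock shape, the choice of off as the default, and the helper selectDataBlocks are all hypothetical and do not exist in rdflib.

```typescript
// Hypothetical shapes, not part of rdflib's API.
interface DataBlock {
  contentType: string; // value of the script element's type attribute
  body: string;        // raw text content of the script element
}

interface HtmlParseOptions {
  parseDataBlocks?: boolean; // assumed flag; off by default in this sketch
}

// Media types for the formats mentioned in this thread.
const DATA_BLOCK_TYPES = new Set<string>([
  "application/ld+json",
  "text/turtle",
  "text/n3",
  "application/trig",
]);

// Drop parameters such as "; charset=utf-8" and normalize case.
function normalizeType(contentType: string): string {
  return contentType.split(";")[0].trim().toLowerCase();
}

// Return only the blocks that should get a second parse.
function selectDataBlocks(
  blocks: DataBlock[],
  options: HtmlParseOptions = {}
): DataBlock[] {
  if (!options.parseDataBlocks) return [];
  return blocks.filter((b) => DATA_BLOCK_TYPES.has(normalizeType(b.contentType)));
}
```

With the flag unset, the HTML path behaves as it does today; with it set, each selected block would be dispatched to the parser registered for its media type.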
+1, this would be a really helpful mechanism.
This library appears to support fetching JSON-LD over HTTP when the whole response is JSON-LD and an application/ld+json content type is used. However, in the real world a lot of JSON-LD used on the web comes as script tags in the HTML. I think it would be worthwhile to support this type of linked data. See https://developers.google.com/search/docs/guides/intro-structured-data for a code example.
For a real-world example look at the source of https://www.apple.com/, you'll find:
Side note: trying to fetch https://www.apple.com dumps a bunch of "TypeError: callback is not a function" errors from N3Parser.tripleCallback.