solid-contrib / specification-tests

Test coverage report
https://solid-contrib.github.io/specification-tests/
MIT License

Support annotations embedded in <script> tags #76

Open elf-pavlik opened 2 years ago

elf-pavlik commented 2 years ago

https://github.com/solid/specification-tests#annotations

The intention is that this will be done as RDFa annotations in the specification documents, but it is understood that this will take some time. As a workaround, the same data may be provided in Turtle format separate to the specification.

I understand that Turtle is already supported as separate documents. The Turtle spec recommends embedding it in HTML using <script> tags: https://www.w3.org/TR/turtle/#in-html

A similar approach exists for JSON-LD: https://www.w3.org/TR/json-ld11/#embedding-json-ld-in-html-documents

Will it be straightforward to support including annotations in the spec this way? At least Turtle first and JSON-LD later on.

I created the first example in https://github.com/solid/notifications/pull/56. We should merge it soon so that it can be used to test this approach with embedded Turtle.

edwardsph commented 2 years ago

I will need to introduce an HTML parser that can pull out the contents of the script elements and then parse each fragment as Turtle. I will just need to deal with the fact that the RDFa and script-based versions of the spec are all .html files, so I'll have to parse as RDFa and, if I don't find any suitable content, try again as HTML. It is unfortunate that GitHub doesn't allow us to define content types for the data it serves.
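To make the extraction step concrete, here is a minimal sketch in Python using only the standard library. It is not the test harness's code: the names `ScriptExtractor` and `extract_rdf_scripts` are made up for illustration, and the sketch only collects the raw text of each RDF-bearing script element. Handing each fragment to a Turtle or JSON-LD parser is left out.

```python
from html.parser import HTMLParser

# Media types named in the Turtle and JSON-LD 1.1 sections on embedding in HTML.
RDF_SCRIPT_TYPES = {"text/turtle", "application/ld+json"}


class ScriptExtractor(HTMLParser):
    """Hypothetical helper: collects (media_type, text) per RDF <script> element."""

    def __init__(self):
        super().__init__()
        self.fragments = []   # finished (media_type, text) pairs
        self._type = None     # media type of the script being read, if any
        self._chunks = []     # text chunks of the script being read

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        # Drop any parameters such as "; charset=utf-8" before comparing.
        raw_type = dict(attrs).get("type") or ""
        media_type = raw_type.split(";")[0].strip().lower()
        if media_type in RDF_SCRIPT_TYPES:
            self._type = media_type
            self._chunks = []

    def handle_data(self, data):
        # Script content is delivered verbatim, so Turtle IRIs in <...> survive.
        if self._type is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._type is not None:
            self.fragments.append((self._type, "".join(self._chunks)))
            self._type = None


def extract_rdf_scripts(html):
    """Return the embedded RDF fragments of an HTML document, in document order."""
    parser = ScriptExtractor()
    parser.feed(html)
    parser.close()
    return parser.fragments
```

Scripts without a `type` attribute, or with a JavaScript type, are skipped, so an empty result can serve as the "no suitable content" signal.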

elf-pavlik commented 2 years ago

It is unfortunate that GitHub doesn't allow us to define content types for the data it serves.

AFAIK there is no distinct content type defined for HTML embedding RDFa. I think one always needs to sniff to find if HTML includes RDFa and/or RDF embedded in <script> tags.

I'll have to parse as RDFa and if I don't find any suitable content, try again as HTML.

I don't think anyone is using that in a spec, but I think it's valid to mix RDFa with RDF in script tags. Maybe it would be possible to extract all the RDF (RDFa and embedded Turtle / JSON-LD) and combine all the statements found.

edwardsph commented 2 years ago

I was thinking of application/xhtml+xml because it would mean I know which parser to use. Your suggestion would work, but I would need to parse each document twice: the RDFa parser doesn't give me access to the HTML, it only extracts the RDF from the RDFa annotations. The HTML parser would give me the DOM to extract the embedded Turtle/JSON-LD. I've added a task for the test harness to implement this.
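The "parse twice and combine" idea can be sketched as below, in Python for brevity. This is an illustration under assumptions, not the harness's design: `collect_statements` is a hypothetical name, triples are modelled as plain tuples of strings, and each pass (RDFa, embedded script content) is assumed to be a function that takes the document text and returns the triples it found.

```python
from typing import Callable, Iterable, Set, Tuple

# Assumption: a statement is modelled as a (subject, predicate, object) tuple.
Triple = Tuple[str, str, str]
Extractor = Callable[[str], Iterable[Triple]]


def collect_statements(document: str, extractors: Iterable[Extractor]) -> Set[Triple]:
    """Run every extraction pass over the same document and merge the results.

    Using a set means a statement found by more than one pass is kept once.
    """
    statements: Set[Triple] = set()
    for extract in extractors:
        statements.update(extract(document))
    return statements
```

A caller would pass one extractor per pass, for example an RDFa pass and a script-element pass, and would no longer need to decide up front which kind of annotations a given .html file contains.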