linked-art / linked.art

Development of a specification for linked data in museums, using existing ontologies and frameworks to build usable, understandable APIs
https://linked.art/

Unable to serialise "part", "part_of" with Python `rdflib-jsonld` #395

Closed edwardanderson closed 3 years ago

edwardanderson commented 3 years ago

I'm working on some Linked Art data processing scripts and have come across a bug where certain properties don't serialise. I'm not yet sure where it lies: in my own implementation, in the rdflib-jsonld plugin, or in the linked-art.json context.

Here is an example:

'''
Parse and serialise a Linked Art JSON-LD document using the <https://github.com/RDFLib/rdflib-jsonld> plugin.
'''

from rdflib import Graph, plugin

json_ld_str = '''
{
  "@context": "https://linked.art/ns/v1/linked-art.json", 
  "id": "http://www.example.com/artwork/1",
  "type": "HumanMadeObject",
  "_label": "whole",
  "part": [
    {
      "id": "http://www.example.com/artwork/2",
      "type": "HumanMadeObject",
      "_label": "section"
    }
  ],
  "identified_by": [
    {
      "type": "Name",
      "content": "Title"
    }
  ]
}
'''

graph = Graph()
graph.parse(data=json_ld_str, format='json-ld')
print(graph.serialize(format='turtle').decode('utf-8'))

The part data is not present in the output (although Name serialises just fine):

@prefix crm: <http://www.cidoc-crm.org/cidoc-crm/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

<http://www.example.com/artwork/1> a crm:E22_Human-Made_Object ;
    rdfs:label "whole" ;
    crm:P1_is_identified_by [ a crm:E33_E41_Linguistic_Appellation ;
            crm:P190_has_symbolic_content "Title" ] .

Is anyone already successfully parsing and serialising Linked Art JSON-LD in Python and would share some tips?

azaroth42 commented 3 years ago

Use pyld instead of rdflib-jsonld. This is how the website generates the json-ld and turtle serializations:

from rdflib import ConjunctiveGraph, URIRef
from pyld.jsonld import expand, to_rdf, JsonLdProcessor, set_document_loader

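js = { ... json here ... }
# Convert the JSON-LD to N-Quads with pyld
nq = to_rdf(js, {"format": "application/nquads"})
g = ConjunctiveGraph()
# ctxt is not defined in this snippet; it is assumed to be the
# prefix-to-namespace mapping taken from the linked-art.json context
for ns in ['crm', 'dc', 'schema', 'dcterms', 'skos', 'la']:
    g.bind(ns, ctxt[ns])
# Load the N-Quads into rdflib, then serialise as Turtle
g.parse(data=nq, format="nt")
out = g.serialize(format="turtle")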

See: https://github.com/linked-art/linked.art/blob/master/extensions/text.py#L413

`part` can map to the right partitioning predicate for each class because the context uses scoped contexts, a JSON-LD 1.1 feature that rdflib-jsonld doesn't seem to support. In the context we define:

"HumanMadeObject": {
      "@context": {
        "part_of": {
          "@id": "crm:P46i_forms_part_of", 
          "@type": "@id", 
          "@container": "@set"
        }, 
        "part": {
          "@id": "crm:P46_is_composed_of", 
          "@type": "@id", 
          "@container": "@set"
        }, 

See: https://github.com/linked-art/linked.art/blob/master/content/ns/v1/linked-art.json#L321

This scoped context should only be processed if the current class is HumanMadeObject.
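
To illustrate the effect, here is a toy sketch in plain Python. It is not a JSON-LD processor and not how pyld or rdflib-jsonld work internally. `CONTEXT` is a hand-trimmed, hypothetical stand-in for linked-art.json that assumes `part` and `part_of` are defined only inside the `HumanMadeObject` scoped context, and `resolve_term` is a made-up helper name.

```python
# Toy model of type-scoped term lookup; NOT a real JSON-LD processor.
# CONTEXT is a hypothetical, trimmed stand-in for linked-art.json.
CONTEXT = {
    "identified_by": {"@id": "crm:P1_is_identified_by"},
    "HumanMadeObject": {
        "@id": "crm:E22_Human-Made_Object",
        "@context": {
            "part_of": {"@id": "crm:P46i_forms_part_of"},
            "part": {"@id": "crm:P46_is_composed_of"},
        },
    },
}


def resolve_term(term, node_type, context=CONTEXT, scoped=True):
    """Return the predicate for `term` on a node of `node_type`, or None."""
    active = dict(context)  # start from the top-level term definitions
    if scoped:
        # Overlay the terms that are defined only for this class
        active.update(context.get(node_type, {}).get("@context", {}))
    definition = active.get(term)
    return definition["@id"] if definition else None


print(resolve_term("part", "HumanMadeObject"))                # crm:P46_is_composed_of
print(resolve_term("part", "HumanMadeObject", scoped=False))  # None
print(resolve_term("identified_by", "HumanMadeObject", scoped=False))  # crm:P1_is_identified_by
```

The second call mirrors the symptom reported above: a processor that ignores scoped contexts finds no definition for `part`, and JSON-LD drops terms it cannot expand, while `identified_by` still resolves from the top-level context.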

edwardanderson commented 3 years ago

Thanks so much @azaroth42! I've switched libraries, followed your pattern, and now have working Turtle serialisation :)