linkeddata / rdflib.js

Linked Data API for JavaScript
http://linkeddata.github.io/rdflib.js/doc/

Turtle serializer shouldn't write blank nodes as <...> #555

Closed: fennibay closed this issue 1 year ago.

fennibay commented 2 years ago

I'm converting JSON-LD to Turtle using rdflib.js.

Example input:

{
    "@context": {
        "ex": "http://example.com#"
    },
    "@id": "ex:myid",
    "ex:prop1": {
        "ex:prop2": {
            "ex:prop3": "value"
        }
    }
}

Current output from rdflib.js for this example:

@prefix ex: <http://example.com#>.

<_:b0> ex:prop2 <_:b1>.
<_:b1> ex:prop3 "value".
ex:myid ex:prop1 <_:b0>.

The Turtle spec states the following:

RDF blank nodes in Turtle are expressed as _: followed by a blank node label which is a series of name characters.

So I think blank nodes should be written without <...>: wrapping them in angle brackets turns them into absolute or relative IRIs rather than blank nodes.
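To make the distinction concrete, here is a minimal sketch of how a serializer might write each term type. The `Term` types are a simplified assumption loosely modelled on RDF/JS terms (not rdflib.js's own classes), and `termToTurtle` is a hypothetical helper: a `BlankNode` is written as `_:` followed by its label, and only a `NamedNode` gets angle brackets.

```typescript
// Simplified, hypothetical term model (not rdflib.js classes).
// For blank nodes, `value` holds the label without the "_:" prefix.
type NamedNode = { termType: "NamedNode"; value: string };
type BlankNode = { termType: "BlankNode"; value: string };
type Literal = { termType: "Literal"; value: string };
type Term = NamedNode | BlankNode | Literal;

// Escape characters that must not appear raw inside a Turtle "..." string.
function escapeString(s: string): string {
  return s
    .replace(/\\/g, "\\\\")
    .replace(/"/g, '\\"')
    .replace(/\n/g, "\\n")
    .replace(/\r/g, "\\r");
}

// Serialize one term following the Turtle grammar.
function termToTurtle(term: Term): string {
  if (term.termType === "NamedNode") return `<${term.value}>`;
  // Never wrap a blank node label in <...>: that would make it an IRI.
  if (term.termType === "BlankNode") return `_:${term.value}`;
  return `"${escapeString(term.value)}"`;
}

const subject: BlankNode = { termType: "BlankNode", value: "b0" };
const predicate: NamedNode = { termType: "NamedNode", value: "http://example.com#prop3" };
const object: Literal = { termType: "Literal", value: "value" };
console.log(`${termToTurtle(subject)} ${termToTurtle(predicate)} ${termToTurtle(object)} .`);
// prints: _:b0 <http://example.com#prop3> "value" .
```

Prefix handling (`ex:`) is deliberately left out to keep the sketch focused on the blank node case.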

As an additional feature, it would be nice to be able to control whether blank nodes are output nested or not.
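The two output styles can be illustrated with a small, self-contained sketch. `toTurtle` and its `nest` flag are hypothetical (not an existing rdflib.js option), and the rule used here is only one common heuristic: a blank node is inlined as `[ ... ]` when it is the object of exactly one triple; otherwise it keeps its `_:label`.

```typescript
// Hypothetical, minimal triple model used only for this sketch.
type Node =
  | { kind: "iri"; value: string }
  | { kind: "bnode"; value: string }
  | { kind: "literal"; value: string };
type Triple = { s: Node; p: string; o: Node };

// Write one term; JSON.stringify's escapes are also valid in a Turtle string.
function fmt(n: Node): string {
  if (n.kind === "iri") return `<${n.value}>`;
  if (n.kind === "bnode") return `_:${n.value}`;
  return JSON.stringify(n.value);
}

// With nest = true, a blank node referenced as an object exactly once is
// written inline as [ ... ]; all other blank nodes keep their _:label.
function toTurtle(triples: Triple[], nest: boolean): string {
  const refCount = new Map<string, number>();
  for (const t of triples) {
    if (t.o.kind === "bnode") refCount.set(t.o.value, (refCount.get(t.o.value) ?? 0) + 1);
  }
  const canNest = (n: Node) => nest && n.kind === "bnode" && refCount.get(n.value) === 1;

  // Group triples by subject, keeping first-seen order.
  const bySubject = new Map<string, Triple[]>();
  for (const t of triples) {
    const key = fmt(t.s);
    bySubject.set(key, [...(bySubject.get(key) ?? []), t]);
  }

  const done = new Set<string>(); // subjects already written (top level or nested)
  function predicateList(group: Triple[]): string {
    return group.map((t) => `<${t.p}> ${renderObject(t.o)}`).join(" ; ");
  }
  function renderObject(o: Node): string {
    if (canNest(o) && !done.has(fmt(o))) {
      done.add(fmt(o));
      return `[ ${predicateList(bySubject.get(fmt(o)) ?? [])} ]`;
    }
    return fmt(o);
  }

  const lines: string[] = [];
  // Pass 1 writes subjects that cannot be nested; pass 2 writes anything
  // still missing (e.g. blank nodes that only reference each other).
  for (const secondPass of [false, true]) {
    for (const [key, group] of bySubject) {
      if (done.has(key) || (!secondPass && canNest(group[0].s))) continue;
      done.add(key);
      lines.push(`${key} ${predicateList(group)} .`);
    }
  }
  return lines.join("\n");
}

const EX = "http://example.com#";
const triples: Triple[] = [
  { s: { kind: "iri", value: EX + "myid" }, p: EX + "prop1", o: { kind: "bnode", value: "b0" } },
  { s: { kind: "bnode", value: "b0" }, p: EX + "prop2", o: { kind: "bnode", value: "b1" } },
  { s: { kind: "bnode", value: "b1" }, p: EX + "prop3", o: { kind: "literal", value: "value" } },
];
console.log(toTurtle(triples, false)); // one statement per subject, using _:b0 and _:b1
console.log(toTurtle(triples, true)); // a single statement with both blank nodes nested
```

The second pass is there so that blank nodes which can never be inlined (for example two blank nodes pointing at each other) are still written out with labels instead of being dropped.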

Questions:

  1. Is this a known issue? I saw some non-conformances in #329, but couldn't find this exact case there.
  2. Could this be caused by the arguments I pass, i.e. am I calling the functions wrong? My code snippet is below.
import * as rdflib from "rdflib"

/**
 * Convert JSON-LD to Turtle
 * @param input JSON string
 * @param base Base IRI for the content
 * @param namespaces The namespace map for use in ttl
 * @returns TTL string
 */
async function convertJsonLdToTtl(
    input: string,
    base: string,
    namespaces: Record<string, string> = {},
): Promise<string> {
    return new Promise<string>((res, rej) => {
        const store = rdflib.graph()
        rdflib.parse(input, store, base, "application/ld+json", (err, kb) => {
            if (err) {
                rej(err)
            } else {
                if (!kb) {
                    rej("KB empty: " + kb)
                } else {
                    console.log("KB # statements: " + kb.statements.length)
                    rdflib.serialize(
                        null,
                        kb,
                        undefined,
                        "text/turtle",
                        (err, output) => {
                            if (err) {
                                rej(err)
                            } else {
                                if (!output) {
                                    rej("Empty output: " + output)
                                } else {
                                    res(output)
                                }
                            }
                        },
                        {
                            namespaces,
                        },
                    )
                }
            }
        })
    })
}

Many thanks.

jeff-zucker commented 2 years ago

I can confirm that in Turtle <_:b0> is a NamedNode, not a BlankNode, so this looks like a bug.

bourgeoa commented 2 years ago

Agreed. The issue may be in the JSON-LD parser and not in the Turtle serializer.

fennibay commented 2 years ago

> Agreed. The issue may be in the JSON-LD parser and not in the Turtle serializer.

Thanks for the hint. I tried converting from JSON-LD to N-Quads first (with another library, jsonld) and then converting that to Turtle. This helped: the blank nodes are now embedded (nested). The blank node labels may still be wrong, which I couldn't test, but my problem is solved for now.

RinkeHoekstra commented 2 years ago

This is rather problematic for any system that uses rdflib.js to parse JSON-LD. Any chance this can get prioritized?

RinkeHoekstra commented 2 years ago

I can confirm that e.g. the following JSON-LD is not parsed correctly:

{
    "@context": {
        "@vocab": "https://example.com/"
    },
    "hasExampleProperty": "some literal value"
}

Results in the following statement (I'm using an example IRI for the graph here):

{
    "subject": {
        "termType": "NamedNode",
        "classOrder": 5,
        "value": "_:b0"
    },
    "predicate": {
        "termType": "NamedNode",
        "classOrder": 5,
        "value": "https://example.com/hasExampleProperty"
    },
    "object": {
        "termType": "Literal",
        "classOrder": 1,
        "value": "some literal value",
        "datatype": {
            "termType": "NamedNode",
            "classOrder": 5,
            "value": "http://www.w3.org/2001/XMLSchema#string"
        },
        "isVar": 0,
        "language": ""
    },
    "graph": {
        "termType": "NamedNode",
        "classOrder": 5,
        "value": "https://example.com/test/"
    }
}

But clearly _:b0 should be a BlankNode.
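Until the parser is fixed, one possible workaround is to post-process the parsed statements and turn such terms back into blank nodes. The sketch below uses a simplified quad model rather than rdflib.js's own classes; `repairBlankNodes` is a hypothetical helper, and it assumes that no genuine IRI in the data starts with `_:`.

```typescript
// Simplified, hypothetical term/quad model (not rdflib.js classes).
type Term = { termType: "NamedNode" | "BlankNode" | "Literal"; value: string };
type Quad = { subject: Term; predicate: Term; object: Term; graph: Term };

// Turn a NamedNode whose value looks like a blank node label ("_:b0") into a
// BlankNode with the label "b0". Any other term is returned unchanged.
function fixTerm(term: Term): Term {
  if (term.termType === "NamedNode" && term.value.startsWith("_:")) {
    return { termType: "BlankNode", value: term.value.slice(2) };
  }
  return term;
}

// Predicates are always IRIs, so only subject, object and graph are checked.
// Equal labels map to equal blank nodes, so node identity within one
// parsed document is preserved.
function repairBlankNodes(quads: Quad[]): Quad[] {
  return quads.map((q) => ({
    subject: fixTerm(q.subject),
    predicate: q.predicate,
    object: fixTerm(q.object),
    graph: fixTerm(q.graph),
  }));
}

// The statement reported above, reduced to this model.
const parsed: Quad = {
  subject: { termType: "NamedNode", value: "_:b0" },
  predicate: { termType: "NamedNode", value: "https://example.com/hasExampleProperty" },
  object: { termType: "Literal", value: "some literal value" },
  graph: { termType: "NamedNode", value: "https://example.com/test/" },
};
const [fixed] = repairBlankNodes([parsed]);
console.log(fixed.subject.termType); // prints: BlankNode
```

With real rdflib.js terms, the replacement would have to construct proper blank node objects instead of plain records.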

Whereas the corresponding Turtle is parsed correctly:

@prefix ex: <https://example.com/> .

[] ex:hasExampleProperty "some literal value" .

Becomes:

{
    "subject": {
        "termType": "BlankNode",
        "classOrder": 6,
        "value": "_g_L2C39",
        "isBlank": 1,
        "isVar": 1
    },
    "predicate": {
        "termType": "NamedNode",
        "classOrder": 5,
        "value": "https://example.com/hasExampleProperty"
    },
    "object": {
        "termType": "Literal",
        "classOrder": 1,
        "value": "some literal value",
        "datatype": {
            "termType": "NamedNode",
            "classOrder": 5,
            "value": "http://www.w3.org/2001/XMLSchema#string"
        },
        "isVar": 0,
        "language": ""
    },
    "graph": {
        "termType": "NamedNode",
        "classOrder": 5,
        "value": "https://example.com/test/"
    }
}

(Interestingly, the blank node gets a completely different internal identifier in this case.)

RinkeHoekstra commented 2 years ago

When the JSON-LD contains a list, the blank nodes corresponding to that collection are generated correctly:

{
    "@context": {
        "@vocab": "https://example.com/",
        "hasExampleProperty": {
            "@container": "@list"
        }
    },
    "hasExampleProperty": ["some literal value", "some other literal value"]
}

As N-Quads:

_:n4 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> "some other literal value".
_:n4 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> <http://www.w3.org/1999/02/22-rdf-syntax-ns#nill>.
_:n5 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> "some literal value".
_:n5 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> _:n4.
<_:b0> <https://example.com/hasExampleProperty> _:n5 <https://example.com/test/> .
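For comparison, this is roughly how a JSON-LD `@list` maps onto an RDF collection: each item gets its own blank node, the cells are chained with `rdf:first`/`rdf:rest`, and the chain ends in `rdf:nil` (`http://www.w3.org/1999/02/22-rdf-syntax-ns#nil`). `listToTriples` and the label generator are hypothetical, and terms are passed around as N-Triples strings to keep the sketch short.

```typescript
const RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

// [subject, predicate, object], each already in N-Triples syntax.
type Triple = [string, string, string];

// Encode an ordered list as an RDF collection: one fresh blank node per item,
// linked by rdf:first (the item) and rdf:rest (the next cell), ending in rdf:nil.
function listToTriples(items: string[], freshLabel: () => string): { head: string; triples: Triple[] } {
  if (items.length === 0) return { head: `<${RDF}nil>`, triples: [] };
  const cells = items.map(() => `_:${freshLabel()}`);
  const triples: Triple[] = [];
  items.forEach((item, i) => {
    triples.push([cells[i], `<${RDF}first>`, item]);
    const rest = i + 1 < cells.length ? cells[i + 1] : `<${RDF}nil>`;
    triples.push([cells[i], `<${RDF}rest>`, rest]);
  });
  return { head: cells[0], triples };
}

let counter = 0;
const fresh = () => `l${counter++}`; // hypothetical label generator
const { head, triples } = listToTriples(['"some literal value"', '"some other literal value"'], fresh);
// The subject is written as a real blank node here, not as <_:b0>.
triples.push(["_:b0", "<https://example.com/hasExampleProperty>", head]);
for (const [s, p, o] of triples) console.log(`${s} ${p} ${o} .`);
```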

RinkeHoekstra commented 2 years ago

The function jsonldObjectToTerm never appears to return a BlankNode:

https://github.com/linkeddata/rdflib.js/blob/c14dfd57d5159ad5ac1ee2523cc7924968e24f53/src/jsonldparser.js#L11

RinkeHoekstra commented 2 years ago

Diagnosis

It looks like the flatten function from jsonld.js is the culprit.

The JSON-LD parser takes the flattened output, and checks for @id attributes to determine whether the JSON object represents a blank node or not.

https://github.com/linkeddata/rdflib.js/blob/c14dfd57d5159ad5ac1ee2523cc7924968e24f53/src/jsonldparser.js#L68-L83

and:

https://github.com/linkeddata/rdflib.js/blob/c14dfd57d5159ad5ac1ee2523cc7924968e24f53/src/jsonldparser.js#L24-L26

However, the jsonld.js flattened output inserts @id attributes even for blank nodes; e.g. the JSON-LD above (without the list) results in:

[
  {
    "@id": "_:b0",
    "https://example.com/hasExampleProperty": [
      {
        "@value": "some literal value"
      }
    ]
  }
]

This turns the node into a NamedNode because it has an @id attribute.

The @id attribute is a non-normative part of the JSON-LD specification at https://www.w3.org/TR/json-ld11/#identifying-blank-nodes.

The flattened output (also non-normative) uses this in its examples: https://www.w3.org/TR/json-ld11/#flattened-document-form (and it has to, because flattened form cannot use nesting to group a node's properties together).
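Following this diagnosis, the deciding test would be the form of the identifier rather than the presence of `@id`: in JSON-LD, blank node identifiers begin with `_:`. The sketch below applies that rule to flattened output. `FlatNode`, `idToTerm` and `flattenedToQuads` are hypothetical names, the model is simplified (no `@type`, datatypes, language tags or `@list`), and this is not rdflib.js's actual parser code.

```typescript
// Minimal, hypothetical shapes for flattened JSON-LD (expanded IRIs as keys).
type ValueObject = { "@value": string };
type NodeRef = { "@id": string };
type FlatNode = { "@id": string; [property: string]: string | Array<ValueObject | NodeRef> };

type Term =
  | { termType: "NamedNode"; value: string }
  | { termType: "BlankNode"; value: string }
  | { termType: "Literal"; value: string };
type Quad = { subject: Term; predicate: Term; object: Term };

// A node identifier denotes a blank node iff it starts with "_:",
// regardless of whether an "@id" key is present.
function idToTerm(id: string): Term {
  return id.startsWith("_:")
    ? { termType: "BlankNode", value: id.slice(2) }
    : { termType: "NamedNode", value: id };
}

function flattenedToQuads(nodes: FlatNode[]): Quad[] {
  const quads: Quad[] = [];
  for (const node of nodes) {
    const subject = idToTerm(node["@id"]);
    for (const [key, values] of Object.entries(node)) {
      // Skip keywords such as "@id"; only expanded property IRIs remain.
      if (key.startsWith("@") || !Array.isArray(values)) continue;
      const predicate: Term = { termType: "NamedNode", value: key };
      for (const v of values) {
        const object: Term =
          "@value" in v ? { termType: "Literal", value: v["@value"] } : idToTerm(v["@id"]);
        quads.push({ subject, predicate, object });
      }
    }
  }
  return quads;
}

// The flattened output quoted above.
const flattened: FlatNode[] = [
  {
    "@id": "_:b0",
    "https://example.com/hasExampleProperty": [{ "@value": "some literal value" }],
  },
];
const [q] = flattenedToQuads(flattened);
console.log(q.subject.termType, q.subject.value); // prints: BlankNode b0
```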

Proposed Solution