rdfjs / N3.js

Lightning fast, spec-compatible, streaming RDF for JavaScript
http://rdf.js.org/N3.js/

Question: Parsing RDF Blank Node shorthand #353

Closed mikeyhogarth closed 1 year ago

mikeyhogarth commented 1 year ago

Really sorry if this is a silly question. I'm relatively new to the world of linked data and mostly just want to check that my assumptions are right.

Is it true to say that a document such as this one from the W3C spec:

@prefix foaf: <http://xmlns.com/foaf/0.1/> .

[ foaf:name "Alice" ] foaf:knows [
    foaf:name "Bob" ;
    foaf:knows [
        foaf:name "Eve" ] ;
    foaf:mbox <bob@example.com> ] .

Basically can't be parsed by any of the streaming-type parsers such as N3 without the blank nodes all becoming separated from each other / getting different labels?

jeswr commented 1 year ago

You should be perfectly fine parsing this with N3.js while maintaining blank node references. The parser assigns each blank node an identifier, and you will receive a set of statements that looks like the following (as RDF/JS quads):

_:b1 foaf:name "Alice" .
_:b1 foaf:knows _:b2 .
_:b2 foaf:name "Bob" .
_:b2 foaf:knows _:b3 .
...

Closing the issue - but feel free to re-open if you have follow up questions.

jacoscaz commented 1 year ago

> Basically can't be parsed by any of the streaming-type parsers such as N3 without the blank nodes all becoming separated from each other / getting different labels?

If we're talking about a single parsing run, this is not correct, as per @jeswr's comment. However, based on my own experience as a newcomer to linked data, I fear you might have stumbled into the fact that multiple parsing runs of the same document may (and often will) result in different sets of quads with different blank node labels, although each set will still correctly reflect the relationships described in the original document.

If this is not the case, then I apologize for the noise. If this is the case, however, let me know and I'd gladly clarify why that is happening to try and spare you the inevitable headaches.
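The point about repeated runs can be illustrated without a parser at all. The following is a minimal, dependency-free sketch: it takes two lists of quads written as plain `[subject, predicate, object]` string triples, renames blank nodes in order of first appearance, and compares the results. The labels in both lists and the helper names `relabel` and `sameShape` are assumptions made for illustration. The renaming only works when both runs emit quads in the same order, so it is not a general graph isomorphism check.

```javascript
// Rename blank nodes (terms starting with "_:") to _:c0, _:c1, ... in order
// of first appearance, so two runs with different labels can be compared.
// Assumption: both runs emit their quads in the same order.
function relabel(triples) {
  const names = new Map();
  const rename = (term) => {
    if (!term.startsWith('_:')) return term;
    if (!names.has(term)) names.set(term, `_:c${names.size}`);
    return names.get(term);
  };
  return triples.map((triple) => triple.map(rename));
}

// True when both runs describe the same structure, ignoring blank node labels.
function sameShape(a, b) {
  return JSON.stringify(relabel(a)) === JSON.stringify(relabel(b));
}

// Hypothetical output of a first parsing run.
const firstRun = [
  ['_:n3-0', 'foaf:name', '"Alice"'],
  ['_:n3-1', 'foaf:name', '"Bob"'],
  ['_:n3-0', 'foaf:knows', '_:n3-1'],
];

// Hypothetical output of a second run over the same document: new labels.
const secondRun = [
  ['_:n3-3', 'foaf:name', '"Alice"'],
  ['_:n3-4', 'foaf:name', '"Bob"'],
  ['_:n3-3', 'foaf:knows', '_:n3-4'],
];

console.log(JSON.stringify(firstRun) === JSON.stringify(secondRun)); // false
console.log(sameShape(firstRun, secondRun)); // true
```

Comparing the raw lists fails because the labels differ, while the relabeled lists match because the relationships are the same.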

mikeyhogarth commented 1 year ago

@jacoscaz I wonder if this is the case - I'm doing the parsing run from a frontend framework that may be running the parse multiple times due to reactivity hooks. Thank you for the lead, I will look into this!

@jeswr - that's not what I'm seeing; all blank nodes are coming back with different numbers regardless. I don't really want to bother anyone with any more questions until I've looked into @jacoscaz's comment, though, as I think they may be onto something :)
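If a reactive hook really does re-run the parse, one possible way to keep blank node labels stable across re-renders is to parse each distinct document text only once and reuse the result. The sketch below is a generic cache and not part of N3.js: `memoizeParse` and `fakeParse` are hypothetical names, and `fakeParse` merely stands in for a real parse call by minting a fresh label on every call.

```javascript
// Wrap a parse function so each distinct input text is parsed only once.
// Repeated calls with the same text return the same cached array of quads,
// so the blank node labels in it do not change between calls.
function memoizeParse(parse) {
  const cache = new Map();
  return (text) => {
    if (!cache.has(text)) cache.set(text, parse(text));
    return cache.get(text);
  };
}

// Hypothetical stand-in for a real parser: every call mints a new label,
// mimicking how labels can differ between parsing runs.
let counter = 0;
const fakeParse = (text) => [{ subject: `_:b${counter++}`, source: text }];

const parseOnce = memoizeParse(fakeParse);
const doc = '[ foaf:name "Alice" ] .';

console.log(fakeParse(doc)[0].subject === fakeParse(doc)[0].subject); // false
console.log(parseOnce(doc) === parseOnce(doc)); // true
```

In a frontend framework the same effect is usually achieved with whatever memoization primitive the framework provides, keyed on the document text.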

jeswr commented 1 year ago

Just did a sanity check on my end. Indeed bnodes are correctly preserved with your file when parsing in N3 mode.

In particular running the following

const { Parser } = require('n3');

const parser = new Parser({ format: 'text/n3' });

console.log(parser.parse(`
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

[ foaf:name "Alice" ] foaf:knows [
    foaf:name "Bob" ;
    foaf:knows [
        foaf:name "Eve" ] ;
    foaf:mbox <bob@example.com> ] .
`));

Gives

[
  Quad {
    id: '',
    _subject: BlankNode { id: '_:n3-0' },
    _predicate: NamedNode { id: 'http://xmlns.com/foaf/0.1/name' },
    _object: Literal { id: '"Alice"' },
    _graph: DefaultGraph { id: '' }
  },
  Quad {
    id: '',
    _subject: BlankNode { id: '_:n3-1' },
    _predicate: NamedNode { id: 'http://xmlns.com/foaf/0.1/name' },
    _object: Literal { id: '"Bob"' },
    _graph: DefaultGraph { id: '' }
  },
  Quad {
    id: '',
    _subject: BlankNode { id: '_:n3-2' },
    _predicate: NamedNode { id: 'http://xmlns.com/foaf/0.1/name' },
    _object: Literal { id: '"Eve"' },
    _graph: DefaultGraph { id: '' }
  },
  Quad {
    id: '',
    _subject: BlankNode { id: '_:n3-1' },
    _predicate: NamedNode { id: 'http://xmlns.com/foaf/0.1/knows' },
    _object: BlankNode { id: '_:n3-2' },
    _graph: DefaultGraph { id: '' }
  },
  Quad {
    id: '',
    _subject: BlankNode { id: '_:n3-1' },
    _predicate: NamedNode { id: 'http://xmlns.com/foaf/0.1/mbox' },
    _object: NamedNode { id: 'bob@example.com' },
    _graph: DefaultGraph { id: '' }
  },
  Quad {
    id: '',
    _subject: BlankNode { id: '_:n3-0' },
    _predicate: NamedNode { id: 'http://xmlns.com/foaf/0.1/knows' },
    _object: BlankNode { id: '_:n3-1' },
    _graph: DefaultGraph { id: '' }
  }
]

I suspect @jacoscaz is correct in stating that your error is caused by parsing the file multiple times.
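A quick way to confirm this kind of result programmatically is to collect the distinct blank node labels in a list of quads. The sketch below assumes RDF/JS-style terms exposing `termType` and `value`. The helper names `blankLabels`, `bnode`, `literal`, and `quad` are hypothetical, and the hand-written sample data only mirrors the name and knows quads from the output above.

```javascript
// Collect the distinct blank node labels used as subject or object.
function blankLabels(quads) {
  const labels = new Set();
  for (const { subject, object } of quads) {
    for (const term of [subject, object]) {
      if (term.termType === 'BlankNode') labels.add(term.value);
    }
  }
  return labels;
}

// Hand-written stand-ins for RDF/JS terms (hypothetical helpers).
const bnode = (value) => ({ termType: 'BlankNode', value });
const literal = (value) => ({ termType: 'Literal', value });
const quad = (subject, predicate, object) => ({ subject, predicate, object });

const quads = [
  quad(bnode('n3-0'), 'foaf:name', literal('Alice')),
  quad(bnode('n3-1'), 'foaf:name', literal('Bob')),
  quad(bnode('n3-2'), 'foaf:name', literal('Eve')),
  quad(bnode('n3-1'), 'foaf:knows', bnode('n3-2')),
  quad(bnode('n3-0'), 'foaf:knows', bnode('n3-1')),
];

// Three people, three labels: the nested brackets share nodes as intended.
console.log([...blankLabels(quads)]); // [ 'n3-0', 'n3-1', 'n3-2' ]
```

If the references had been lost, the `foaf:knows` objects would carry labels that never appear as subjects, and the set would contain more than three entries.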

mikeyhogarth commented 1 year ago

Just to close this one off: I've done some checks myself as well and you are right, N3.js is parsing these just fine. My issue was that, after the parse, I use the resulting N3 Store as a Comunica data source, and Comunica is what makes the blank nodes go out of whack (they're fine in the store; they only get corrupted after running a Comunica query).

The bug does appear to have been fixed a few weeks ago but they've not put out a release yet, so I guess my issue will just go away once that's out!