rdfjs / N3.js

Lightning fast, spec-compatible, streaming RDF for JavaScript
http://rdf.js.org/N3.js/
Other
707 stars 130 forks source link

N-Quads are written if N-Triples are requested and graph is supplied #165

Open thombohlk opened 5 years ago

thombohlk commented 5 years ago

When using a Writer with format N-Triples and adding quads that include a graph, the writer will output in N-Quads format.

Code (using N3.js version 1.0.4):

var N3 = require('../N3');
const writer = new N3.Writer({ format:'N-Triples' });
writer.addQuad(
  N3.DataFactory.namedNode('http://example.org/cartoons#Tom'),
  N3.DataFactory.namedNode('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'),
  N3.DataFactory.namedNode('http://example.org/cartoons#Cat'),
  N3.DataFactory.namedNode('http://example.org/cartoons#Graph')
);

writer.end((error, result) => console.log(result));

Output:

<http://example.org/cartoons#Tom> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.org/cartoons#Cat> <http://example.org/cartoons#Graph> .

I would expect the writer to ignore the graph when I ask for the N-Triples format.

RubenVerborgh commented 5 years ago

That is currently indeed intended behavior; in the sense that the write assumes you are sending the right input for the format.

Not sure what we should do in this case. Silently ignoring the graph might not be best, dropping it isn't great either; just leaves an error in my opinion.

What do you think?

thombohlk commented 5 years ago

I see your point, didn't look at it from that perspective.

Our current use case is one where we want to export a dataset (which can or cannot contain a graph field) in a certain format. From this point of view it would be weird if we would have to adapt the data to the format we want. I would say N3.js should contain the logic which fields are required for which format.

In my opinion the package should output the format that is requested and only that format. It should check whether or not all data for the requested format is available, and throw an error if it is not.

RubenVerborgh commented 5 years ago

Our current use case is one where we want to export a dataset (which can or cannot contain a graph field) in a certain format. From this point of view it would be weird if we would have to adapt the data to the format we want.

That makes sense on the one hand (as N3.js controls the format), but not on the other hand (given that only the source app knows how to deal with the graph).

It should check whether or not all data for the requested format is available, and throw an error if it is not.

Small issue there is that it is streaming, so we cannot check "all" the data, or at least not in advance. We could send an error the moment we encounter invalid data.

bergos commented 5 years ago

@thombohlk maybe you want to try rdf-transform-triple-to-quad. You can feed it also with factory.defaultGraph() to create triples with that transformation.

jimsmart commented 5 years ago

Arguably, if the caller specifically asked for N-Triples, the library should really only output valid N-Triples.

— So perhaps the writer should be able to enforce that?

Just my 2p!

RubenVerborgh commented 5 years ago

That is a possible argument; the other being that valid input should be sent for valid output. N3.js is a very low-level library; part of its speed comes from assuming valid input. My original point of view on this issue is simply that the behavior of sending quads to a triple writer is undefined. (Especially given that defining it correctly and universally is an impossibility.)

jimsmart commented 5 years ago

Sure, understood. I can see both sides.

— Maybe it just needs more/improved documentation? (Certainly no perf impact with this option! 😉)

RubenVerborgh commented 5 years ago

Open to that! Let’s give people some more time to discuss, and then we can pick.

jimsmart commented 5 years ago

Another possibility would be: when the caller asks specifically for N-Triples, swap out the existing code that writes quads, for a function that only writes triples (completely ignoring the graph).

This would not only solve the problem at hand, but wouldn’t affect the performance of writing N-Quads in any noticeable way (adding only an ‘if’ test at construction time) - and it would, theoretically at least, actually improve the performance of N-Triple writing! 👍 (because there would be one less ‘if’ statement in the core looped method that writes the quads/triples)

Edit: I say all that, but it’s possible that the extra indirection may have some negative impact — JavaScript virtual machines can be a little cranky to predict performance on, without measuring.

RubenVerborgh commented 5 years ago

(completely ignoring the graph).

This would not only solve the problem at hand

It wouldn't; who says it is correct to place triples in non-default graphs in the default graph? That is highly context-dependent. (For the same reason, dropping them is not the perfect solution either.)

jimsmart commented 5 years ago

Ok, well that’s your call. I was just offering a solution that I hadn’t seen being discussed yet.

RubenVerborgh commented 5 years ago

Definitely, proposals welcome (I didn't intend to sound negative, apologies if I did). I like the bit about the performance (we do a similar swapping for line mode versus Turtle/N-Quads), but I'm hesitant to make decisions on behalf of the caller.

RubenVerborgh commented 5 years ago

…it could be configurable though, as in { handleQuads: 'ignore' / 'drop' / 'error' }

jimsmart commented 5 years ago

That’s a good idea!

thombohlk commented 5 years ago

That sounds like a good option! Another solution would be to split it in { strict: true / false, handleQuads: 'drop' / 'error' }. strict would indicate if you want to follow the W3C standards given by format, handleQuads would indicate how to handle inconsistencies between the data and the W3C standard if strict == true.

Both solutions work for me :+1:

LaurensRietveld commented 5 years ago

Just my 2c:

MPvHarmelen commented 2 years ago

Open to that! Let’s give people some more time to discuss, and then we can pick.

So how about this? Was a conclusion reached?

Just to add to a years-old discussion: I would say that specifying n-triples is explicit enough to warrant dropping the graph if it happens to be present.

Making it configurable is also fine, but sounds like more work to implement.

LaurensRietveld commented 1 year ago

Just an update from our side. Not being able to serialize to ntriples without manipulating the quads is becoming a performance hit for us. (having to recreate all quads, removing the graphname) Are you open for a PR @RubenVerborgh ? I have no preference for either of the solutions above, so let me know which one you prefer

RubenVerborgh commented 1 year ago

PR is definitely fine, thanks 👍

LaurensRietveld commented 1 year ago

@RubenVerborgh Going for this:

 { handleQuads:  'keep' / 'drop' / 'error' }

Keep: writes the graph names Drop: omits the graph names Error: throws an error when encountering a graph name

Keep will be the default.

Does this sound good?

RubenVerborgh commented 1 year ago

Could we make it { graphs: 'keep' | 'ignore' | 'error' }, with a default of keep indeed. Default will be changed to error in next vmajor. Thanks a bunch.