Open thombohlk opened 5 years ago
That is currently indeed intended behavior; in the sense that the write assumes you are sending the right input for the format.
Not sure what we should do in this case. Silently ignoring the graph might not be best, dropping it isn't great either; just leaves an error in my opinion.
What do you think?
I see your point, didn't look at it from that perspective.
Our current use case is one where we want to export a dataset (which can or cannot contain a graph field) in a certain format. From this point of view it would be weird if we would have to adapt the data to the format we want. I would say N3.js should contain the logic which fields are required for which format.
In my opinion the package should output the format that is requested and only that format. It should check whether or not all data for the requested format is available, and throw an error if it is not.
Our current use case is one where we want to export a dataset (which can or cannot contain a graph field) in a certain format. From this point of view it would be weird if we would have to adapt the data to the format we want.
That makes sense on the one hand (as N3.js controls the format), but not on the other hand (given that only the source app knows how to deal with the graph).
It should check whether or not all data for the requested format is available, and throw an error if it is not.
Small issue there is that it is streaming, so we cannot check "all" the data, or at least not in advance. We could send an error the moment we encounter invalid data.
@thombohlk maybe you want to try rdf-transform-triple-to-quad. You can feed it also with factory.defaultGraph()
to create triples with that transformation.
Arguably, if the caller specifically asked for N-Triples, the library should really only output valid N-Triples.
— So perhaps the writer should be able to enforce that?
Just my 2p!
That is a possible argument; the other being that valid input should be sent for valid output. N3.js is a very low-level library; part of its speed comes from assuming valid input. My original point of view on this issue is simply that the behavior of sending quads to a triple writer is undefined. (Especially given that defining it correctly and universally is an impossibility.)
Sure, understood. I can see both sides.
— Maybe it just needs more/improved documentation? (Certainly no perf impact with this option! 😉)
Open to that! Let’s give people some more time to discuss, and then we can pick.
Another possibility would be: when the caller asks specifically for N-Triples, swap out the existing code that writes quads, for a function that only writes triples (completely ignoring the graph).
This would not only solve the problem at hand, but wouldn’t affect the performance of writing N-Quads in any noticeable way (adding only an ‘if’ test at construction time) - and it would, theoretically at least, actually improve the performance of N-Triple writing! 👍 (because there would be one less ‘if’ statement in the core looped method that writes the quads/triples)
Edit: I say all that, but it’s possible that the extra indirection may have some negative impact — JavaScript virtual machines can be a little cranky to predict performance on, without measuring.
(completely ignoring the graph).
This would not only solve the problem at hand
It wouldn't; who says it is correct to place triples in non-default graphs in the default graph? That is highly context-dependent. (For the same reason, dropping them is not the perfect solution either.)
Ok, well that’s your call. I was just offering a solution that I hadn’t seen being discussed yet.
Definitely, proposals welcome (I didn't intend to sound negative, apologies if I did). I like the bit about the performance (we do a similar swapping for line mode versus Turtle/N-Quads), but I'm hesitant to make decisions on behalf of the caller.
…it could be configurable though, as in { handleQuads: 'ignore' / 'drop' / 'error' }
That’s a good idea!
That sounds like a good option! Another solution would be to split it in { strict: true / false, handleQuads: 'drop' / 'error' }
. strict
would indicate if you want to follow the W3C standards given by format
, handleQuads
would indicate how to handle inconsistencies between the data and the W3C standard if strict == true
.
Both solutions work for me :+1:
Just my 2c:
n-triples
, means that that person is not only ok with dropping the graph names, but has the intention of doing that as wellOpen to that! Let’s give people some more time to discuss, and then we can pick.
So how about this? Was a conclusion reached?
Just to add to a years-old discussion: I would say that specifying n-triples
is explicit enough to warrant dropping the graph if it happens to be present.
Making it configurable is also fine, but sounds like more work to implement.
Just an update from our side. Not being able to serialize to ntriples without manipulating the quads is becoming a performance hit for us. (having to recreate all quads, removing the graphname) Are you open for a PR @RubenVerborgh ? I have no preference for either of the solutions above, so let me know which one you prefer
PR is definitely fine, thanks 👍
@RubenVerborgh Going for this:
{ handleQuads: 'keep' / 'drop' / 'error' }
Keep: writes the graph names Drop: omits the graph names Error: throws an error when encountering a graph name
Keep will be the default.
Does this sound good?
Could we make it { graphs: 'keep' | 'ignore' | 'error' }
, with a default of keep
indeed. Default will be changed to error
in next vmajor. Thanks a bunch.
When using a Writer with format N-Triples and adding quads that include a graph, the writer will output in N-Quads format.
Code (using N3.js version 1.0.4):
Output:
I would expect the writer to ignore the graph when I ask for the N-Triples format.