Open BenjaminHofstetter opened 8 months ago
This would be possible indeed, if the parser emits the context from the tokenizer in the quads.
We have no plans to take this up, but a pull request that puts this functionality behind a flag would be welcome, provided it has no performance impact when switched off.
This is exactly what I need too. I am currently developing an RDF editing extension for Visual Studio Code named Mentor. For this use case I frequently need to resolve URIs and blank nodes to parsed Tokens, and this feature would be extremely helpful.
I found a workaround for URIs which requires parsing the document again after loading and interpreting the Triples, but that only works for URIs and not for blank nodes. This currently blocks me from implementing SHACL support where blank node definitions of (property) shapes are quite common.
Any idea how such source maps could be implemented?
Luckily, tokens emitted by the Lexer already contain the line and position of each token. In the Parser you could add this information as a property of the Terms every time a new _subject, _predicate, _object or _graph is assigned. For instance, the code here would become:
    this._subject = this._blankNode();
    if (this._recordPosition) {
      this._subject[POS] = { line: token.line, start: token.start };
    }
    this._saveContext('blank', this._graph,
                      this._subject, null, null);
I would recommend making POS a Symbol that is exported by N3.js; however, it could also just be a property name like _internal_position.
The caveat of this approach would be that it might cause a non-negligible performance hit even when the feature is disabled; but I suspect this is something you can perf. test and optimise once the feature is implemented.
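A minimal, self-contained sketch of the Symbol idea follows. It is not N3.js code: POS, recordPosition and the makeTerm helper are hypothetical stand-ins, and the token is a plain object carrying the line and start fields mentioned above. It only illustrates that a Symbol-keyed property stays out of Object.keys and JSON.stringify, so consumers that do not know about it should be unaffected.

```javascript
// Hypothetical stand-in for a Symbol that N3.js could export.
const POS = Symbol('position');

// Hypothetical helper: builds a plain term object and, only when
// recordPosition is true, attaches the token's position under POS.
function makeTerm(termType, value, token, recordPosition) {
  const term = { termType, value };
  if (recordPosition) {
    term[POS] = { line: token.line, start: token.start };
  }
  return term;
}

// A token shaped like the ones described above (line and start offset).
const token = { type: 'blank', value: 'b0', line: 12, start: 4 };

const withPos = makeTerm('BlankNode', 'b0', token, true);
const withoutPos = makeTerm('BlankNode', 'b0', token, false);

// The position is reachable only through the Symbol.
console.log(withPos[POS]);
// Symbol keys are not listed by Object.keys and are skipped by JSON.stringify.
console.log(Object.keys(withPos));
console.log(JSON.stringify(withPos) === JSON.stringify(withoutPos));
```

With the flag off, the only extra work in this sketch is one boolean check per term, which is the cost that would need measuring in the real parser.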
I did a PoC some time ago. I added it as a use case in the RDF-star working group. Maybe in the future we can use RDF-star to define such source maps "externally" from the source Turtle. https://github.com/w3c/rdf-star/issues/285#issuecomment-2003235647
My PoC uses the N3 parser and exposes the tokens in the quads (not RDF-star).
@BenjaminHofstetter Did you create a patch for N3 and publish the code of the PoC somewhere?
Perhaps change the issue title from "Store Token position in the produces quads" to "Store original positions of Tokens in quads produced by conversion from Turtle"?
(At least, change "produces" to "produced".)
Why do I need that: After parsing a Turtle file, I lose all information about the source file. For better tooling support, I propose implementing some kind of "source maps" to trace back from quads to positions in the Turtle file.
For instance, in tools like https://shacl-playground.zazuko.com/, when encountering errors in SHACL validation reports, locating the error-causing triple requires human intervention. With source map information, editors could pinpoint the exact location in the Turtle file, aiding in error resolution. Implementing source maps would bridge the gap between parsed files and their source, enhancing tooling support. The tokenizer already generates tokens with line, start, and end information, laying the groundwork for this feature.
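As a rough illustration of the tooling use case, here is a sketch of how an editor might look up the source location of a node named in a validation result, assuming the parser had attached positions to terms. Everything here is hypothetical: the POS Symbol, the term, buildSubjectIndex and locate helpers, and the sample quads are stand-ins, not the N3.js API or the SHACL report format.

```javascript
// Hypothetical Symbol under which a parser would store source positions.
const POS = Symbol('position');

// Hypothetical helper that mimics a parsed term, optionally with a position.
function term(termType, value, line, start) {
  const t = { termType, value };
  if (line !== undefined) t[POS] = { line, start };
  return t;
}

// Index each subject by type and value, keeping the first position seen.
function buildSubjectIndex(quads) {
  const index = new Map();
  for (const quad of quads) {
    const key = `${quad.subject.termType}:${quad.subject.value}`;
    const pos = quad.subject[POS];
    if (pos && !index.has(key)) index.set(key, pos);
  }
  return index;
}

// Return the recorded position for a term, or null if none is known.
function locate(index, t) {
  return index.get(`${t.termType}:${t.value}`) || null;
}

// Sample quads for a shape with a blank-node property shape (made-up positions).
const quads = [
  { subject: term('NamedNode', 'http://example.org/PersonShape', 3, 0),
    predicate: term('NamedNode', 'http://www.w3.org/ns/shacl#property'),
    object: term('BlankNode', 'b0', 4, 14) },
  { subject: term('BlankNode', 'b0', 4, 14),
    predicate: term('NamedNode', 'http://www.w3.org/ns/shacl#path'),
    object: term('NamedNode', 'http://example.org/name', 5, 12) },
];

const index = buildSubjectIndex(quads);

// A node as it might be named in a validation result (no position attached).
const reported = { termType: 'BlankNode', value: 'b0' };
const where = locate(index, reported);
console.log(where);
```

Note that blank node labels are only meaningful within a single parse, so a lookup like this would have to use the same parse result that produced the validated data.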