linkeddata / rdflib.js

Linked Data API for JavaScript
http://linkeddata.github.io/rdflib.js/doc/

turtle parsing not correctly handling `.` in suffix #601

Open jeswr opened 1 year ago

jeswr commented 1 year ago

As discussed in this thread, terms like ex:a.b appear to be valid according to the definition of PN_LOCAL in the Turtle grammar. However, rdflib appears to interpret this as ex:a . b rather than as a single term.

The terms are correctly parsed by N3.js but not by the custom parser in this library.
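To illustrate why ex:a.b reads as a single term, here is a rough sketch of the PN_LOCAL rule as a regular expression. It is a simplified, ASCII-only approximation: the real production also allows non-ASCII characters, percent-encoded sequences and backslash escapes, all omitted here. The name `isLocalName` is made up for this example and is not part of rdflib.js or N3.js.

```javascript
// Simplified, ASCII-only approximation of the Turtle PN_LOCAL production:
//   PN_LOCAL ::= (PN_CHARS_U | ':' | [0-9] | PLX)
//                ((PN_CHARS | '.' | ':' | PLX)* (PN_CHARS | ':' | PLX))?
// Assumption: PLX (percent/backslash escapes) and non-ASCII ranges are left out.
const FIRST = "[A-Za-z_:0-9]";      // first character: no '.' and no '-'
const MIDDLE = "[A-Za-z_0-9:.\\-]"; // inner characters: '.' is allowed
const LAST = "[A-Za-z_0-9:\\-]";    // last character: '.' is NOT allowed

const PN_LOCAL_ASCII = new RegExp(`^${FIRST}(?:${MIDDLE}*${LAST})?$`);

// Hypothetical helper: true if `s` looks like a valid local name
// under the simplified rule above.
function isLocalName(s) {
  return PN_LOCAL_ASCII.test(s);
}

console.log(isLocalName("a.b"));    // dot inside the name
console.log(isLocalName("P2.1.1")); // several dots inside the name
console.log(isLocalName("a."));     // trailing dot
```

Under this approximation a dot is accepted inside a local name but never as its last character, which is what lets a statement-ending dot still be told apart from the name.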


It might be worth considering migrating to rdf-parse & rdf-serialize instead of the custom parsers, given that those packages use parsers/serializers that pass 100% of the spec tests, and most are also currently having RDF-star & RDF 1.2 support added to them.

situx commented 1 year ago

I second this. I recently tried to parse the GeoSPARQL 1.1 vocabularies with rdflib.js, but it does not work because of this problem. All URIs describing examples in the GeoSPARQL specification contain dots.

timbl commented 1 year ago

They don't though seem to be going toward full Notation3 support. The current parser in rdflib will parse not only turtle but full Notation3 features like

which are useful for knowledge about knowledge, time-qualified data, and rules, and proofs

but also some things which were dropped from the Turtle spec for no understandable reason

and also it has some things which are just fun and useful, especially for testing

There is probably other stuff but that's it off the top of my head. The parser is old, and was converted from python, but the functionality is more than just turtle.

I guess we could keep it for things explicitly labelled text/n3 and use the other one ... though I know there were many issues trying to connect the RDF object models.

But I suggest we change the notation3 parser to match the turtle spec, by adding a dot to the things allowed in a name. There may be side effects, as there will have to be kludge code to check edge cases like :alice. :knows :bob. ...
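A minimal sketch of what such edge-case handling might look like, assuming a hand-written scanner that reads a local name character by character. The function `scanLocalName` and its return shape are hypothetical and are not taken from the rdflib.js parser. Only ASCII name characters are handled, and escapes are ignored.

```javascript
// Hypothetical helper, NOT part of rdflib.js: scan a local name starting at
// `start`, allowing dots inside the name but never at its end.
function scanLocalName(input, start) {
  // Assumption: ASCII-only name characters; escapes and Unicode are ignored.
  const isNameChar = (ch) => /[A-Za-z0-9_:\-]/.test(ch);

  let end = start;
  while (end < input.length) {
    const ch = input[end];
    // A dot is only tentatively accepted, and never as the first character.
    if (isNameChar(ch) || (ch === "." && end > start)) {
      end++;
    } else {
      break;
    }
  }

  // Back off trailing dots: they are statement punctuation, not part of
  // the name, so ":alice." yields the name "alice" followed by ".".
  while (end > start && input[end - 1] === ".") {
    end--;
  }

  return { name: input.slice(start, end), next: end };
}

// The local name after the leading ':' in ":alice. :knows :bob."
console.log(scanLocalName(":alice. :knows :bob.", 1));
// The local name after "ex:" in "ex:a.b ex:p ex:o ."
console.log(scanLocalName("ex:a.b ex:p ex:o .", 3));
```

One side effect of the greedy approach: input written without whitespace, such as :alice.:knows, would be consumed as one name (alice.:knows). That appears to be consistent with the Turtle grammar, since ':' is also allowed inside PN_LOCAL, but it may differ from how existing N3 documents were read before.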

timbl commented 1 year ago

((The history is that Notation3 was first, and then people standardized turtle as a subset but unfortunately then made small incompatible changes in the turtle spec. The addition of the dot as being allowed in names, when dot is already a punctuation in the grammar, obviously makes the language more complicated, requiring communication between the tokenizer and the grammar parser. Of course, another option would be to promote a change in the turtle standard to fix that and a few other things. But life may be too short))

jeswr commented 1 year ago

They don't though seem to be going toward full Notation3 support. The current parser in rdflib will parse not only turtle but full Notation3 features like [...] I suggest we change the notation3 parser to match the turtle spec.

rdf-parse & rdf-serialize already support most (perhaps even all?) standardised RDF serializations. This includes support for Notation3 and Turtle given by N3.js under the hood.

The only Notation3 features listed above that N3.js may be missing in its parser are naked names & sets (@RubenVerborgh I'm guessing you would know?).

Indeed, N3.js is not up to scratch for serializing Notation3 at present, as this custom code is required to use it for serializing Notation3 for use with the WebAssembly distribution of the eye reasoner.

RubenVerborgh commented 1 year ago

support for naked names & sets

The N3.js code was based on a combined interpretation of https://www.w3.org/DesignIssues/Notation3.html and https://www.w3.org/TeamSubmission/n3/. I don't think either of them supports naked terms. The set syntax {$ 1, 2, <a> $} is marked as not part of N3.

A new effort is on the way to standardize N3 as a superset of Turtle: https://w3c.github.io/N3/spec/. This is also how N3.js interprets N3, so all Turtle syntax constructs like ex:a.b are also supported.

I think rdflib.js should definitely support the full Turtle syntax (for MIME type text/turtle), and also consider parsing N3 as a superset of Turtle (as @timbl suggests above), which would be in line with the new N3 spec effort.

robertschubert commented 4 months ago

I second this also. I tried to parse a path like this: sh:path prefix:P2.1.1. The occurrence of the dots leads to an error.