rdfjs / N3.js

Lightning fast, spec-compatible, streaming RDF for JavaScript
http://rdf.js.org/N3.js/

Correct order of language tag and data type IRI? #307

Closed rorlic closed 1 year ago

rorlic commented 1 year ago

It may be a philosophical discussion, but what is the correct order of value, language, and datatype in an RDF literal?

A. value, datatype, language (e.g. '"abc"^^http://example.org@en-us')
B. value, language, datatype (e.g. '"abc"@en-us^^http://example.org')

I see option A in the tests, but option B feels more natural and IMHO would be easier to parse, because @ could theoretically be part of the IRI.

A small test shows that the library currently does not handle literals containing both a language tag and a datatype:

  describe('A literal value with value, datatype and language', () => {
    it('should determine datatype correctly', () => {
      new Literal('"abc"^^http://example.org@en-us').datatypeString.should.be.equal('http://example.org');
    })
  })

results in:

       A literal value with value, datatype and language
         should determine datatype correctly:

      AssertionError: expected 'http://example.org@en-us' to equal 'http://example.org'
      + expected - actual

      -http://example.org@en-us
      +http://example.org

If the library does not expect literals containing both a language tag and data type, then the tests should not define those as examples of correct behavior.
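
The failure above is what one would expect from any reader that treats everything after `^^` as the datatype. The following is a minimal sketch of such a reader, not N3.js's actual source: `datatypeOf` is a hypothetical helper, and the id format it assumes (`"value"^^datatype` or `"value"@language`) is inferred from the test in this issue.

```javascript
// Hypothetical helper, NOT taken from N3.js: reads the datatype from an
// id string of the assumed form "value"^^datatype or "value"@language.
const XSD_STRING = 'http://www.w3.org/2001/XMLSchema#string';
const RDF_LANG_STRING = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#langString';

function datatypeOf(id) {
  // Position just past the closing quote of the lexical value
  const suffixStart = id.lastIndexOf('"') + 1;
  const suffix = id.slice(suffixStart);
  if (suffix.startsWith('^^')) return suffix.slice(2); // everything after ^^
  if (suffix.startsWith('@')) return RDF_LANG_STRING;  // language-tagged string
  return XSD_STRING;                                    // plain string
}

console.log(datatypeOf('"abc"^^http://example.org@en-us'));
// http://example.org@en-us
```

Under these assumptions the trailing `@en-us` is never examined separately, so it ends up inside the datatype string, which matches the assertion error reported above.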

TallTed commented 1 year ago

I believe this is answered in §3.3 Literals of RDF 1.1 Concepts and Abstract Syntax, where the order is described as lexical form, datatype IRI, language tag.

Also note that the language tag is allowed "if and only if the datatype IRI is http://www.w3.org/1999/02/22-rdf-syntax-ns#langString" — which is why some serializations don't allow the datatype IRI in a language tagged literal; instead, the presence of a well-formed language tag on the end of a string literal tells all concerned that the literal is a <http://www.w3.org/1999/02/22-rdf-syntax-ns#langString>.
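
The "if and only if" rule quoted above can be sketched as a small constructor that derives the datatype from the presence of a language tag. This is only an illustration: `makeLiteral` and its option names are hypothetical and not part of N3.js or any RDF library.

```javascript
const XSD_STRING = 'http://www.w3.org/2001/XMLSchema#string';
const RDF_LANG_STRING = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#langString';

// Hypothetical constructor for illustration only.
// Enforces: a language tag is present if and only if the datatype is rdf:langString.
function makeLiteral(value, { language = '', datatype = '' } = {}) {
  if (language) {
    if (datatype && datatype !== RDF_LANG_STRING)
      throw new Error('A language-tagged literal must have datatype rdf:langString');
    // The language tag alone determines the datatype
    return { value, language, datatype: RDF_LANG_STRING };
  }
  if (datatype === RDF_LANG_STRING)
    throw new Error('rdf:langString requires a language tag');
  // No language tag: use the given datatype, defaulting to xsd:string
  return { value, language: '', datatype: datatype || XSD_STRING };
}

console.log(makeLiteral('abc', { language: 'en-us' }).datatype);
// http://www.w3.org/1999/02/22-rdf-syntax-ns#langString
```

With this shape, the caller never needs to supply both pieces of information, which mirrors why serializations can omit the datatype IRI on language-tagged literals.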

TallTed commented 1 year ago

Also, I suggest you edit your initial posting, and instead of wrapping the sample literals in single-quotes, wrap them in backticks, so that (among other things) you're not tagging the GitHub user @en-us, who will otherwise get pinged on every update to this issue.

rorlic commented 1 year ago

> Also, I suggest you edit your initial posting, and instead of wrapping the sample literals in single-quotes, wrap them in backticks, so that (among other things) you're not tagging the GitHub user @en-us, who will otherwise get pinged on every update to this issue.

Done 😅

rorlic commented 1 year ago

> I believe this is answered in §3.3 Literals of RDF 1.1 Concepts and Abstract Syntax, where the order is described as lexical form, datatype IRI, language tag.
>
> Also note that the language tag is allowed "if and only if the datatype IRI is http://www.w3.org/1999/02/22-rdf-syntax-ns#langString" — which is why some serializations don't allow the datatype IRI in a language tagged literal; instead, the presence of a well-formed language tag on the end of a string literal tells all concerned that the literal is a <http://www.w3.org/1999/02/22-rdf-syntax-ns#langString>.

I checked the specs before, but technically "A literal is a language-tagged string if the third element is present" does not mean that the language tag comes in the last position. It is open to interpretation, i.e. <value>[@<language>][^^data-type] would IMHO be a possible option, and an easier one to parse, because @ is a valid character in an IRI.

All philosophy aside, so the order is: <value>[^^data-type][@<language>]

In that case, because N3.js does not support it (see the test above), the tests should be fixed.

TallTed commented 1 year ago

Given that we're discussing N3.js, I should perhaps have also included a pointer to § 3.5 Literals of the latest Notation3 Draft Community Group Report, as of 10 July 2022, which really removes all philosophy (emphasis mine) --

NOTE If no datatype IRI or language tag is given, the datatype xsd:string will be assumed. In case a language tag is given, the datatype rdf:langString will be assumed. Note that it is not possible to specify both a datatype IRI and a language tag.

This line from the § 5.6 Grammar EBNF should replace your possibilities --

[26]   rdfLiteral   ::=   STRING ( LANGTAG | ( "^^" iri) ) ?
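
As a rough illustration of what production [26] permits, here is a sketch of a matcher written as a single regular expression. The token definitions are simplifying assumptions rather than the full N3 grammar: STRING is limited to a double-quoted string without escapes, LANGTAG follows the usual Turtle-style pattern, and iri is limited to the angle-bracket form (no prefixed names). `parseRdfLiteral` is a hypothetical name.

```javascript
// rdfLiteral ::= STRING ( LANGTAG | ( "^^" iri ) )?
// Simplified tokens (assumptions, see the text above):
//   STRING  = "..." without escapes
//   LANGTAG = @ letters, then optional -alphanumeric subtags
//   iri     = <...> only
const RDF_LITERAL =
  /^"([^"\\]*)"(?:@([a-zA-Z]+(?:-[a-zA-Z0-9]+)*)|\^\^<([^<>\s]*)>)?$/;

function parseRdfLiteral(text) {
  const match = RDF_LITERAL.exec(text);
  if (!match) return null; // not a literal under this simplified grammar
  const [, value, language, datatype] = match;
  return { value, language: language ?? null, datatype: datatype ?? null };
}

console.log(parseRdfLiteral('"abc"@en-us'));
console.log(parseRdfLiteral('"abc"^^<http://example.org>'));
console.log(parseRdfLiteral('"abc"^^<http://example.org>@en-us')); // null
console.log(parseRdfLiteral('"abc"@en-us^^<http://example.org>')); // null
```

Because the language tag and the datatype are alternatives inside one optional group, neither ordering of "both" can match, which is the point of the grammar line above.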

In conclusion -- Yes, the tests should be fixed.

RubenVerborgh commented 1 year ago

Hi @rorlic,

> It may be a philosophical discussion, but what is the correct order of value, language and datatype in a RDF literal?

It does not exist in any concrete syntax.

> In conclusion -- Yes, the tests should be fixed.

The test is validating fail-safe parsing of an internal data type. It is not RDF, nor Turtle, nor N3.