RFC: RMLTC0015b-CSV & RMLTC0015b-JSON & RMLTC0015b-XML

RFC for:

A term map with a term type of rr:Literal may have a specified language tag. It is represented by the rr:language property on a term map. If present, its value must be a valid language tag.

The question is what "valid" means here. In RFC 3066 (referenced by the R2RML spec) there is no explicit definition of validity.

Validity is defined in its successor, BCP 47. There, a tag must not only be "well-formed"; its subtags must also "appear in the IANA Language Subtag Registry as of the particular registry date".

RDF 1.1 Concepts (https://www.w3.org/TR/rdf11-concepts/#section-Graph-Literal) requires a language tag to be well-formed, but not necessarily valid.

From all this it isn't completely clear at the moment which requirements an RML engine should follow.

Besides all that, the language tags in these test cases that are supposedly invalid are "english" and "spanish". Yet under both RFC 3066 and BCP 47, both are well-formed language tags: each is a single alphabetic subtag of at most 8 characters. They are invalid only under the BCP 47 definition of validity, because neither appears in the IANA Language Subtag Registry. I doubt that this level of validation should be required of engines, since it would force every engine to keep up with the registry.
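
To make the distinction concrete, here is a minimal Python sketch of a well-formedness check, assuming the ABNF from BCP 47 (RFC 5646, section 2.1). The names `is_well_formed`, `_LANGTAG`, `_PRIVATEUSE` and `_IRREGULAR` are hypothetical and not taken from any RML engine; the check covers syntax only and does not consult the IANA registry.

```python
import re

# Irregular grandfathered tags listed in RFC 5646; they do not fit the
# general langtag syntax and have to be matched literally.
_IRREGULAR = {
    "en-gb-oed", "i-ami", "i-bnn", "i-default", "i-enochian", "i-hak",
    "i-klingon", "i-lux", "i-mingo", "i-navajo", "i-pwn", "i-tao",
    "i-tay", "i-tsu", "sgn-be-fr", "sgn-be-nl", "sgn-ch-de",
}

# General langtag production. re.ASCII keeps [a-z] restricted to ASCII
# letters even when combined with re.IGNORECASE.
_LANGTAG = re.compile(
    r"""
    (?: [a-z]{2,3} (?: -[a-z]{3} ){0,3}            # language + optional extlang
      | [a-z]{4}                                   # reserved
      | [a-z]{5,8} )                               # registered language subtag
    (?: -[a-z]{4} )?                               # script
    (?: -(?: [a-z]{2} | [0-9]{3} ) )?              # region
    (?: -(?: [a-z0-9]{5,8} | [0-9][a-z0-9]{3} ) )* # variants
    (?: -[0-9a-wy-z] (?: -[a-z0-9]{2,8} )+ )*      # extensions
    (?: -x (?: -[a-z0-9]{1,8} )+ )?                # private use
    """,
    re.VERBOSE | re.IGNORECASE | re.ASCII,
)

# A tag consisting only of a private-use sequence, e.g. "x-whatever".
_PRIVATEUSE = re.compile(r"x(?:-[a-z0-9]{1,8})+", re.IGNORECASE | re.ASCII)


def is_well_formed(tag: str) -> bool:
    """Return True if `tag` matches the BCP 47 syntax (no registry lookup)."""
    if tag.lower() in _IRREGULAR:
        return True
    return (
        _LANGTAG.fullmatch(tag) is not None
        or _PRIVATEUSE.fullmatch(tag) is not None
    )


# The tags from the current test cases pass the syntax check ...
print(is_well_formed("english"), is_well_formed("spanish"))  # True True
# ... while tags like these do not, so they could serve as ill-formed examples.
print(is_well_formed("a"), is_well_formed("en_US"), is_well_formed("toolonglanguage"))  # False False False
```

The ill-formed tags shown ("a", "en_US", "toolonglanguage") are only illustrative candidates, not a proposal for the exact values the test cases should use.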

Proposed:

Interpret "valid language tag" in the R2RML spec as "well-formed" according to BCP 47, following https://www.w3.org/TR/rdf11-concepts/#section-Graph-Literal.
Change the test cases to use ill-formed language tags.

kg-construct / rml-test-cases

RFC: RMLTC0015b-CSV & RMLTC0015b-JSON & RMLTC0015b-XML #15