pwin / owlready2

GNU Lesser General Public License v3.0

Test in _is_valid_language_code(s) is not complete #29

Open ThomasHoppe opened 1 year ago

ThomasHoppe commented 1 year ago

Valid language tags like "en-GB" and "de-AT" are not recognized.

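The test for valid language tags is buggy. IETF BCP 47 says that a language tag starts with a primary language subtag (e.g. "en"), optionally followed by further subtags such as a region subtag (e.g. "GB"). The subtags are separated by a hyphen ('-'), not an underscore ('_'). In the common case this gives a two-letter language, a hyphen, and a two-letter region, as in "en-GB".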
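For illustration, here is a minimal sketch of a stricter check. It assumes only a simplified subset of the BCP 47 "langtag" grammar (RFC 5646, section 2.1): language, optional script, optional region, optional variants. It deliberately ignores extlang, extension, private-use and grandfathered tags. The name is_valid_language_tag is hypothetical and is not the owlready2 function.

```python
import re

# Simplified subset of the BCP 47 langtag production:
#   language ["-" script] ["-" region] *("-" variant)
# Extlang, extensions, private-use and grandfathered tags are NOT handled.
_LANGTAG_RE = re.compile(
    r"""
    [A-Za-z]{2,3}                                     # primary language, e.g. "en", "deu"
    (?:-[A-Za-z]{4})?                                 # optional script, e.g. "Hant"
    (?:-(?:[A-Za-z]{2}|[0-9]{3}))?                    # optional region, e.g. "GB", "419"
    (?:-(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*    # optional variants, e.g. "1996"
    """,
    re.VERBOSE,
)


def is_valid_language_tag(tag: str) -> bool:
    """Return True if `tag` is well-formed under the simplified grammar above."""
    # fullmatch anchors at both ends, so trailing garbage such as "en-" is rejected.
    return _LANGTAG_RE.fullmatch(tag) is not None
```

With this check, "en", "en-GB", "de-AT", "zh-Hant-TW" and "es-419" are accepted, while "en_GB" and "en-" are rejected.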

https://www.w3.org/TR/ltli/ says that "Specifications for the Web that require language identification MUST refer to [BCP47]". RDF (and OWL, which builds on it) is such a Web specification, and the language tags on RDF literals are BCP 47 tags, so the function needs to be corrected.

I think this bug is caused by the Pythonic way you like to access ontologies, i.e. concept.label.en. Clearly, concept.label.en-GB wouldn't work, since '-' is not a valid character in Python identifiers. On the other hand, ontologies and RDF data may contain labels whose language tags follow the BCP 47 spec. So the test needs to be extended to handle both cases.
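One possible way to handle both cases is sketched below. It assumes that the attribute side uses '_' in place of '-' (so concept.label.en_GB would stand for "en-GB"). Both notations are mapped to the hyphenated BCP 47 form before validating or comparing. All names here (to_bcp47, is_valid_language_code, matches_language) are hypothetical and not part of owlready2. The shape check is deliberately loose: a 2-3 letter language followed by hyphen-separated subtags of 1-8 alphanumerics.

```python
import re

# Loose BCP 47 shape: primary language, then "-"-separated subtags of 1-8 alphanumerics.
_TAG_RE = re.compile(r"[A-Za-z]{2,3}(?:-[A-Za-z0-9]{1,8})*")


def to_bcp47(code: str) -> str:
    """Map a Python-attribute-style code such as "en_GB" to the BCP 47 form "en-GB"."""
    return code.replace("_", "-")


def is_valid_language_code(s: str) -> bool:
    """Accept both BCP 47 tags ("en-GB") and attribute-style codes ("en_GB")."""
    return _TAG_RE.fullmatch(to_bcp47(s)) is not None


def matches_language(tag: str, code: str) -> bool:
    """Compare an RDF language tag with an attribute-style code.

    BCP 47 tags are case-insensitive, so "en-gb" and "en-GB" are treated as equal.
    """
    return tag.lower() == to_bcp47(code).lower()
```

For example, matches_language("en-GB", "en_GB") returns True, while matches_language("en-GB", "en") returns False. Whether a plain "en" request should also match "en-GB" labels is a separate design question.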