validatorjs / validator.js

String validation
MIT License

isLocale does invalidate a lot of valid language tags #2100

Open sosafe-jochen-wikenhauser opened 1 year ago

sosafe-jochen-wikenhauser commented 1 year ago

Describe the bug: isLocale does not validate language tags correctly. Many valid language tags are rejected by isLocale. It could be, though, that I do not correctly understand whether there is a difference between a "locale" and a "language tag".

Examples: e.g. zh-CHS is not correctly validated by isLocale. In fact, every locale with a 3-letter subtag is rejected. According to the Wikipedia page (I know, but primary sources are hard to come by), there are many other valid codes not covered by the regex of isLocale.

Additional context
Validator.js version: Master & all since isLocale is introduced
Node.js version: not relevant
OS platform: not relevant

WikiRik commented 1 year ago

isLocale was added in https://github.com/validatorjs/validator.js/pull/1072 and there was some discussion there around which locales to support. Any idea on how the RegEx could be improved, keeping the original idea in mind?

theVJagrawal commented 1 year ago

Can I work on this one? I am pretty new to this.

kwahome commented 1 year ago

I suppose the only comprehensive way to cover all valid language tags is to implement an RFC 5646-compliant regex.
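For illustration, here is a rough sketch of what such a regex might look like. It follows the `langtag` production of the RFC 5646 ABNF (language, optional extlang, script, region, variants, extensions, private use) and deliberately leaves out grandfathered tags and private-use-only tags such as `x-foo`. The name `isWellFormedLangtag` is hypothetical and not part of validator.js, and the pattern checks syntax only, not whether the subtags are registered with IANA.

```javascript
// Sketch only: a simplified RFC 5646 "langtag" pattern.
// Assumptions: grandfathered and private-use-only tags are not handled,
// and subtags are not checked against the IANA registry.
const language = '(?:[a-z]{2,3}(?:-[a-z]{3}){0,3}|[a-z]{4,8})'; // 2-3 letters plus up to 3 extlangs, or 4-8 letters
const script = '(?:-[a-z]{4})?';                                // e.g. Hans
const region = '(?:-(?:[a-z]{2}|[0-9]{3}))?';                   // e.g. CN or 419
const variant = '(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*';     // e.g. rozaj or 1901
const extension = '(?:-[0-9a-wy-z](?:-[a-z0-9]{2,8})+)*';       // singleton (anything but x) plus subtags
const privateUse = '(?:-x(?:-[a-z0-9]{1,8})+)?';                // trailing x-... part

const langtagPattern = new RegExp(
  `^${language}${script}${region}${variant}${extension}${privateUse}$`,
  'i' // language tags are case-insensitive
);

// Hypothetical helper, not the validator.js API.
function isWellFormedLangtag(tag) {
  return typeof tag === 'string' && langtagPattern.test(tag);
}

console.log(isWellFormedLangtag('zh-CHS'));     // three-letter second subtag
console.log(isWellFormedLangtag('zh-Hans-CN')); // language-script-region
console.log(isWellFormedLangtag('en_US'));      // underscore is not an RFC 5646 separator
```

Note that a pattern like this only answers "is the tag well-formed?". Whether a well-formed tag such as `zh-CHS` is also valid depends on the subtag registry, which a regex alone cannot express.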

kwahome commented 1 year ago

@sosafe-jochen-wikenhauser you make a good point about whether there is a difference between a "locale" and "language tag".

w3.org describes a language tag as:

Language tag. A string used as an identifier for a language. In this document, the term language tag always refers explicitly to a [BCP47] language tag. These language tags consist of one or more subtags.

and a locale as:

Locale. An identifier (such as a language tag) for a set of international preferences. Usually this identifier indicates the preferred language of the user and possibly includes other information, such as a geographic region (such as a country). A locale is passed in APIs or set in the operating environment to obtain culturally-affected behavior within a system or process.

Thus, your interpretation is not wrong, in my opinion.