Closed elear closed 2 years ago
By the way, the regex is actually wrong, as far as RFC 5646 is concerned. ^[a-zA-Z]{2,3}(-.+)?$ only covers xx-XXXXXX, but doesn't cover xx-xx-xxx-xxx-xx, which is also allowed as a variant or extension. An example in RFC 5646 is the following:
hy-Latn-IT-arevela.
Also, {2,3} won't allow for i-enochian, which is "grandfathered", according to RFC5646.
With all of this, my suggestion is to change the regex to be ^[a-zA-Z]{1,3}(-[a-zA-Z]+)*$, and validate only the first part (before the first -).
And just to add one more comment on what I might do to fix it:
if (!icann.subtags.find((s) => s.subtag === doc.document.lang))
is in app/lib/shared/Core/entities/DocumentEntity.js. I would imagine the === is inappropriate, but rather you need to split doc.document.lang along "-" and just look at the first component for comparison.
@elear: You're completely right. I found that bug earlier but haven't documented it...
The regex comes from the standard - so we have to fix that there (see oasis-tcs-csaf#71). I have to take a deep dive into RFC5646 / BCP 47 to derive a valid regex...
A PR is available at oasis-tcs/csaf#299. Once that is included we will update Secvisogram with the new regex.
A PR #195 is available in this repo which updates the regex for the lang_t
.
If you attempt to enter en-US, you get an invalid language, but this is a perfectly valid language tag. The regex agrees, but icann.subtags.find() does not. This can be fixed by either reducing the regex to take just "en" and not "en-US" (probably the wrong answer, since I think it's correct), or to add additional logic into icann.subtags.find() to allow appropriate combinations (might be a pain, but is probably the right thing to do).