Document language tags seem broken

secvisogram / secvisogram

Secvisogram is a web tool for creating and editing security advisories in the CSAF 2.0 format

MIT License

33 stars 17 forks source link

Document language tags seem broken #53

Closed elear closed 2 years ago

elear commented 3 years ago

If you attempt to enter en-US, you get an invalid language, but this is a perfectly valid language tag. The regex agrees, but icann.subtags.find() does not. This can be fixed by either reducing the regex to take just "en" and not "en-US" (probably the wrong answer, since I think it's correct), or to add additional logic into icann.subtags.find() to allow appropriate combinations (might be a pain, but is probably the right thing to do).

elear commented 3 years ago

By the way, the regex is actually wrong, as far as RFC 5646 is concerned. ^[a-zA-Z]{2,3}(-.+)?$ only covers xx-XXXXXX, but doesn't cover xx-xx-xxx-xxx-xx, which is also allowed as a variant or extension. An example in RFC 5646 is the following:

hy-Latn-IT-arevela.

Also, {2,3} won't allow for i-enochian, which is "grandfathered", according to RFC5646.

With all of this, my suggestion is to change the regex to be ^[a-zA-Z]{1,3}(-[a-zA-Z]+)*$, and validate only the first part (before the first -).

elear commented 3 years ago

And just to add one more comment on what I might do to fix it:

if (!icann.subtags.find((s) => s.subtag === doc.document.lang))

is in app/lib/shared/Core/entities/DocumentEntity.js. I would imagine the === is inappropriate, but rather you need to split doc.document.lang along "-" and just look at the first component for comparison.

tschmidtb51 commented 3 years ago

@elear: You're completely right. I found that bug earlier but haven't documented it...

The regex comes from the standard - so we have to fix that there (see oasis-tcs-csaf#71). I have to take a deep dive into RFC5646 / BCP 47 to derive a valid regex...

tschmidtb51 commented 3 years ago

A PR is available at oasis-tcs/csaf#299. Once that is included we will update Secvisogram with the new regex.

tschmidtb51 commented 3 years ago

A PR #195 is available in this repo which updates the regex for the lang_t.