hexojs / hexo

A fast, simple & powerful blog framework, powered by Node.js.
https://hexo.io
MIT License

Validate lang #5186

Open pickfire opened 1 year ago

pickfire commented 1 year ago

I noticed that https://www.ihcblog.com/http-framework-design-axum-as-an-example/ and even the examples in this repository use zh-CN and zh-TW, as seen in https://github.com/hexojs/hexo/blob/7edbf25d5b4c4d7fdf99ff2439cfd9b2cb1cfe0d/lib/plugins/helper/date.js#L73 and other places. These are not valid options according to the specification.

This is not the first time I have noticed this mistake on websites, so I guess it is quite common. Not using a valid lang recognized by the browser causes the browser to render the text as a language other than the one specified. In my case, the browser uses the wrong font configured by the system (I configured Arch Linux to use a different Chinese font for readability, as described in https://wiki.archlinuxcn.org/wiki/%E5%AD%97%E4%BD%93%E9%85%8D%E7%BD%AE/%E4%B8%AD%E6%96%87).

With the broken lang from the link above, it looks like this:

image

When it is correct, it looks like this (which is what Wikimedia/Wikipedia has done):

image

image

See https://developer.mozilla.org/en-US/docs/Web/HTML/Global_attributes/lang and https://datatracker.ietf.org/doc/html/rfc5646. zh-CN should be either zh-Hans-CN or zh-Hans, and zh-TW should be zh-Hant-TW or zh-Hant. If validation is added, ideally users who entered the old incorrect values should be shown the new correct values as suggestions.
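A minimal sketch of what such a check could look like, assuming a hypothetical `checkLang` helper and a hand-written suggestion table (neither exists in Hexo). It uses the built-in `Intl.getCanonicalLocales`, which only checks that a tag is structurally well-formed under BCP 47; it does not look subtags up in the IANA registry.

```javascript
// Hypothetical suggestion table; the entries follow the replacements proposed above.
const SUGGESTED_TAGS = new Map([
  ['zh-CN', 'zh-Hans-CN'],
  ['zh-TW', 'zh-Hant-TW'],
]);

// checkLang is an illustrative helper, not an existing Hexo function.
function checkLang(lang) {
  if (typeof lang !== 'string') {
    return { ok: false, message: 'lang must be a string' };
  }
  let canonical;
  try {
    // Throws RangeError when the tag is not structurally valid BCP 47.
    // On success it also normalizes case, e.g. "zh-cn" becomes "zh-CN".
    [canonical] = Intl.getCanonicalLocales(lang);
  } catch (err) {
    return { ok: false, message: `"${lang}" is not a well-formed language tag` };
  }
  const suggestion = SUGGESTED_TAGS.get(canonical);
  if (suggestion) {
    // Accept the value but nudge the user toward the script-based tag.
    return { ok: true, message: `"${lang}" has no script subtag; consider "${suggestion}"` };
  }
  return { ok: true, message: null };
}
```

In this sketch `checkLang('zh_CN')` is rejected because of the underscore, while `checkLang('zh-CN')` is accepted with a suggestion rather than refused, so existing sites would keep building.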

I checked the same for Japanese, but I don't think it is an issue with ja, since Japanese has no distinction like the one Chinese has between simplified (zh-Hans) and traditional (zh-Hant).


Expected behavior

Hexo should prevent users from setting invalid values.

Actual behavior

Hexo most likely accepts whatever the user enters, since zh-CN works.

How to reproduce?

Is the problem still there under "Safe mode"?

Not sure; I did not try.

Environment & Settings

I am not a hexo user.


lorezyra commented 1 year ago

Language codes are based on the ISO-639-1 standard, and HTML and CSS recognize these codes. However, zh-Hans-CN and zh-Hant-TW are not recognized universally by HTML/CSS/JS...

When I wrote the read-time plugin for hexo (https://github.com/AsemAlhaidary/hexo-generator-readtime/), I did find codes for zh-Hans and zh-Hant.

I don't see the need to change the lang-code until the specification for ISO-639 is changed and recognized by all major browsers.

jonassmedegaard commented 1 year ago

ISO-639-1 indeed defines languages, but what is needed on the Web is more nuanced.

W3 describes it well here:

Content authors and webmasters also need to know how to use values for languages in a standard way. The current standard approach for W3C specifications is to use the rules expressed in BCP 47. This replaces earlier specifications such as RFC 3066 and RFC 1766, and goes beyond information available in the ISO language and country standards. You should also use the IANA Language Subtag Registry to look up language tags, rather than the ISO specifications.

(emphasis mine)
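To illustrate how a BCP 47 tag carries more than an ISO 639-1 language code, here is a small sketch using the built-in `Intl.Locale` to split a tag into its subtags. The `describeTag` name is made up for this example.

```javascript
// describeTag is an illustrative helper name, not an existing API.
function describeTag(tag) {
  // Intl.Locale parses a BCP 47 tag and exposes its subtags as properties.
  const locale = new Intl.Locale(tag);
  return {
    language: locale.language, // ISO 639 language code, e.g. "zh"
    script: locale.script,     // ISO 15924 script code, e.g. "Hans" (undefined if absent)
    region: locale.region,     // ISO 3166 region code, e.g. "CN" (undefined if absent)
  };
}
```

With this sketch, `describeTag('zh-Hans-CN')` reports a language, a script, and a region, while `describeTag('zh-CN')` reports no script at all, which is the gap discussed in this thread.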

lorezyra commented 1 year ago

@jonassmedegaard,

You are correct that BCP 47 can be used. And technically, there's nothing stopping us from using that in our Hexo projects.

When I wrote the read-time plugin, I used the ISO standard as I didn't see BCP 47 mentioned in the MDN docs. W3 promotes BCP 47, but not everyone recognizes that. I know that Google Translate supports the BCP-47 specifications. And, there is some overlap between ISO-639 and BCP-47.

I've added aliases in my read-time plugin for support of other lang codes. And, it's trivial to add such support into Hexo (for your own theme).
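As a rough sketch of what alias support can look like in a theme or plugin (this is not the actual code from the read-time plugin; `LANG_ALIASES` and `resolveLang` are hypothetical names), the lookup below maps legacy codes to script-based ones and then falls back by dropping trailing subtags:

```javascript
// Hypothetical alias table: legacy region-based codes mapped to script-based ones.
const LANG_ALIASES = new Map([
  ['zh-cn', 'zh-Hans'],
  ['zh-tw', 'zh-Hant'],
]);

// resolveLang is an illustrative helper, not part of Hexo or any plugin.
// `available` is the list of language codes the theme ships data for.
function resolveLang(requested, available) {
  const aliased = LANG_ALIASES.get(requested.toLowerCase()) || requested;
  const parts = aliased.split('-');
  // Try the full tag first, then drop trailing subtags one at a time.
  while (parts.length > 0) {
    const candidate = parts.join('-').toLowerCase();
    const match = available.find((code) => code.toLowerCase() === candidate);
    if (match) return match;
    parts.pop();
  }
  return null; // nothing matched; the caller decides on a default
}
```

For example, with `available = ['en', 'zh-Hans', 'zh-Hant']`, both `resolveLang('zh-CN', available)` and `resolveLang('zh-Hans-CN', available)` resolve to `'zh-Hans'`, so old and new configuration values keep working side by side.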

I'm in the process of building a Hexo theme that supports at least 28 languages. It's designed for professional bloggers that want to give their audiences a custom experience similar to Twitter or Mastodon. I've spent the past 18 months building it and I don't feel it's complete. But feel free to visit the dev version of my theme: https://2022.blog.richiebartlett.com .