semver / semver.org

Semantic Versioning spec and website
https://semver.org

On the front page, "Indonesia" should be "bahasa Indonesia" #262

Closed rec closed 4 years ago

rec commented 5 years ago

A quibble at the top of https://semver.org/: "Indonesia" is the country, "bahasa Indonesia" the language.

While I'm here, "English" and "Deutsch" should be capitalized.

I quickly checked most of the other language names written in Roman or Cyrillic letters and I'm fairly sure the capitalization is correct for those (i.e., capitalized for Türkçe and lower case for the rest).


I would have made a pull request for these, except that I cannot find the language names in any of the four files in this repository...

Thanks for a really useful page!

runeimp commented 5 years ago

@rec in the ISO 639-1 standard all language ALPHA-2 codes are lowercase. And whether or not Bahasa is used in front of Indonesia, and whether or not it is lowercase, seems to depend on the language being spoken, at least according to references such as https://www.loc.gov/standards/iso639-2/php/code_list.php. Which is news to me. I've done localization coding for a few sites as a web developer and hadn't noticed that in the past. Do you have other references that suggest otherwise for a generic en page? If so I'd love to see them. Localization is a very interesting topic for me.

rec commented 5 years ago

Yes, I'm also really interested in localization! :-)

Let's start with the text. I speak Indonesian, though it has probably regressed to C2 by this point, but it is certain to me that "Indonesia" is always the country, in the same way that "France" is always France (in French).

Native speakers just say bahasa or, in writing, sometimes BI.

But "Indonesia (id)" looks like a country name, like "Deutschland (de)" would.


Regarding (independently) capitalization

ISO 639 only standardizes the codes (2-letter in ISO 639-1, 3-letter in ISO 639-2), not the full name of the language! From reading your page, it struck me that you intended to have a series of pairs looking like this:

name of the language as it appears in that language (language code)

i.e.

Türkçe (tr)

(which example from the page was definitely motivating for me!)

By that standard, these ones just jump out as "wrong" to me as a speaker of these languages

deutsch (de) 
english (en) 
indonesia (id)

because you just never ever see "deutsch" or "english" or "indonesia" in any printed material anywhere, but always Deutsch, English, or Indonesia.

(If you don't believe me, search for deutsch, and then try to find a lowercase version by clicking to the next page! I gave up. :-D)

So it just "reads as wrong".

I did a quick check of the other languages, only a few of which I have any knowledge of, and they seemed to be correct. Germanic languages capitalize language names in general, and Indonesian/Malay were copying English when they formalized their spelling and capitalization rules.

runeimp commented 5 years ago

I agree with all those points. I guess I just meant to say that the language names all being lowercase could be more the result of normalization efforts. Though as it is a page specifically marked as en in the HTML, <html lang="en" dir="ltr">, all the languages should be capitalized, with the possible exception of languages that don't have upper and lower case variants but are still referenced using Romanized lettering. In that instance I believe lower case is appropriate, as that is the primary case for the majority of letters in any sentence. Though those are just my musings on the subject. I don't know of a specific standard. I would also at the very least expect indonesian instead of indonesia, as you note. 👼

dcowan-london commented 5 years ago

From the first page of a google search for "deutsch" :) deutsch - Wiktionary https://en.wiktionary.org/wiki/deutsch

rec commented 5 years ago

That's an adjective, not a noun! :-D

rec commented 5 years ago

Anyway, we are spending too much time on this little minute thingie. Sorry! :-)

All lower case would also make some sense, though it would offend my delicate aesthetics :-D but at this point it isn't that either, because Turkish is capitalized.

So given changes have to be made.... I would have just emitted a code review but I dunno where this text lives.

Anyway, the semver page rocks, and I hope you have a great week!

janpio commented 5 years ago

The website is hosted in another repository and the languages seem to be defined and configured here: https://github.com/semver/semver.org/blob/388ffb9bd81fe70f1945c38b38e8f6f74aa04432/_config.yml#L6-L32 A PR to that file should be able to fix all these issues.
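
For anyone preparing that PR, here is a minimal Python sketch of the consistency check discussed in this thread. It assumes each entry is a (name, code) pair rendered as "name (code)". The LANGUAGES list below is hypothetical, built only from the names quoted above; the real entries in _config.yml may be structured differently.

```python
# Hypothetical sample data taken from names quoted in this thread.
# The real list lives in _config.yml and may use a different structure.
LANGUAGES = [
    ("deutsch", "de"),
    ("english", "en"),
    ("indonesia", "id"),
    ("Türkçe", "tr"),
]


def label(name: str, code: str) -> str:
    """Render one entry as 'name (code)', the pattern used on the front page."""
    return f"{name} ({code})"


def split_by_case(languages):
    """Group language names by the case of their first character.

    Names whose first character has no case (many non-Latin scripts)
    are skipped, because capitalization does not apply to them.
    """
    upper, lower = [], []
    for name, _code in languages:
        if not name:
            continue  # ignore empty names rather than failing
        first = name[0]
        if first.isupper():
            upper.append(name)
        elif first.islower():
            lower.append(name)
    return upper, lower


upper, lower = split_by_case(LANGUAGES)
labels = [label(name, code) for name, code in LANGUAGES]

# A mixed result means the list is neither "all capitalized" nor "all lower case".
if upper and lower:
    print("Inconsistent capitalization:")
    print("  capitalized:", ", ".join(upper))
    print("  lower case: ", ", ".join(lower))
```

The check only reports the mix; which convention to settle on is still the open question in this issue.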

saiqulhaq commented 5 years ago

As an Indonesian, I think "Indonesia" is OK; we can easily recognize that it will switch the page to the Indonesian language.

Websites by Indonesian companies usually use 'Indonesia' and 'English' as the dropdown/button labels to change locale.

jwdonahue commented 4 years ago

Can we please close this thread? It is linked to from the above referenced PR on the semver/semver.org site (where this discussion belongs), will still be read/writable, and closing it in no way indicates whether the changes in the PR will be accepted.

ljharb commented 4 years ago

Typically, issues with associated PRs on GitHub remain open until the PR is resolved.

alexandrtovmach commented 4 years ago

I investigated a bit and didn't find any ISO spec with the native language names (endonyms) that we are talking about. Instead, I found this doc from the Unicode CLDR project: http://cldr.unicode.org/translation/translation-guide-general/capitalization

Beginning with CLDR 22, the guidance is that names of items such as languages, regions, calendar and collation types ... Regarding the capitalization of months and weekdays, please apply middle-of-sentence capitalization rules even on stand-alone items. In your language, if month and day names are generally lower case in the middle of the sentence, then please apply this same rule (lower case) to both formatting and standalone values. ... However, it is also important to ensure that there is consistent casing for all of the items in a section, so before making any changes, be sure to get agreement among all the translators for your language — otherwise the capitalization of items in a section may appear random.

alexandrtovmach commented 4 years ago

Read more about why it's not standardized here: https://en.wikipedia.org/wiki/Exonym_and_endonym

In addition, a few unofficial resources with autonyms that we can use:

rec commented 4 years ago

Reading these documents now... gee, someone did all the work already, how nice. (Also just discovered Blissymbols from the second document.)

Very instructive. Thanks!