tc39 / ecma402

Status, process, and documents for ECMA 402
https://tc39.es/ecma402/

add String.toTitleCase, String.toLocaleTitleCase #294

Open srl295 opened 5 years ago

srl295 commented 5 years ago
'ამდენი'.toUpperCase(); // ᲐᲛᲓᲔᲜᲘ
'ამდენი'.toTitleCase(); // ამდენი
'Ijussite'.toTitleCase(); // Ijussite
'ijsselmeer'.toLocaleTitleCase('nl'); // IJsselmeer

Some background on title case issues for Georgian at this document: https://gist.github.com/srl295/1d9603ecfbcae55a08b04e9cd925d349#problem

jungshik commented 5 years ago

Shouldn't this be filed against tc39/ecma262 (as well) ?

littledan commented 5 years ago

I think it's enough to track it here, even though we would make changes in the main spec too. See also https://github.com/tc39/ecma402/issues/99 .

srl295 commented 5 years ago

Also, CLDR (data) and ICU (implementation) implement casing via transforms (transliterators) because of its complexity.

srl295 commented 5 years ago

Title case is more complex than just these issues.

sffc commented 5 years ago

This is a big pain point in JavaScript. Here's a SO question with 461 upvotes where almost all of the answers are, "take the first char and make it upper case":

https://stackoverflow.com/q/196972/1407170

We should discuss whether the right answer is title case or whether we should use sentence casing, etc.
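For reference, the recipe those answers describe can be sketched roughly as below. This is only an illustration of the common workaround, not a proposed API: `naiveCapitalize` is a hypothetical helper name, and the Dutch input is the one from the opening comment.

```javascript
// Hypothetical helper: the "uppercase the first char" recipe from the linked answers.
function naiveCapitalize(text) {
  return text.charAt(0).toUpperCase() + text.slice(1);
}

console.log(naiveCapitalize('hello'));      // "Hello"
// Dutch treats "ij" as a unit, so the result expected in the OP is "IJsselmeer".
console.log(naiveCapitalize('ijsselmeer')); // "Ijsselmeer"
```

The second call shows why a locale-independent one-liner is not enough for the cases raised in this issue.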

tomayac commented 5 years ago

I was pointed to this very issue by @littledan because I just wrote an article on a potential CSS text-transform: titlecase feature. The long history of the discussion around this (linked from the article) might be interesting for dealing with the question now in JavaScript land.

hanguokai commented 5 years ago

Title case is usually used in article titles (e.g. <h1></h1>) and in the menu items of an application.

It depends on rules that differ between languages/locales. Natural languages are not like programming languages: there may be more complicated rules, uncertain variants, and exceptions. I don't know the rules of every language. I think that if a language has static, definite rules for title case that are unambiguous and not affected by contextual semantics, it could be implemented in JavaScript; otherwise it is not suitable for implementation in JavaScript.

Chinese has almost no concept of capitalization, uppercase, or lowercase, so title case does not apply to Chinese.

For English, I found these references and an implementation at https://individed.com/code/to-title-case/ by @gouch.

sffc commented 5 years ago

Another use case to consider is https://github.com/tc39/proposal-intl-displaynames/issues/13

Different types of display names have different capitalization rules based on context. For example, you might titlecase month names in some locales but not in others.
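A small sketch of how this already shows up in existing Intl output: standalone month names come back capitalized in English but lowercase in French. The helper name `monthName` is hypothetical, and the results assume the engine ships full CLDR locale data.

```javascript
// Hypothetical helper: format the month of a fixed UTC date in a given locale.
function monthName(locale) {
  const formatter = new Intl.DateTimeFormat(locale, { month: 'long', timeZone: 'UTC' });
  return formatter.format(new Date(Date.UTC(2020, 0, 15)));
}

console.log(monthName('en')); // "January"
console.log(monthName('fr')); // "janvier"
```

Whether "janvier" should be capitalized then depends on where it is displayed (for example at the start of a heading or in a standalone menu item), which is the context question raised in the linked issue.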

sffc commented 4 years ago

@markusicu What are your thoughts on putting titlecasing more front and center in JavaScript?

markusicu commented 4 years ago

The question is what people mean by "titlecasing". Unicode has a decent spec, and ICU has a solid implementation, for titlecasing at certain boundaries (with adjustment options) and leaving alone or lowercasing the rest of the string. However, different people use it for different things.

Some people want just the start of the string titlecased. Some want the start of each sentence. Some want the start of each word. ICU lets you provide different BreakIterator instances/options for these choices.
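In JavaScript, `Intl.Segmenter` can play a role loosely similar to choosing a BreakIterator. The following is a rough sketch under that assumption, not ICU's titlecasing algorithm: `capitalizeAtBoundaries` is a hypothetical helper, and it uses the uppercase mapping because JavaScript strings expose no titlecase mapping.

```javascript
// Hypothetical helper: uppercase the first code point of each segment and
// leave the rest of the text untouched.
function capitalizeAtBoundaries(text, locale, granularity) {
  const segmenter = new Intl.Segmenter(locale, { granularity });
  let result = '';
  for (const { segment, isWordLike } of segmenter.segment(text)) {
    // With word granularity, whitespace and punctuation are separate
    // segments; pass them through unchanged.
    if (granularity === 'word' && !isWordLike) {
      result += segment;
      continue;
    }
    // Destructuring iterates by code point, so surrogate pairs stay intact.
    const [first = '', ...rest] = segment;
    result += first.toLocaleUpperCase(locale) + rest.join('');
  }
  return result;
}

const sample = 'hello world! goodbye moon!';
console.log(capitalizeAtBoundaries(sample, 'en', 'word'));     // "Hello World! Goodbye Moon!"
console.log(capitalizeAtBoundaries(sample, 'en', 'sentence')); // "Hello world! Goodbye moon!"
```

Changing only the `granularity` argument switches between the "each word" and "each sentence" behaviors; the "start of the string only" behavior needs no segmenter at all.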

In the US, there is a peculiar style of "titlecasing" book titles and article headlines that titlecases some words but not others. This is language- and style-specific and not built into ICU. You would need to provide the offsets to ICU for where to titlecase and where not.

Note that like all case mapping operations, titlecasing is a lossy operation. It's also not always obvious. It is not always actually desirable to titlecase the first character of a word and lowercase the rest. Think of acronyms like NASA, names like McDonald, product names like iPhone. The best we have for that is the "don't lowercase the rest" option.

FYI For some characters, titlecasing is different from uppercasing.
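One illustration is the digraph ǆ (U+01C6), which has distinct uppercase (U+01C4) and titlecase (U+01C5) forms in Unicode. The snippet below only demonstrates that today's string methods reach the uppercase form, never the titlecase one.

```javascript
const lower = '\u01C6'; // ǆ  LATIN SMALL LETTER DZ WITH CARON
const upper = '\u01C4'; // Ǆ  LATIN CAPITAL LETTER DZ WITH CARON
const title = '\u01C5'; // ǅ  LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON

console.log(lower.toUpperCase() === upper); // true
console.log(lower.toUpperCase() === title); // false
console.log(title.toLowerCase() === lower); // true
```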

FYI Yes, CLDR/ICU have "Transliterator" rules for case mappings, but most people don't use them. For example, Greek uppercasing would be more difficult with a Transliterator rule than with the hand-coded implementation in the low-level API I think.

domenic commented 4 years ago

Just chiming in because opinions were solicited on Twitter...

> In the US, there is a peculiar style of "titlecasing" book titles and article headlines that titlecases some words but not others. This is language- and style-specific and not built into ICU. You would need to provide the offsets to ICU for where to titlecase and where not.

If there was a JS standard library function called toTitleCase(), and it was not usable for the purposes of US book titles and article headlines, this would be extremely surprising. From the rest of this thread I am gathering that what Unicode/ICU calls "titlecase" applies only to single "words", for some definition of word? In that case a method name more like wordToTitleCase() would help.

aphillips commented 4 years ago

@domenic The US isn't the world. There exist different cultural conventions regarding titlecasing, even within English. If the JS standard library function toTitleCase() only serves the idiosyncratic needs of US English book titles, it's not really as useful as one might think.

W3C-I18N recently closed an issue related to CSS (it was quite an old issue--we were housekeeping). Basically CSS decided that the text-transform: capitalize style (used to create a titlecasing effect) only affects lowercase letters. This helps avoid problems with over-case-normalizing words with internal capitals ("McGowan"). The overall thread helps illustrate how titlecase is more complex than it appears to be. (Charmod-Norm spills a small amount of ink on it too, although it barely mentions titlecasing).

I do think a locale-aware titlecasing function would be useful. As @markusicu mentions, ICU has a solid implementation that covers most users' needs for most strings. But the gaps are not isolated in obscure locales or scripts.

domenic commented 4 years ago

To be clear, I'm not suggesting adding a toTitleCase() that does US English titlecasing. I'm simply saying that if a toTitleCase() is added, and it fails to do US English titlecasing, that would be extremely surprising. As such I was suggesting that a different name be used for a function that does the style of titlecasing that this thread seems to be discussing.

aphillips commented 4 years ago

@domenic Ah, I get it. Still, most functions that claim to "titlecase" in other programming languages are algorithmic and also fail to get US English titlecasing correct. To your point, notice that CSS's transform is called "capitalize". That might be a good choice here too, since unlike wordToTitleCase, it suggests avoiding lowercasing the rest of the string.

leobalter commented 4 years ago

As the editor, I don't have a specific preference if we should add the features discussed here, but I have some observations:

IMO, wordToTitleCase is not a method name I'd love to see, but take this as a personal note; we first need to decide whether we want to add the feature at all, regardless of naming.

@sffc let's add this to the discussions for the next TG2 meeting?

sffc commented 4 years ago

@everyone: if you want to see titlecasing (regardless of the exact implementation, e.g. capitalize individual words versus string toTitleCase), please :+1: the OP. There are still only 2 votes for this issue. I can't tell whether the discussion in this thread is "if we were to theoretically do this, this is what it should look like" or "I think we should do this, and here's some discussion to get the ball rolling".

aphillips commented 4 years ago

@leobalter capitalize is not "universal" and does depend on locale in exactly the same way that upper/lower/titlecasing does and for the same reasons. The point of doing capitalize instead of titlecase is what I mentioned in my earlier comments: it's complex to get titlecase right. Users of capitalize can get the effect of (suboptimal) titlecasing by lowercasing the string first.
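A rough sketch of that difference, assuming a simplified `capitalize` that splits on whitespace (a hypothetical helper, not the CSS algorithm and not a proposed API):

```javascript
// Hypothetical helper: uppercase the first code point of each
// whitespace-separated word and leave every other character alone.
function capitalize(text, locale) {
  return text.replace(/(^|\s)(\S)/gu, (match, space, first) =>
    space + first.toLocaleUpperCase(locale));
}

const input = 'visit NASA with mr McGowan';

// Capitalize only: internal capitals survive.
console.log(capitalize(input, 'en'));
// "Visit NASA With Mr McGowan"

// Lowercase first, then capitalize: a (suboptimal) titlecase that is lossy.
console.log(capitalize(input.toLocaleLowerCase('en'), 'en'));
// "Visit Nasa With Mr Mcgowan"
```

Both steps take a locale, which reflects the point that capitalization is no more locale-independent than the other case mappings.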

leobalter commented 4 years ago

@aphillips it seems it's not that simple even for capitalization, then. TIL, thanks for the heads up.

@sffc in my position I'd be using the feature, rather than implementing it. I'm definitely down to see it being discussed.