panzerdp / voca

The ultimate JavaScript string library
MIT License
3.6k stars, 137 forks

titleCase behaviour with prime/apostrophe #27

Closed: DJTB closed this issue 6 years ago

DJTB commented 6 years ago

Expected behavior :smile_cat:

v.titleCase("professor's office")
'Professor\'s Office'

Actual behavior :crying_cat_face:

v.titleCase("professor's office")
'Professor\'S Office'

Node version: 8.1.4

Are there situations I'm missing where a letter after a prime/apostrophe should be capitalized? The same behaviour also occurs with real/smart quotes (’ as opposed to '):

> v.titleCase("professor’s office")
'Professor’S Office'

Betree commented 6 years ago

Similar problem here: in French, most words have a masculine and a feminine form. It is increasingly common to make text gender-neutral by showing both forms, separated by a · (middle dot), instead of using the masculine form by default.

Example with "a deputy":

Expected behavior :smile_cat:

v.titleCase("Un·e déput·é·e")
'Un·e Déput·é·e'

Actual behavior :crying_cat_face:

v.titleCase("Un·e déput·é·e")
'Un·e Déput·É·E'

panzerdp commented 6 years ago

@DJTB, @Betree Thanks for taking the time to describe the issue.

The function doesn't make complex distinctions: it assumes that a run of one or more letters forms a word. It's simply not possible to catch every scenario, such as 's, I'm, or ·é·e.

If you have an idea of how to make the function understand these cases, please let me know.
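To illustrate the letter-run assumption, here is a minimal sketch of a title case that treats any maximal run of Unicode letters (\p{L}) as a word. naiveTitleCase and LETTER_RUN are hypothetical stand-ins, not voca's actual implementation; the point is that ', ’ and · are not letters, so each one ends a word and the letter after it starts a new word that gets capitalized.

```javascript
// Hypothetical naive titleCase: a "word" is any maximal run of Unicode letters.
// This mirrors the assumption described above, not voca's exact regex.
const LETTER_RUN = /\p{L}+/gu;

function naiveTitleCase(str) {
  return String(str).replace(
    LETTER_RUN,
    // Upper-case the first letter of each run, lower-case the rest
    (word) => word.charAt(0).toUpperCase() + word.slice(1).toLowerCase()
  );
}

console.log(naiveTitleCase("professor's office")); // Professor'S Office
console.log(naiveTitleCase('professor’s office')); // Professor’S Office
console.log(naiveTitleCase('déput·é·e'));          // Déput·É·E
```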


Betree commented 6 years ago

I ended up writing a function that uses \S, i.e. any non-whitespace character (roughly [^\r\n\t\f\v ], plus Unicode space characters in JavaScript). I don't know whether this breaks any existing cases, but it fits my needs :)

/**
 * A very simple titleCase that, in contrast to `voca/title_case`, respects the '
 * character (speaker's assistant -> Speaker's Assistant) and the · (middle dot)
 * character, used especially in French for gender-neutral forms (un·e député·e).
 */
import capitalize from 'voca/capitalize'

// A "word" is any run of non-whitespace characters
const WORD_REGEX = /\S+/g

export default function titleCase(str) {
  // Wrap capitalize so replace()'s offset argument isn't passed as its restToLower flag
  return String(str).replace(WORD_REGEX, (word) => capitalize(word))
}

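For reference, JavaScript's \s is broader than [\r\n\t\f ]: under the ECMAScript definition it also matches the vertical tab and Unicode space separators such as the no-break space (U+00A0). A quick check, using a hypothetical samples table:

```javascript
// Characters that JavaScript's \s matches but [\r\n\t\f ] does not.
const samples = {
  'vertical tab (U+000B)': '\u000B',
  'no-break space (U+00A0)': '\u00A0',
  'narrow no-break space (U+202F)': '\u202F',
  'ideographic space (U+3000)': '\u3000',
};

for (const [name, ch] of Object.entries(samples)) {
  const matchesS = /\s/.test(ch);
  const matchesNarrow = /[\r\n\t\f ]/.test(ch);
  console.log(`${name}: \\s=${matchesS}, [\\r\\n\\t\\f ]=${matchesNarrow}`);
}

// Consequence for \S+: a no-break space still separates two words.
console.log('a\u00A0b'.match(/\S+/g)); // [ 'a', 'b' ]
```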

My version is not perfect, though. I guess these rules depend a lot on the locale, and I doubt there's a perfect way to implement them.

My final regex will probably look more like this: /[^\s\-]+/g
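A small comparison of the two regexes, assuming a simplified capitalizeFirst (a hypothetical stand-in for voca's capitalize that only upper-cases the first character). The refined regex also treats hyphens as word boundaries, so compound names get both halves capitalized:

```javascript
// Hypothetical stand-in for voca's capitalize: upper-case the first character only.
function capitalizeFirst(word) {
  return word.charAt(0).toUpperCase() + word.slice(1);
}

// First version: any run of non-whitespace is one word.
const NON_SPACE = /\S+/g;
// Refined version: whitespace and hyphens both end a word.
const NON_SPACE_OR_HYPHEN = /[^\s\-]+/g;

function titleCaseWith(regex, str) {
  return String(str).replace(regex, (word) => capitalizeFirst(word));
}

const input = "jean-luc, the speaker's assistant";
console.log(titleCaseWith(NON_SPACE, input));           // Jean-luc, The Speaker's Assistant
console.log(titleCaseWith(NON_SPACE_OR_HYPHEN, input)); // Jean-Luc, The Speaker's Assistant
```

One trade-off of both regexes: punctuation stays attached to the word, so a word that starts with a quotation mark (e.g. "hello") is left unchanged, because its first character is the quote.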

Maybe voca's function could accept a list of additional characters on which it should split?

export function titleCase(str, splitChars = '-.') { /* ... */ }
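A sketch of how such a parameter might behave, assuming splitChars is a string of extra separator characters added on top of whitespace. titleCase and escapeForClass here are hypothetical standalone functions, not voca's API:

```javascript
// Escape characters that are special inside a regex character class.
function escapeForClass(chars) {
  return chars.replace(/[\\\]^-]/g, '\\$&');
}

// Hypothetical: whitespace always separates words; splitChars adds more separators.
function titleCase(str, splitChars = '-.') {
  const wordRegex = new RegExp(`[^\\s${escapeForClass(splitChars)}]+`, 'g');
  return String(str).replace(
    wordRegex,
    (word) => word.charAt(0).toUpperCase() + word.slice(1)
  );
}

console.log(titleCase("jean-luc's st.louis office")); // Jean-Luc's St.Louis Office
console.log(titleCase('jean-luc', ''));               // Jean-luc
console.log(titleCase('Un·e déput·é·e'));             // Un·e Déput·é·e
```

With this design the caller lists the characters that do separate words, so apostrophes and middle dots stay inside words unless explicitly added.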

panzerdp commented 6 years ago


It's a pity that you have to use this workaround.

I like the idea of customizing the title case function to cover special cases. What about a parameter doNotSplitOn = ["'", "·"]? The function would not split words at the characters in that list.

For example:

v.titleCase('jean-luc', ['-']);
// => "Jean-luc"
v.titleCase('Un·e déput·é·e', ['·']);
// => "Un·e Déput·é·e"
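A minimal sketch of the proposed doNotSplitOn semantics, assuming words are runs of Unicode letters and that a run immediately preceded by one of the listed characters continues the previous word, so it is not capitalized. titleCaseNoSplit is a hypothetical name; this is not voca's implementation:

```javascript
const LETTER_RUN = /\p{L}+/gu;

// Hypothetical: letter runs are words, except that a run directly after a
// doNotSplitOn character is treated as a continuation of the previous word.
function titleCaseNoSplit(str, doNotSplitOn = []) {
  const subject = String(str);
  return subject.replace(LETTER_RUN, (word, offset) => {
    const prev = offset > 0 ? subject[offset - 1] : '';
    if (doNotSplitOn.includes(prev)) {
      return word.toLowerCase(); // continuation: no capital letter
    }
    return word.charAt(0).toUpperCase() + word.slice(1).toLowerCase();
  });
}

console.log(titleCaseNoSplit('jean-luc', ['-']));           // Jean-luc
console.log(titleCaseNoSplit('Un·e déput·é·e', ['·']));     // Un·e Déput·é·e
console.log(titleCaseNoSplit("professor's office", ["'"])); // Professor's Office
console.log(titleCaseNoSplit('jean-luc'));                  // Jean-Luc
```

This sketch lowercases the rest of every word, so a name like McDonald becomes Mcdonald; preserving the original inner casing would be a separate design choice.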


DJTB commented 6 years ago

That sounds like a simple and viable solution!

Betree commented 6 years ago

That indeed sounds great 👍

jafethtk commented 6 years ago

This function works pretty well with the examples mentioned (jean-luc, Un·e déput·é·e, and professor’s office). It might be useful to someone. Hopefully voca will add the corresponding enhancement soon :slightly_smiling_face:

panzerdp commented 6 years ago

Implemented in version 1.4.0 via the noSplit parameter.