panzerdp / voca

The ultimate JavaScript string library
https://vocajs.pages.dev
MIT License

titleCase behaviour with prime/apostrophe #27

Closed DJTB closed 6 years ago

DJTB commented 6 years ago

Expected behavior :smile_cat:

v.titleCase("professor's office")
'Professor\'s Office'

Actual behavior :crying_cat_face:

v.titleCase("professor's office")
'Professor\'S Office'

Node version: 8.1.4

Am I missing situations where a letter after a prime/apostrophe should be capitalized? The same behaviour occurs with typographic (smart) quotes (’ as opposed to ') as well:

> v.titleCase("professor’s office")
'Professor’S Office'

Betree commented 6 years ago

Similar problem here: in French, most nouns have a masculine and a feminine form. It is increasingly common to make text gender-neutral by writing both forms, separated by a · (middle dot), instead of defaulting to the masculine form.

Example with "A deputy":

Expected behavior :smile_cat:

v.titleCase("Un·e déput·é·e")
'Un·e Déput·é·e'

Actual behavior :crying_cat_face:

v.titleCase("Un·e déput·é·e")
'Un·e Déput·É·E'

panzerdp commented 6 years ago

@DJTB, @Betree Thanks for your time describing the issue.

The function doesn't make complex distinctions: it assumes that one or more letters create a word. It's simply not possible to catch all the scenarios like 's, I'm, ·é·e.
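
To illustrate the "one or more letters create a word" assumption, here is a minimal sketch. It is not voca's actual source; `naiveTitleCase` and `LETTER_RUN` are hypothetical names used only for this example. Every run of letters is capitalized on its own, so anything that is not a letter (an apostrophe, a middle dot) starts a new word.

```javascript
// Simplified illustration only; this is NOT voca's implementation.
// Every run of Unicode letters is treated as a separate word.
const LETTER_RUN = /\p{L}+/gu;

function naiveTitleCase(str) {
  return String(str).replace(
    LETTER_RUN,
    (word) => word.charAt(0).toUpperCase() + word.slice(1).toLowerCase()
  );
}

console.log(naiveTitleCase("professor's office")); // Professor'S Office
console.log(naiveTitleCase("I'm here"));           // I'M Here
console.log(naiveTitleCase('déput·é·e'));          // Déput·É·E
```

Under this rule the "s" after the apostrophe is a word in its own right, which is why it ends up capitalized.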

If you have an idea how to make the function understand these cases, please let me know.

Thanks.

Betree commented 6 years ago

I ended up writing a function that matches \S+, i.e. runs of non-whitespace characters (\S is roughly equal to [^\r\n\t\f ]). I don't know if this breaks any existing case, but it fits my needs :)

/**
 * A very simple titleCase that in contrast to `voca/title_case` respects the '
 * character (speaker's assistant -> Speaker's Assistant) and the · (middle point)
 * character, especially used in French for gender-neutral form (un·e député·e)
*/

import capitalize from 'voca/capitalize'

const WORD_REGEX = /\S+/g

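export default function titleCase(str) {
  // Wrap capitalize: replace() also passes the match offset, which voca's
  // capitalize would otherwise read as its restToLower flag
  return String(str).replace(WORD_REGEX, (word) => capitalize(word))
}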

Edit

So my version is not perfect. I guess these rules depend a lot on the locale and I doubt there's a perfect way to implement it.

My final regex will probably look more like this: /[^\s\-]+/g

Maybe voca's function could accept a list of additional characters on which it should split?

// proposed signature only
export function titleCase(str, splitChars = "-.") { /* ... */ }
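
A rough sketch of how such a `splitChars` parameter might work, assuming a word is any run of characters that is neither whitespace nor one of the split characters. The names `titleCaseSplitOn` and `escapeForCharClass` are made up for this example and are not part of voca.

```javascript
// Hypothetical sketch of the `splitChars` idea; not part of voca.
function escapeForCharClass(chars) {
  // Escape the characters that are special inside a RegExp character class.
  return chars.replace(/[\\\]\[^-]/g, '\\$&');
}

function titleCaseSplitOn(str, splitChars = '-.') {
  // A word is a run of characters that are neither whitespace nor split characters.
  const wordRegex = new RegExp(`[^\\s${escapeForCharClass(splitChars)}]+`, 'g');
  return String(str).replace(
    wordRegex,
    (word) => word.charAt(0).toUpperCase() + word.slice(1)
  );
}

console.log(titleCaseSplitOn("professor's office"));  // Professor's Office
console.log(titleCaseSplitOn('jean-luc picard'));     // Jean-Luc Picard
console.log(titleCaseSplitOn('jean-luc picard', '')); // Jean-luc Picard
```

With this design the caller lists the characters that do break words; everything else, including ' and ·, stays inside the word.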

panzerdp commented 6 years ago

@Betree,

It's a pity that you have to use this workaround.

I like the idea of customizing the title case function to cover the special cases. What about a parameter doNotSplitOn = ["'", "·"]? The function would not split into words at the characters specified in the list.

For example:

v.titleCase('jean-luc', ['-']);
// => "Jean-luc"
v.titleCase('Un·e déput·é·e', ['·']);
// => "Un·e Déput·é·e"
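
One possible way to get this behaviour, sketched under the assumption that words are still letter runs and that a run directly preceded by a listed character is treated as the continuation of the previous word. `titleCaseNoSplit` is a hypothetical name for this example; this is not voca's source.

```javascript
// Hypothetical sketch of the `doNotSplitOn` idea; not voca's actual source.
function titleCaseNoSplit(str, doNotSplitOn = []) {
  const subject = String(str);
  return subject.replace(/\p{L}+/gu, (word, index) => {
    // A letter run right after a listed character continues the previous word.
    const continuesWord = index > 0 && doNotSplitOn.includes(subject[index - 1]);
    return continuesWord
      ? word.toLowerCase()
      : word.charAt(0).toUpperCase() + word.slice(1).toLowerCase();
  });
}

console.log(titleCaseNoSplit('jean-luc', ['-']));           // Jean-luc
console.log(titleCaseNoSplit("professor's office", ["'"])); // Professor's Office
console.log(titleCaseNoSplit('Un·e déput·é·e', ['·']));     // Un·e Déput·é·e
console.log(titleCaseNoSplit('jean-luc'));                  // Jean-Luc
```

Compared with listing the characters to split on, this keeps the default behaviour unchanged and lets callers opt out per character.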

Thanks.

DJTB commented 6 years ago

That sounds like a simple and viable solution!

Betree commented 6 years ago

That indeed sounds great 👍

jafethtk commented 6 years ago

This function works pretty well with the examples mentioned (jean-luc, Un·e déput·é·e and professor’s office): https://github.com/gouch/to-title-case/blob/master/to-title-case.js. It might be useful to someone. Hopefully voca will add the corresponding enhancement soon :slightly_smiling_face:

panzerdp commented 6 years ago

Implemented in version 1.4.0 using the noSplit parameter.