paulroth3d / jupyter-ijavascript-utils

Utility library for working with iJavaScript - a Jupyter Kernel

Capitalize All does not recognize multi-character unicode #9

Open paulroth3d opened 2 years ago

paulroth3d commented 2 years ago

When running utils.format.capitalizeAll('alpha 𐐶𐐲𐑌𐐼𐐲𐑉') I expect: Alpha 𐐎𐐲𐑌𐐼𐐲𐑉

but instead I get: Alpha 𐐶𐐲𐑌𐐼𐐲𐑉

It seems the bug is in the regex: \b does not find word boundaries next to multi-character Unicode characters.
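Here is a small Node sketch of what the split actually sees. The values in the comments follow from how ECMAScript defines `\w`, `\b` and string iteration; they are not output copied from this issue. Each Deseret letter sits outside the Basic Multilingual Plane, so JavaScript stores it as a surrogate pair (two UTF-16 code units). Separately, `\w`, and therefore `\b`, only counts ASCII letters, digits and `_` as word characters. The only boundary `\b` finds in the input is after `alpha`, so the second piece handed to capitalize starts with a space. Any non-ASCII letter, such as `é`, is affected the same way.

```javascript
// 𐐶𐐲𐑌𐐼𐐲𐑉, written with code point escapes so the example survives copy/paste.
const deseretWord = '\u{10436}\u{10432}\u{1044C}\u{1043C}\u{10432}\u{10449}';

// One code point per letter, but each is a surrogate pair of two UTF-16 code units.
console.log(deseretWord.length);      // 12
console.log([...deseretWord].length); // 6

// \w (and so \b) only treats ASCII letters, digits and '_' as word characters.
console.log(/\w/.test('\u{10436}'));  // false
console.log(/\w/u.test('\u{10436}')); // false
console.log(/\w/.test('\u00e9'));     // false ('é' is not a word character either)

// The only boundary is after 'alpha', so the second piece starts with a space.
const pieces = `alpha ${deseretWord}`.split(/\b/);
console.log(pieces[0]);                  // alpha
console.log(pieces[1].startsWith(' '));  // true

// Unicode property escapes (u flag) do see a lowercase letter, and
// toUpperCase maps it to its capital form.
console.log(/\p{Ll}/u.test('\u{10436}'));               // true
console.log('\u{10436}'.toUpperCase() === '\u{1040E}'); // true
```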

paulroth3d commented 2 years ago

Perhaps some guidance here: https://stackoverflow.com/questions/10590098/javascript-regexp-word-boundaries-unicode-characters

So far, none of the approaches I tried from there passes all of the unit tests, though I wasn't very thorough, so I may have missed one.
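One direction along the lines of that thread, sketched under stated assumptions, is to emulate `\b` with lookarounds over a Unicode-aware word class. That way the existing split/map/join shape of capitalizeAll could stay as it is. This assumes a runtime with ES2018 regex features (the `u` flag, `\p{L}`-style property escapes, and lookbehind). The name `UNICODE_WORD_BOUNDARY` and the choice of letters, marks, numbers and `_` as word characters are illustrative, not part of the library. The sketch has not been run against the library's unit tests.

```javascript
// Word characters: any letter, combining mark, or number, plus '_' (a Unicode
// analogue of \w). A boundary is a word char followed by a non-word char (or the
// end), or a non-word char (or the start) followed by a word char.
const UNICODE_WORD_BOUNDARY =
  /(?<=[\p{L}\p{M}\p{N}_])(?![\p{L}\p{M}\p{N}_])|(?<![\p{L}\p{M}\p{N}_])(?=[\p{L}\p{M}\p{N}_])/u;

// 𐐶𐐲𐑌𐐼𐐲𐑉, written with code point escapes.
const sample = '\u{10436}\u{10432}\u{1044C}\u{1043C}\u{10432}\u{10449}';

// Unlike /\b/, this splits the space off, so the Deseret word gets its own piece.
console.log(`alpha ${sample}`.split(UNICODE_WORD_BOUNDARY).length); // 3
console.log('hello-world'.split(UNICODE_WORD_BOUNDARY)); // [ 'hello', '-', 'world' ]
```

For ASCII-only input the pieces match what `/\b/` produces, so the existing `.map(FormatUtils.capitalize).join('')` chain could, in principle, run unchanged on them.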

paulroth3d commented 2 years ago

Link to the function, for future reference: https://github.com/paulroth3d/jupyter-ijavascript-utils/blob/main/src/format.js#L493

All that is needed is for each word, starting from its first character, to be passed to utils.format.capitalize, which DOES properly capitalize 𐐶𐐲𐑌𐐼𐐲𐑉 to 𐐎𐐲𐑌𐐼𐐲𐑉.

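```javascript
module.exports.capitalizeAll = function capitalizeAll(str) {
  //-- see if there is anything better to split at the start of word boundaries.
  //-- NOTE: \b only treats ASCII [A-Za-z0-9_] as word characters, so
  //-- 'alpha 𐐶𐐲𐑌𐐼𐐲𐑉' splits into ['alpha', ' 𐐶𐐲𐑌𐐼𐐲𐑉'] and the second piece
  //-- starts with a space, leaving nothing for capitalize to uppercase.
  return (str || '').split(/\b/)
    .map(FormatUtils.capitalize)
    .join('');
};
```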
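Here is a sketch of that idea, assuming ES2018 regex support (the `u` flag and `\p{L}`-style property escapes). It finds each run of word characters and hands the whole word to a capitalize helper, leaving the text between words untouched. The `capitalize` function below is a hypothetical stand-in that upper-cases the first code point rather than the first UTF-16 code unit. It is not the library's utils.format.capitalize. The word class is an assumption chosen to mirror `\w`, and the sketch has not been checked against the library's unit tests.

```javascript
// Hypothetical stand-in for utils.format.capitalize: destructuring a string
// iterates by code point, so a surrogate pair stays together in `first`.
function capitalize(str) {
  const [first = '', ...rest] = str;
  return first.toUpperCase() + rest.join('');
}

// A "word" is a run of letters, combining marks, numbers, or '_'
// (a Unicode analogue of \w).
const WORD = /[\p{L}\p{M}\p{N}_]+/gu;

// Sketch of a Unicode-aware capitalizeAll: capitalize every word in place and
// leave everything between words (spaces, punctuation) as it is.
function capitalizeAll(str) {
  return (str || '').replace(WORD, (word) => capitalize(word));
}

console.log(capitalizeAll('alpha \u{10436}\u{10432}\u{1044C}\u{1043C}\u{10432}\u{10449}'));
// Alpha 𐐎𐐲𐑌𐐼𐐲𐑉
console.log(capitalizeAll('hello-world')); // Hello-World
```

Matching words with a global `replace` instead of splitting avoids having to reproduce `\b` exactly. Every match is a whole word, so capitalize never receives a piece that starts with a space.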