Capitalize All does not recognize multi-character unicode

paulroth3d commented 2 years ago

When running utils.format.capitalizeAll('alpha 𐐶𐐲𐑌𐐼𐐲𐑉') I expect: Alpha 𐐎𐐲𐑌𐐼𐐲𐑉

but instead I get: Alpha 𐐶𐐲𐑌𐐼𐐲𐑉

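It seems the regex \b assertion does not find word boundaries next to multi-code-unit Unicode characters. In JavaScript, \b only treats the ASCII characters [A-Za-z0-9_] as word characters, so the space before 𐐶 is not followed by a boundary and the second word is never split off.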
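
A small check of the splitting step, as a sketch only (the variable name `parts` is just for illustration). It shows that the Deseret word stays glued to the preceding space, so the first character of that chunk is the space rather than the letter:

```javascript
// \w (and therefore \b) is ASCII-only in JavaScript, even with the u flag.
console.log(/\w/u.test('𐐶')); // false

// Only the boundaries around 'alpha' are found, so the split yields two chunks.
const parts = 'alpha 𐐶𐐲𐑌𐐼𐐲𐑉'.split(/\b/);
console.log(parts.length); // 2
console.log(JSON.stringify(parts[0])); // "alpha"
console.log(parts[1].startsWith(' ')); // true
```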

paulroth3d commented 2 years ago

Perhaps some guidance here: https://stackoverflow.com/questions/10590098/javascript-regexp-word-boundaries-unicode-characters

So far in testing, I could not find a suggestion there that passes all of the unit tests. (But I wasn't very thorough, so I may have missed one.)

paulroth3d commented 2 years ago

Link to function for when I remember: https://github.com/paulroth3d/jupyter-ijavascript-utils/blob/main/src/format.js#L493

All that is needed is for the first character of each word in the string to be passed to utils.format.capitalize (which DOES properly capitalize 𐐶𐐲𐑌𐐼𐐲𐑉 to 𐐎𐐲𐑌𐐼𐐲𐑉).

module.exports.capitalizeAll = function capitalizeAll(str) {
  //-- see if there is anything better to split at the start of word boundaries.
  return (str || '').split(/\b/)
    .map(FormatUtils.capitalize)
    .join('');
};
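
One possible direction, as a minimal sketch and not the library's actual fix: match whole words with Unicode property escapes instead of splitting on \b, and pass each word to a capitalize function. The `capitalize` below is a hypothetical stand-in for utils.format.capitalize, and treating a "word" as a run of letters, numbers, or underscores is an assumption that may not satisfy every existing unit test.

```javascript
// Hypothetical stand-in for utils.format.capitalize: upper-cases the first
// code point (not the first UTF-16 code unit) of the string.
function capitalize(str) {
  // String iteration yields whole code points, so surrogate pairs stay intact.
  const [first = '', ...rest] = str || '';
  return first.toUpperCase() + rest.join('');
}

// Assumption: a "word" is a run of Unicode letters, numbers, or underscores.
// The u flag is required for \p{...} and makes matching code-point aware.
const UNICODE_WORD = /[\p{L}\p{N}_]+/gu;

function capitalizeAll(str) {
  // Everything between words (spaces, punctuation) is left untouched.
  return (str || '').replace(UNICODE_WORD, (word) => capitalize(word));
}

console.log(capitalizeAll('alpha 𐐶𐐲𐑌𐐼𐐲𐑉')); // Alpha 𐐎𐐲𐑌𐐼𐐲𐑉
```

Property escapes need a runtime that supports them (Node 10 or later), which is worth checking against the versions the package targets.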

paulroth3d / jupyter-ijavascript-utils

Capitalize All does not recognize multi-character unicode #9