ppannuto / python-titlecase

Python library to capitalize strings as specified by the New York Times Manual of Style
MIT License

Should non-breaking spaces engage in titlecasing? #95

Closed: robinwhittleton closed this issue 3 months ago

robinwhittleton commented 6 months ago

At the moment we split words only on tabs or a normal space character. This means that a word following a non-breaking space doesn’t get titlecased. Example:

>>> from titlecase import titlecase as pip_titlecase
>>> pip_titlecase('mrs. test')
'Mrs. Test'
>>> pip_titlecase('mrs.\xa0test')
'Mrs.\xa0test'

Presumably the fix is as simple as adding a non-breaking space character to https://github.com/ppannuto/python-titlecase/blob/418c57ca6c7f324ddc2813b3fc88d52e84db63bd/titlecase/__init__.py#L103 (although it’s fair to say that I haven’t tested). Is this wanted? If so I’ll put a PR together.
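To illustrate the proposed fix, here is a minimal sketch that compares splitting on tabs and normal spaces with splitting on a character class that also includes the non-breaking space (U+00A0). The pattern names and the split_words helper are hypothetical and are not copied from the library source; the sketch uses only the standard re module.

```python
import re

# Hypothetical separator patterns for illustration only;
# they are not taken from the titlecase source.
ASCII_SEPARATORS = re.compile(r"[\t ]")      # tab or normal space
WITH_NBSP = re.compile(r"[\t \xa0]")         # also splits on U+00A0


def split_words(line, pattern):
    """Split a line into words using the given separator pattern."""
    return pattern.split(line)


text = "mrs.\xa0test"

# Without U+00A0 in the class, the whole string is treated as one word.
print(split_words(text, ASCII_SEPARATORS))   # ['mrs.\xa0test']

# With U+00A0 in the class, 'test' becomes its own word.
print(split_words(text, WITH_NBSP))          # ['mrs.', 'test']
```

Once 'test' is a separate word, whatever per-word capitalization runs afterwards can see it, which is the behaviour the example above is missing.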

ppannuto commented 4 months ago

That makes sense to me; happy to take a PR.

robinwhittleton commented 3 months ago

OK, first question: we presumably want to preserve the type of space used, but historically the code throws away whether the separator was a tab or a space and just joins the words with a space: result = " ".join(tc_line). Given that I’d be changing existing behaviour if I update the code to preserve and rejoin with the original characters, would you want me to add that as a preserve_space_characters option?
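One way to keep the original separators is to split with a capturing group, so that re.split returns the separators alongside the words, and then rejoin them in order. The sketch below is an assumption about how such an option could work, not the code from the PR: the function name, the transform callback, and the preserve flag are all hypothetical, and str.capitalize stands in for the real per-word titlecasing.

```python
import re

# The capturing group makes re.split keep each separator in the result.
SEPARATOR = re.compile(r"([\t \xa0])")


def transform_preserving_separators(line, transform, preserve=True):
    """Apply `transform` to each word, optionally keeping the separators.

    Hypothetical helper: `preserve` plays the role of the
    preserve_space_characters option discussed above.
    """
    parts = SEPARATOR.split(line)
    words = parts[0::2]        # even indices are words
    separators = parts[1::2]   # odd indices are the matched separators
    new_words = [transform(word) for word in words]

    if not preserve:
        # Historical behaviour: every separator collapses to a normal space.
        return " ".join(new_words)

    # Interleave the transformed words with their original separators.
    out = [new_words[0]]
    for sep, word in zip(separators, new_words[1:]):
        out.append(sep)
        out.append(word)
    return "".join(out)


line = "mrs.\xa0test\tcase"
print(repr(transform_preserving_separators(line, str.capitalize)))
print(repr(transform_preserving_separators(line, str.capitalize, preserve=False)))
```

Keeping the flag off by default would leave existing output unchanged, which is the backwards-compatibility concern raised in the comment above.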

robinwhittleton commented 3 months ago

OK, it wasn’t much work so I ended up doing this in a PR anyway: https://github.com/ppannuto/python-titlecase/pull/97. If you’d rather that this is the default behaviour and doesn’t need a switch then it’s easy enough to remove and rebase.

ppannuto commented 3 months ago

Closed by #97.