ppannuto / python-titlecase

Python library to capitalize strings as specified by the New York Times Manual of Style
MIT License

Random short words seem to be all-capitalized, why? #68

Closed ses4j closed 3 years ago

ses4j commented 4 years ago

Why are PCL, BCL, and BCT converted to all caps when I run this?

> titlecase('pca pcl bcl acl bct')
'Pca PCL BCL Acl BCT'

I have no external wordlist file that I'm aware of.

ppannuto commented 4 years ago

I believe that's tripping the heuristic that a string of all consonants is most likely an acronym, and therefore should be capitalized.
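For illustration, here is a minimal sketch of that kind of heuristic. It is not the library's actual code: the function name `looks_like_acronym`, the choice to treat `y` as a vowel, and the minimum length of three letters are all assumptions made for this example.

```python
import re

# Assumption: "consonant" means any ASCII letter other than a, e, i, o, u, y.
ALL_CONSONANTS = re.compile(r"[b-df-hj-np-tv-xz]+", flags=re.IGNORECASE)


def looks_like_acronym(word: str) -> bool:
    """Guess that a word made up only of consonants is an acronym.

    Assumption: very short words (fewer than three letters) are skipped so
    that abbreviations such as "St" are not upper-cased.
    """
    return len(word) > 2 and ALL_CONSONANTS.fullmatch(word) is not None


# Apply the guess to the words from the report above.
for word in "pca pcl bcl acl bct".split():
    print(word, looks_like_acronym(word))
```

Under these assumptions, `pcl`, `bcl`, and `bct` are flagged because they contain no vowels, while `pca` and `acl` are not, which matches the pattern in the reported output.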

ppannuto commented 3 years ago

Closing, as I don't think there's an actual issue here. Feel free to re-open if appropriate.

ses4j commented 3 years ago

I see. That makes sense, but it also seems like a heuristic that will cause more harm than good. Can it be controlled or disabled, or at least documented? The README says "The filter employs some heuristics to guess abbreviations that don't need conversion." but this is guessing at acronyms that do need conversion, which is different and causes us quite a bit of trouble.

ppannuto commented 3 years ago

At the end of the day, a regex+heuristics-based approach is always going to be imperfect. The wordlist feature should hopefully provide a reasonable escape hatch for domain-specific acronyms.
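As a rough sketch of such an escape hatch, a callback in the style the README shows for custom abbreviations could pin the casing of specific words. The set `NOT_ACRONYMS` and the function name `keep_as_words` are hypothetical, and this assumes `titlecase()` accepts a `callback` argument that is consulted before the built-in heuristics.

```python
# Hypothetical list of domain words that should NOT be treated as acronyms.
NOT_ACRONYMS = {"pcl", "bcl", "bct"}


def keep_as_words(word, **kwargs):
    """Return a fixed casing for listed words, or None to defer.

    Returning None is assumed to tell titlecase to fall back to its own
    rules for this word.
    """
    if word.lower() in NOT_ACRONYMS:
        return word.capitalize()
    return None


# Usage, assuming the titlecase package is installed:
#   from titlecase import titlecase
#   titlecase('pca pcl bcl acl bct', callback=keep_as_words)
```

The same idea should carry over to a wordlist file, where each entry gives the exact casing wanted for a word.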