Closed by roozbehp 5 years ago
Note that the New Oxford Spelling Dictionary recommends
demo-crat
But it also recommends
dem|oc¦ra|tise
where | indicates a primary hyphenation point and ¦ a secondary hyphenation point. But I guess this makes sense, since it is consistent with the way the words are pronounced.
Thanks for the report; I doubt we’ll be updating the patterns in TeX distributions, but it’s always good to know.
Shouldn't we report this to the author and Barbara Beeton?
Sure, we can, but Barbara is already aware, see http://tug.org/pipermail/tex-hyphen/2017-June/001613.html (although she doesn’t seem to have published the new list of hyphenation exceptions); and you know as well as I do that even if we can improve the current en-US patterns, there will be immense resistance to changing them in distributions due to stability concerns. But I’ll contact Gerard Kuiken if I can find the time.
Closing; we unfortunately feel we may not simply update the patterns because of the TeX community’s vaguely formulated compatibility policies, but I have noted this as a “known bug” (only two at the moment, this one and another one which is rather funny).
We would be happy to hear about other reported errors in the patterns, and if there is a tracker that we can follow, please let us know!
de-moc-rats (etc.) is a bug in Kuiken's usenglishmax. Knuth's patterns find no hyphenation points, a different/expected/unimportant bug.
Can't usenglishmax have a different list of exceptions than regular English? I see no prospect of updating Kuiken's patterns, unless you want to try contacting him. I haven't communicated with him in more than a decade.
Debugging an Android user report, I found that Android was hyphenating the words "democrat" and "democrats" incorrectly, as:
de-mo-c-rat de-moc-rats
While Merriam Webster was recommending:
dem-o-crat
And Plain TeX was hyphenating as:
demo-crat democrats
Digging deeper, the source of the problem seems to be the following pattern in hyph-en-us.pat.txt:
5moc1ra1t
That pattern seems to not exist in Plain TeX's pattern file for US English. The other patterns applying to those words, all existing in Plain TeX, are:
1mo 4mocr 5crat.
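To see how these patterns interact, here is a minimal Python sketch of Liang-style pattern matching restricted to the four patterns quoted above. It is not the code Android or TeX actually runs; the function names are made up for illustration, and the minimum fragment lengths (2 letters before a break, 3 after) are an assumption taken from TeX's usual English defaults. Each digit in a pattern is a weight for the gap at that position, the highest weight in each gap wins, and a break is allowed only where the winning weight is odd.

```python
import re


def parse_pattern(pattern):
    """Split a pattern such as '5moc1ra1t' into its letters and the
    weight for each gap around them (0 where no digit is written)."""
    letters = re.sub(r"\d", "", pattern)
    weights = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            weights[pos] = int(ch)
        else:
            pos += 1
    return letters, weights


def hyphenate(word, patterns, left_min=2, right_min=3):
    """Hypothetical helper: apply the given patterns to one word.
    left_min/right_min are assumed values, not read from any file."""
    dotted = "." + word.lower() + "."   # '.' marks the word boundaries
    points = [0] * (len(dotted) + 1)    # points[k] is the gap before dotted[k]
    for pattern in patterns:
        letters, weights = parse_pattern(pattern)
        start = dotted.find(letters)
        while start != -1:
            for offset, weight in enumerate(weights):
                points[start + offset] = max(points[start + offset], weight)
            start = dotted.find(letters, start + 1)
    pieces, last = [], 0
    for i in range(left_min, len(word) - right_min + 1):
        # word[i] is dotted[i + 1]; an odd weight permits a break before it
        if points[i + 1] % 2 == 1:
            pieces.append(word[last:i])
            last = i
    pieces.append(word[last:])
    return "-".join(pieces)


PLAIN = ["1mo", "4mocr", "5crat."]      # patterns also present in Plain TeX
EXTENDED = PLAIN + ["5moc1ra1t"]        # plus the extra pattern in question

for word in ("democrat", "democrats"):
    print(word, hyphenate(word, PLAIN), hyphenate(word, EXTENDED))
# democrat demo-crat de-mo-c-rat
# democrats democrats de-moc-rats
```

With only the three shared patterns, `4mocr` suppresses the break that `1mo` proposes before "mo", and `5crat.` matches only at the end of "democrat". Adding `5moc1ra1t` outweighs the 4 with a 5 and introduces the break before "rat", which reproduces both results reported here.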
I think the source of the problem is that the authors of the extended pattern file derived the modified patterns from TUGboat's exception list: they created the "5moc1ra1t" pattern based on the word "de-moc-ra-tism" and didn't notice that adding it would cause "democrat" and "democrats" to be hyphenated incorrectly.
I guess these two words are not the only ones affected, and there are probably dozens of other words that suffer from the same problem of over-weighting the exception list.