koreader / crengine

This is the KOReader CREngine fork. It cross-pollinates with the official CoolReader repository at https://github.com/buggins/coolreader, in case you were looking for that one.

Remove too generic hyphenation rule for Russian "э" vowel #564

Closed dmalinovsky closed 4 months ago

dmalinovsky commented 4 months ago

With this rule the engine makes hyphenations in unexpected places:

(Screenshots: with the rule | without it)

It also hyphenates words like these: "фл-эшка" ("fl-ash drive"), "Бл-эк" ("Bl-ack"), "Гр-эй" ("Gr-ay").

@hius07, can you please take a look or recommend another reviewer?



hius07 commented 4 months ago

Are there examples of common words affected by the pattern?

dmalinovsky commented 4 months ago

I didn’t find a way to run the hyphenation engine manually, but I imagine it may affect words like “поэзия” or “маэстро”: https://gramota.ru/biblioteka/spravochniki/pravila-russkoj-orfografii-i-punktuacii/bukva-e

poire-z commented 4 months ago

It feels odd that you need to update such a generic pattern, and that Russian readers/writers (using KOReader, LibreOffice or other software) didn't feel the need to change it for 20 years :) It would be good to check how this is handled in the current free hyphenation dictionaries used by such projects. Some links are in #373. It would also help if other Russian readers used these modified .pattern files on their current reading for some time.

(Usually, at least for French, we have just added "longer" patterns for "strange" words over the last few years.) Also, for quickly checking individual words: https://www.ushuaia.pl/hyphen/
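
To illustrate the "longer patterns" idea, here is a minimal sketch of TeX-style (Liang) pattern matching in Python. It is not crengine's code, and it assumes the .pattern files follow the usual Liang convention: digits mark the gaps between letters, the highest digit at a gap wins, odd means a break is allowed and even means it is not. The thread does not quote the actual rule, so `1э` and the three longer patterns below are hypothetical, as are the names `parse_pattern`, `hyphenate`, `GENERIC` and `LONGER` and the minimums of two letters on each side.

```python
import re


def parse_pattern(pat):
    """Split a Liang pattern such as 'фл2эш' into ('флэш', [0, 0, 2, 0, 0])."""
    letters = re.sub(r"\d", "", pat)
    levels = [0] * (len(letters) + 1)
    gap = 0
    for ch in pat:
        if ch.isdigit():
            levels[gap] = int(ch)  # digit applies to the gap before the next letter
        else:
            gap += 1
    return letters, levels


def hyphenate(word, patterns, left_min=2, right_min=2):
    """Insert '-' at every gap whose winning level is odd."""
    table = dict(parse_pattern(p) for p in patterns)
    padded = "." + word.lower() + "."  # dots mark the word boundaries
    scores = [0] * (len(padded) + 1)   # scores[i] is the gap before padded[i]
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            levels = table.get(padded[start:end])
            if levels:
                for k, level in enumerate(levels):
                    scores[start + k] = max(scores[start + k], level)
    out = []
    for j, ch in enumerate(word):
        # The gap before word[j] is the gap before padded[j + 1].
        if left_min <= j <= len(word) - right_min and scores[j + 1] % 2 == 1:
            out.append("-")
        out.append(ch)
    return "".join(out)


# Hypothetical generic rule: allow a break before every "э".
GENERIC = ["1э"]
# Hypothetical longer patterns: the even level 2 outranks the 1 from "1э".
LONGER = GENERIC + ["фл2эш", "бл2эк", "гр2эй"]

for w in ["флэшка", "Блэк", "Грэй", "поэзия", "маэстро"]:
    print(w, hyphenate(w, GENERIC), hyphenate(w, LONGER))
```

With only the generic pattern, the sketch reproduces the breaks reported above ("фл-эшка", "Бл-эк", "Гр-эй"). With the longer patterns added, those three words stay unbroken, while "поэзия" and "маэстро" still break before "э".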

dmalinovsky commented 4 months ago

This pattern negatively affects mostly foreign names, which are not in the dictionaries.

Thanks for the suggestion, @poire-z, I’ll work on adding longer patterns instead of this PR.