user202729 / plover-generate

Generate a dictionary from a list of rules. Tailored for Plover theory.
GNU General Public License v3.0

Plural form inconsistency #3

Open user202729 opened 3 years ago

user202729 commented 3 years ago

While separate suffix strokes are not a problem,

user202729 commented 3 years ago
  1. Perhaps this issue isn't very common, because briefs use uncommon sounds and Plover has automatic suffix folding. However, it is a problem in some cases (member -> memes, thinks -> this, bringing -> brig, minute -> mince).

    Most of the time orthographic rules can handle the suffix (hand/*L, continue/ous), but sometimes they can't (rid/le, second/ry, element/ly).

    At other times it conflicts with other words (help → helper, hepper).

  2. For some reason, neither "briefed" nor "bereaved" is in the frequency list. Sorting by stem frequency as a secondary criterion works. (implemented)

    Extension: fix/fiction, fixes/fictions.
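A minimal sketch of the two-level sort from item 2, for illustration only. The function name, the `stem_of` mapping, and the frequency numbers are all made-up placeholders, not the project's actual code or data.

```python
def rank_candidates(words, word_freq, stem_freq, stem_of):
    """Sort words most-frequent first.

    Primary key: the word's own frequency (0 if it is missing from the list).
    Secondary key: the frequency of the word's stem, used to break ties.
    """
    def key(word):
        stem = stem_of.get(word, word)
        return (word_freq.get(word, 0), stem_freq.get(stem, 0))

    return sorted(words, key=key, reverse=True)


# Hypothetical data: neither inflected form is in the word frequency list,
# so only the (placeholder) stem frequencies decide the order.
word_freq = {}
stem_freq = {"brief": 50, "bereave": 5}
stem_of = {"briefed": "brief", "bereaved": "bereave"}

ranked = rank_candidates(["bereaved", "briefed"], word_freq, stem_freq, stem_of)
```

With these placeholder numbers, "briefed" ranks ahead of "bereaved" even though both have the same (missing) word frequency.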

user202729 commented 3 years ago

Or sometimes the rule (/i/ - y -> EU) is applied to the singular, but (/i/ - i -> AOE) is used for the plural form, which is inconsistent (gypsy, gypsies).

Mapping [ie] -> [EU] is not a good fix, because of cases like griff/grief.

Some words are in the lemmatization file but not in the dictionary file (griefs), but they can simply be added to the dictionary.

user202729 commented 3 years ago

With the new disambiguation feature, it may become worse (WRAOEUT -> wright, WRAOEUGT -> wrighting). Currently, out-of-order suffixes are only supported by Plover's combining suffix keys.

Or KAR -> car, KAR/AES -> carr's (with KARZ being car's).

user202729 commented 3 years ago

Similarly, with the new full-briefs (completely-compatible) dictionary added, currently *EPLT maps to "element", but *EPLTS maps to "empts". (fixed)
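A rough sketch of a check that could flag this kind of mismatch. It assumes, naively, that the plural outline is the singular outline with a final S appended and that the plural word is the singular plus "s". The function name and the KAT/KATS entries are illustrative only.

```python
def find_plural_mismatches(dictionary):
    """Return (outline, word, plural_outline, plural_word) tuples where
    outline + "S" exists but does not translate to word + "s".

    Naive assumptions: the outline is a plain string, a final -S is free to
    append, and the plural is regular.
    """
    mismatches = []
    for outline, word in dictionary.items():
        plural_outline = outline + "S"
        plural_word = dictionary.get(plural_outline)
        if plural_word is not None and plural_word != word + "s":
            mismatches.append((outline, word, plural_outline, plural_word))
    return mismatches


# Toy dictionary: the *EPLT pair is from this issue, KAT/KATS is a
# consistent pair added for contrast.
toy = {
    "*EPLT": "element",
    "*EPLTS": "empts",
    "KAT": "cat",
    "KATS": "cats",
}

mismatches = find_plural_mismatches(toy)
```

Here only the `*EPLT` / `*EPLTS` pair is reported. Irregular plurals would also be reported by such a naive check, so its output would need manual review.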

user202729 commented 3 years ago

Plover's current behavior prefers non-suffix matches to suffix matches, then the longest match.

This means that if (A, B, A/B) are in the dictionary, then A/B-S translates to (A/B) + -S; however, if B-S is also present, it translates to A + (B-S).

This behavior is supposedly not very desirable. Besides, the current n-gram handler should be able to process them.
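A toy model of the lookup order described in this comment (exact entries first, longest tail first, then the same with the suffix key stripped). This is a simplified sketch based only on the description here, not Plover's actual code. The stroke notation (`"B-S"` for B with a folded -S) and the function name are made up.

```python
SUFFIX = "-S"  # toy marker for a suffix key folded into the last stroke


def lookup_tail(dictionary, strokes):
    """Pick a translation for the tail of `strokes`.

    Returns (start_index, translation), or None if nothing matches.
    Pass 1 tries exact entries, longest tail first.
    Pass 2 strips the suffix from the last stroke and tries again.
    """
    strokes = list(strokes)

    # Pass 1: non-suffix lookup, longest match first.
    for start in range(len(strokes)):
        key = "/".join(strokes[start:])
        if key in dictionary:
            return start, dictionary[key]

    # Pass 2: suffix lookup, longest match first.
    last = strokes[-1]
    if last.endswith(SUFFIX):
        stripped = strokes[:-1] + [last[: -len(SUFFIX)]]
        for start in range(len(stripped)):
            key = "/".join(stripped[start:])
            if key in dictionary:
                return start, dictionary[key] + " + " + SUFFIX

    return None


base = {"A": "a", "B": "b", "A/B": "ab"}
with_bs = dict(base, **{"B-S": "bs"})

result_without = lookup_tail(base, ["A", "B-S"])
result_with = lookup_tail(with_bs, ["A", "B-S"])
```

In this model, `result_without` starts at index 0 (the whole A/B entry plus the suffix), while `result_with` starts at index 1 (only B-S, leaving A on its own). Adding the B-S entry changes how A/B-S is split.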