Open user202729 opened 3 years ago
The suffix cases are mostly done now. But there are cases such as (sun shine), (to-do), (goto), (some things), (school bus) or in general compound words.
Non-compound word cases: toews (to use), morcom (more common), orloff (or love), reidt (read the) (but -T is part of Plover theory too).
(TODO download wikipedia or something similar and detect those cases automatically)
Then there's the reverse case (Italian -- prefix i) or battlefield (because -L is a suffix and /əɫ/ cannot match a separate -L).
With the new word_boundary_conflicts script... it would be good to be able to mark "less favorable strokes" (misstrokes, etc.) so the compound word option is preferred.
(partially fixed: add "disambiguation_stroke" command-line argument)
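A minimal sketch of the kind of check discussed here, not the actual word_boundary_conflicts script: it assumes the dictionary is a plain mapping from outlines (strokes joined by "/") to translations, and reports every multi-stroke entry whose strokes can also be read as a sequence of shorter entries. The function names and the toy entries are hypothetical and only illustrate the idea.

```python
from typing import Dict, List, Tuple


def split_into_entries(strokes: List[str], dictionary: Dict[str, str]) -> List[List[str]]:
    """Return every way to cut `strokes` into 2+ pieces that are each a dictionary outline."""
    results: List[List[str]] = []

    def walk(start: int, pieces: List[str]) -> None:
        if start == len(strokes):
            # Keep only real splits; a single piece is the entry itself.
            if len(pieces) >= 2:
                results.append(list(pieces))
            return
        for end in range(start + 1, len(strokes) + 1):
            piece = "/".join(strokes[start:end])
            if piece in dictionary:
                pieces.append(piece)
                walk(end, pieces)
                pieces.pop()

    walk(0, [])
    return results


def find_boundary_conflicts(dictionary: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """List (outline, large word, competing phrase) for each ambiguous multi-stroke entry."""
    conflicts = []
    for outline, word in dictionary.items():
        strokes = outline.split("/")
        if len(strokes) < 2:
            continue
        for pieces in split_into_entries(strokes, dictionary):
            phrase = " ".join(dictionary[p] for p in pieces)
            conflicts.append((outline, word, phrase))
    return conflicts


# Toy entries in the pseudo-steno notation used in this issue (illustrative only).
toy = {"TO": "to", "FEE": "fee", "TO/FEE": "toffee", "AM": "am", "I": "I", "AM/I": "ami"}
for outline, word, phrase in find_boundary_conflicts(toy):
    print(f"{outline}: '{word}' conflicts with '{phrase}'")
```

The sketch only finds the conflicts; deciding which side keeps the outline is a separate policy question.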
Case analysis:
They can be split into two general groups: (do not handle manually anymore, make it automatic, see below)
"to feature": TPAOEFP for "feature" is quite uncommon. (This word is in the Plover dictionary, but without any TO/ prefix.
TO/ in the Plover main dictionary is rare, but not non-existent: "totalitarian" (not correct -- pronounced with /oʊ/), "tonight" (compound word), "tonofilament" (???), "topography", "Toronto", "tobacco" (with C consonant doubling). In all of those cases they do not form a phrase.)
The program may tell the user about the brief so they can use it and avoid conflicts.
For now, the large word should always be pushed.
The user can choose to mark the small word as a misstroke/unused so that the large word is not pushed.
... which means that "small word should be pushed" is the special case.
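The default rule above could be sketched as follows. This is an assumption-laden illustration, not the generator's code: it reads "pushed" as "moved to the same outline plus the disambiguation stroke", it treats the misstroke/unused marks as a plain set of small-word outlines, and `place_large_word` and the `DISAMBIG` placeholder are hypothetical names rather than real strokes or APIs.

```python
from typing import Iterable, Set


def place_large_word(
    outline: str,
    small_pieces: Iterable[str],
    deprioritized: Set[str],
    disambiguation_stroke: str,
) -> str:
    """Return the outline the large word should end up on.

    Default: the large word is pushed (disambiguation stroke appended).
    Special case: if any competing small-word outline is marked as a
    misstroke/unused, the phrase reading is not expected, so the large
    word keeps its original outline.
    """
    if any(piece in deprioritized for piece in small_pieces):
        return outline
    return f"{outline}/{disambiguation_stroke}"


# "DISAMBIG" is a placeholder, not a real steno stroke.
print(place_large_word("TO/FEE", ["TO", "FEE"], set(), "DISAMBIG"))
print(place_large_word("TO/FEE", ["TO", "FEE"], {"FEE"}, "DISAMBIG"))
```

Keeping the marks in a separate set means the base dictionary is never edited, and removing a mark restores the default behaviour.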
So the plan is...
TOF/YI or TO/FEE + disambiguation), and FEE/TUUR maps to "feature" (can also be written FEECh) (and also the other possible cases -- "to female, to FIFA, to Fiji"), and TO/FEE -> toffee, or
FEE/TUUR, etc., or
TO/FEE -> toffee and never show the warning again, or
TOF/YI instead.
FO/FO -> FoFo (will be done automatically once option 1 is used)
FO/LOE -> follow, FO/RUN -> foreign, etc., or
FO/FO -> FoFo (effectively no-op, except that the warning is never shown again), or
FO/FO + disambiguation instead to write FoFo. (very bad, especially when there's no alternative stroke)
In fact, there's no need to deprioritize the small words (FO, AND); however, warn the user about the possible word boundary issues -- unless it's marked as deprioritized.
Word boundary conflicts are an issue. Right now AM/I produces [ami] (while the program can generate entries that use suffixes, it so far cannot filter out entries that should use them but don't). Similar cases: for a^ (fora), for a (foray), on to