user202729 / plover-generate

Generate a dictionary from a list of rules. Tailored for Plover theory.
GNU General Public License v3.0

Word boundary #1

Open user202729 opened 3 years ago

user202729 commented 3 years ago

Word boundary conflicts are an issue. Right now AM/I produces [ami]. (While the program can generate entries that use suffixes, it so far cannot filter out entries that should use them but don't.) Similar cases: for a^ (fora), for a (foray), on to.
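A minimal sketch of how such a conflict could be detected, assuming the dictionary is a plain mapping from outline strings (strokes joined by `/`) to translations. The function names and the toy dictionary are hypothetical and not taken from plover-generate.

```python
from typing import Dict, Iterator, List, Tuple


def split_readings(strokes: List[str], dictionary: Dict[str, str]) -> Iterator[List[str]]:
    """Yield every way to read `strokes` as two or more consecutive entries."""

    def walk(start: int) -> Iterator[List[str]]:
        if start == len(strokes):
            yield []
            return
        for end in range(start + 1, len(strokes) + 1):
            piece = "/".join(strokes[start:end])
            if piece in dictionary:
                for rest in walk(end):
                    yield [dictionary[piece]] + rest

    for reading in walk(0):
        # A reading with a single piece is the entry itself, not a conflict.
        if len(reading) >= 2:
            yield reading


def find_boundary_conflicts(dictionary: Dict[str, str]) -> List[Tuple[str, str, List[str]]]:
    """Return (outline, translation, shorter_words) for every entry whose
    strokes can also be read as a sequence of shorter entries."""
    conflicts = []
    for outline, word in dictionary.items():
        for reading in split_readings(outline.split("/"), dictionary):
            conflicts.append((outline, word, reading))
    return conflicts


# Toy dictionary (assumption): outlines written as in the comment above.
toy = {"AM": "am", "I": "I", "AM/I": "ami"}
conflicts = find_boundary_conflicts(toy)
```

Here `conflicts` holds one entry, pairing the outline `AM/I` and its translation `ami` with the competing reading `am` + `I`.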

user202729 commented 3 years ago

The suffix cases are mostly done now. But there are still cases such as (sun shine), (to-do), (goto), (some things), (school bus) or, in general, compound words.

Non-compound word cases: toews (to use), morcom (more common), orloff (or love), reidt (read the) (but -T is part of Plover theory too).

(TODO: download Wikipedia or something similar and detect those cases automatically)

user202729 commented 3 years ago

Then there's the reverse case: Italian (prefix i), or battlefield (because -L is a suffix and /əɫ/ cannot match a separate -L).

user202729 commented 3 years ago

With the new word_boundary_conflicts script, it would be good to be able to mark "less favorable strokes" (misstrokes, etc.) so that the compound word option is preferred.

(partially fixed: add "disambiguation_stroke" command-line argument)

user202729 commented 3 years ago

Case analysis:

The cases can be split into two general groups (do not handle these manually anymore; make it automatic, see below):

- Large word should be pushed
- Small word should be pushed (i.e. do nothing)

user202729 commented 3 years ago

For now, the large word should always be pushed.

The user can choose to mark the small word as a misstroke or as unused so that the large word is not pushed.

13 will be harder in that case, however.
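The rule in this comment can be sketched as follows. This is only an illustration under stated assumptions: "pushed" is read as "moved to a different outline", the move is modelled by appending a disambiguation stroke, and `resolve_outline`, its parameters and the stroke `R-R` are hypothetical. How the real `disambiguation_stroke` argument is applied is not described in this thread.

```python
from typing import Iterable, Set


def resolve_outline(
    outline: str,
    small_outlines: Iterable[str],
    deprioritized: Set[str],
    disambiguation_stroke: str,
) -> str:
    """Return the outline the large word ends up on.

    Assumption: the large word is pushed by appending the disambiguation
    stroke, unless one of the conflicting small-word outlines has been
    marked by the user as a misstroke or as unused.
    """
    if any(small in deprioritized for small in small_outlines):
        # The small word is not expected to be written this way,
        # so the large word can keep its outline.
        return outline
    return outline + "/" + disambiguation_stroke


# Default: the large word is pushed.
pushed = resolve_outline("AM/I", ["AM", "I"], set(), "R-R")
# The user marked "I" as unused: the large word stays where it is.
kept = resolve_outline("AM/I", ["AM", "I"], {"I"}, "R-R")
```

With the hypothetical stroke above, `pushed` is `AM/I/R-R` and `kept` is `AM/I`.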

user202729 commented 3 years ago

... which means that "small word should be pushed" is the special case.

So the plan is...


In fact, there's no need to deprioritize the small words (FO, AND); however, warn the user about the possible word boundary issues -- unless it's marked as deprioritized.
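A rough sketch of the warning step described above, assuming each conflict is given as a tuple of the large outline, its translation, and the outlines of the small words it collides with, and that "marked as deprioritized" refers to the small-word outlines. The function name and the data shapes are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Conflict = Tuple[str, str, List[str]]


def boundary_warnings(conflicts: Iterable[Conflict], deprioritized: Set[str]) -> List[str]:
    """Build one warning per conflict, skipping conflicts where a
    participating small-word outline is marked as deprioritized."""
    messages = []
    for outline, large_word, small_outlines in conflicts:
        if any(small in deprioritized for small in small_outlines):
            continue  # the user already acknowledged this one
        sequence = " + ".join(small_outlines)
        messages.append(
            f"possible word boundary issue: {outline} ({large_word}) "
            f"can also be read as {sequence}"
        )
    return messages


example = [("AM/I", "ami", ["AM", "I"])]
shown = boundary_warnings(example, set())
silenced = boundary_warnings(example, {"I"})
```

In this sketch nothing is reordered or removed from the dictionary; the small words keep their entries and the user only gets a message for each unacknowledged conflict.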