openstenoproject / plover

Open source stenotype engine
http://opensteno.org/plover
GNU General Public License v2.0

Implicit hyphen keys cause dictionary issues when there are multiple keys with the same letter #1086

Open lambdadog opened 5 years ago

lambdadog commented 5 years ago

Summary

When a letter used in IMPLICIT_HYPHEN_KEYS appears on more than one key in a steno layout (even with the hyphen on the other side), Plover treats both keys as implicit hyphen keys when looking up dictionary entries (but not when writing to the paper tape).

Reproducing

With the code

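from typing import Tuple

# Tuple[str, ...] is the annotation for a tuple of any length;
# Tuple[str] would mean a tuple of exactly one string.
KEYS: Tuple[str, ...] = (
    '#',
    'S-', 'T-', 'P-', '+-', 'C-', 'K-', 'V-', 'R-', 'L-', 'N-',
    'I-', 'E-',
    '-A', '-O',
    '-S', '-T', '-P', '-.', '-C', '-K', '-V', '-N', '-I', '-E',
    # Plover is not pleased with '*' here instead of '-*' for some reason
    '-*'
)

IMPLICIT_HYPHEN_KEYS: Tuple[str, ...] = ('I-', 'E-', '-A', '-O')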

(plus whatever additional code is needed to define a system), the stroke -CI is correctly output as -CI, but it translates to the dictionary entry for CI, even though the IMPLICIT_HYPHEN_KEYS tuple contains I- and not -I.

This is also shown in the "Add dictionary entry" dialog as follows:

[Screenshots: ex, ex2]
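
One way such a collision could arise is if the lookup side decides whether to drop the hyphen by letter rather than by key. The sketch below is a hypothetical model of that behaviour, not Plover's actual code: `normalize_by_letter` and `IMPLICIT_HYPHEN_LETTERS` are names made up for illustration.

```python
# Hypothetical model: implicit hyphens are tracked by bare letter,
# so the side of the key ('I-' vs '-I') is lost.
IMPLICIT_HYPHEN_KEYS = ('I-', 'E-', '-A', '-O')
IMPLICIT_HYPHEN_LETTERS = {k.replace('-', '') for k in IMPLICIT_HYPHEN_KEYS}


def normalize_by_letter(steno):
    """Drop the hyphen if any letter in the stroke is an implicit hyphen letter."""
    if steno.endswith('-'):
        # A trailing hyphen carries no information.
        return steno[:-1]
    if '-' in steno and IMPLICIT_HYPHEN_LETTERS & set(steno):
        return steno.replace('-', '')
    return steno


print(normalize_by_letter('-CI'))  # CI  (same key as the left-hand stroke CI)
print(normalize_by_letter('CI'))   # CI
print(normalize_by_letter('-CK'))  # -CK (no implicit hyphen letter, hyphen kept)
```

Under this assumed rule, -CI and CI end up as the same dictionary key, which matches the behaviour reported above.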

Plover Version

Plover 4.0.0.dev8

System

Linux, running NixOS, using KDE Plasma(kwin), running from an AppImage

lambdadog commented 5 years ago

For a working system that showcases this, see plover_czech at revision 1a2cfba11a5d1de5b8995f958d0047b3ec952008

benoit-pierre commented 5 years ago

That's because your system definition is invalid: I- can't be an implicit hyphen key because of -I, same with E-/-E. With your system definition, the implicit hyphen keys would be: ['-A', '-O'].
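
To illustrate why a letter that exists on both sides is a problem, here is a minimal sketch that assumes a simple formatting rule: the hyphen is omitted whenever the stroke contains an implicit hyphen key. The function `to_steno` is hypothetical and written only for this example (it also ignores hyphenless keys such as '#'); it is not taken from Plover.

```python
def to_steno(stroke_keys, implicit_hyphen_keys):
    """Format a stroke (keys given in steno order) under the assumed rule."""
    left = ''.join(k[:-1] for k in stroke_keys if k.endswith('-'))
    right = ''.join(k[1:] for k in stroke_keys if k.startswith('-'))
    if not right:
        return left
    if set(stroke_keys) & set(implicit_hyphen_keys):
        # An implicit hyphen key is present: the hyphen is omitted.
        return left + right
    return left + '-' + right


declared = ('I-', 'E-', '-A', '-O')
valid = ('-A', '-O')

# With I- declared as implicit, two different strokes get the same text.
print(to_steno(['I-', 'E-'], declared))  # IE
print(to_steno(['I-', '-E'], declared))  # IE

# With only -A and -O as implicit, they stay distinct.
print(to_steno(['I-', 'E-'], valid))     # IE
print(to_steno(['I-', '-E'], valid))     # I-E
```

Under the same assumed rule, strokes that do not mix I-/E- with right-hand keys, such as C- I- and -C -I, are still written differently (CI and -CI).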

Here is some code I have been using to programmatically determine the set of implicit hyphen keys:


KEYS = (
    '#',
    'S-', 'T-', 'P-', '+-', 'C-', 'K-', 'V-', 'R-', 'L-', 'N-',
    'I-', 'E-',
    '-A', '-O',
    '-S', '-T', '-P', '-.', '-C', '-K', '-V', '-N', '-I', '-E',
    # Plover is not pleased with '*' here instead of '-*' for some reason
    '-*'
)

KEY_FIRST_RIGHT_INDEX = None
letters_left = {}
letters_right = {}
for n, k in enumerate(KEYS):
    assert len(k) <= 2
    if 1 == len(k):
        assert '-' != k
        l = k
        is_left = False
        is_right = False
    elif 2 == len(k):
        is_left = '-' == k[1]
        is_right = '-' == k[0]
        assert is_left != is_right
        l = k.replace('-', '')
    if KEY_FIRST_RIGHT_INDEX is None:
        if not is_right:
            # letters_left is keyed by bare letter, so test the letter.
            assert l not in letters_left
            letters_left[l] = k
            continue
        KEY_FIRST_RIGHT_INDEX = n
    # Invalid: ['-R', '-L']
    assert not is_left
    # Invalid: ['-R', '-R']
    # letters_right is keyed by bare letter, so test the letter.
    assert l not in letters_right
    # Invalid: ['#', '-R', '#']
    assert is_right or l not in letters_left
    letters_right[l] = k
# Find implicit hyphen keys/letters.
implicit_hyphen_letters = {}
for k in reversed(KEYS[:KEY_FIRST_RIGHT_INDEX]):
    l = k.replace('-', '')
    if l in letters_right:
        break
    implicit_hyphen_letters[l] = k
for k in KEYS[KEY_FIRST_RIGHT_INDEX:]:
    l = k.replace('-', '')
    if l in letters_left:
        break
    implicit_hyphen_letters[l] = k
print(set(implicit_hyphen_letters.values()))
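
The same scan can be wrapped in a function and used to check a declared IMPLICIT_HYPHEN_KEYS tuple against a layout. This is a condensed sketch of the script above without its validity assertions; the names `letter`, `implicit_hyphen_keys`, `DECLARED` and `rejected` are introduced here for illustration only.

```python
KEYS = (
    '#',
    'S-', 'T-', 'P-', '+-', 'C-', 'K-', 'V-', 'R-', 'L-', 'N-',
    'I-', 'E-',
    '-A', '-O',
    '-S', '-T', '-P', '-.', '-C', '-K', '-V', '-N', '-I', '-E',
    '-*',
)
DECLARED = ('I-', 'E-', '-A', '-O')


def letter(key):
    """Return the key's letter without its hyphen."""
    return key.replace('-', '')


def implicit_hyphen_keys(keys):
    # Index of the first right-hand key (written '-X').
    first_right = next(i for i, k in enumerate(keys)
                       if len(k) == 2 and k[0] == '-')
    left, right = keys[:first_right], keys[first_right:]
    left_letters = {letter(k) for k in left}
    right_letters = {letter(k) for k in right}
    found = set()
    # Walk outwards from the left/right boundary on each side and stop
    # at the first letter that also exists on the other side.
    for k in reversed(left):
        if letter(k) in right_letters:
            break
        found.add(k)
    for k in right:
        if letter(k) in left_letters:
            break
        found.add(k)
    return found


computed = implicit_hyphen_keys(KEYS)
rejected = [k for k in DECLARED if k not in computed]
print(sorted(computed))  # ['-A', '-O']
print(rejected)          # ['I-', 'E-']
```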

lambdadog commented 5 years ago

@benoit-pierre So it's impossible for I- and E- to implicitly hyphenate? IE-ST is pretty ugly and unconventional, and I believe it goes against how the system would traditionally be written. With what I've currently got, the paper feed seems to work, so why can't the same code that prints output to the paper feed be used for dictionary lookups?

I- and E- are distinct from -I and -E, and I can't see any overlap that would cause issues:

I vs -I,
CE vs -CE vs C-E,
etc.

slampisko commented 5 years ago

This is currently a blocker for implementing and using Czech stenotype.