user202729 / plover-python-dictionary-lib

Library for writing Python dictionary for Plover, and generating JSON dictionary file from Python dictionary.
GNU General Public License v3.0
2 stars 1 forks source link

filtering seems to not work? #3

Closed Josiah-tan closed 2 years ago

Josiah-tan commented 2 years ago

version 0.4.2 try running this code:

from plover_python_dictionary_lib import get_context_from_system
from plover.system import english_stenotype as e
context = get_context_from_system(e)
s = context.SingleDictionary
stroke = context.stroke
translation = context.translation

# initializing the dictionary
dictionary = s({"S-": "10", "K-": "100"})
dictionary.print_items()
# should remove the K?
dictionary = dictionary.filter(lambda x: len(x) <= 2)
dictionary.print_items()
# adding the K back in (doesn't work)
dictionary *= stroke("K-")
dictionary.print_items()

I get an assertion error like so, even though I think the "K" should be filtered out in the previous step?

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/Josiah/plover/.tox/dev/lib/python3.8/site-packages/plover_python_dictionary_lib/__init__.py", line 126, in __mul__
    return ProductDictionary(self.stroke_type, self, other, merge=True)
  File "/home/Josiah/plover/.tox/dev/lib/python3.8/site-packages/plover_python_dictionary_lib/__init__.py", line 357, in __init__
    assert not (x&y), f"Cannot merge -- overlapping mask: {x} & {y}"
AssertionError: Cannot merge -- overlapping mask: SK & K
user202729 commented 2 years ago

The checker is "conservative", in the sense that Cartesian product prohibit overlapping mask because it might lead to ambiguous result.

Also, implementation-detail-wise it kind of makes sense, as the whole dictionary is not generated in-memory and the translation is done on-demand i.e. when it gets the stroke SK, it e.g. deterministically lookups "S" in the first half and lookup "K" in the second half. So conflicting means exponentially multiple required lookups.

Workaround would be to explicitly convert the filtered dictionary into a an explicit form, I guess. Consequence would be that it takes the amount of memory proportional to the total number of dictionary entries.

Something like this.

dictionary = s(dict(dictionary.filter(lambda x: len(x) <= 2).items()))

Maybe the documentation needs to be improved a bit, I forgot already how to iterate over items (need to call .items() on it, cannot just dict(...))


Alternatively maybe only remove the lookup() function for such conflicting dictionaries? items() method obviously can still work in this case.

Josiah-tan commented 2 years ago

I have tried the hack with the explicit form, and it seems to work thanks! Maybe a smarter filter that takes this into account can be created in the future somehow, but I guess that is for future work