Closed sebag90 closed 3 years ago
Hi @sebag90, thanks for filing the issue and for the PR! And again, sorry for the delay; I did not see your message(s) :(
Unfortunately, I cannot reproduce the error. I just downloaded the most recent version of german.dic (last change: 5/4/2021) and used this code:
from german_compound_splitter import comp_split
compound = 'Pflanzenart'
input_file = '/tmp/german.dic'
ahocs = comp_split.read_dictionary_from_file(input_file)
dissection = comp_split.dissect(compound, ahocs, only_nouns=True)
print('SPLIT WORDS (plain):', dissection)
print('SPLIT WORDS (post-merge):', comp_split.merge_fractions(dissection))
The output was this:
Loading data file - /tmp/german.dic
Dissect compound: Pflanzenart
SPLIT WORDS (plain): ['Pflanze', 'n', 'Art']
SPLIT WORDS (post-merge): ['Pflanze', 'n', 'Art']
Can you maybe retry or provide the german.dic file?
I tried different things with "Pflanzenart" and variations of it, and got no error. Looking at the way the results list is modified, it also seems pretty difficult (not to say impossible) to run into an index out of range error.
Again, if you can provide me with an example and maybe the precise dictionary file you used, I can try to reproduce. "Unfortunately", it (still) works for me so far...
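To make collecting such an example easier, here is a minimal sketch of a helper that runs a list of candidate words through the splitter and records which ones raise an IndexError. The names find_failing_words and check_with_dictionary are hypothetical and not part of german_compound_splitter; the only library calls used are the two already shown in the snippet above (read_dictionary_from_file and dissect).

```python
from typing import Callable, Dict, Iterable, List


def find_failing_words(
    words: Iterable[str],
    dissect: Callable[[str], List[str]],
) -> Dict[str, str]:
    """Return {word: error message} for each word whose dissection raises IndexError."""
    failures: Dict[str, str] = {}
    for word in words:
        try:
            dissect(word)
        except IndexError as exc:
            # Keep the message so it can be pasted into the issue.
            failures[word] = str(exc)
    return failures


def check_with_dictionary(dictionary_path: str, words: Iterable[str]) -> Dict[str, str]:
    """Hypothetical wiring to the splitter, using only the calls shown in this thread."""
    # Imported lazily so find_failing_words stays usable without the package.
    from german_compound_splitter import comp_split

    ahocs = comp_split.read_dictionary_from_file(dictionary_path)
    return find_failing_words(
        words,
        lambda word: comp_split.dissect(word, ahocs, only_nouns=True),
    )
```

Calling check_with_dictionary('/tmp/german.dic', ['Pflanzenart']) returns an empty dict when no word fails; any word that does appear in the result, together with the dictionary file used, would be exactly the example needed to reproduce the problem.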
As described under section "Issues", I will close the PR without merging it.
This should fix the problem that some words can raise an index out of range error.