repodiac / german_compound_splitter

Compound splitter for the German language ("Komposita-Zerlegung"), based on a large dictionary combined with highly efficient multi-pattern string search
Creative Commons Attribution 4.0 International

index out of range in post corrections with only_nouns #1

Closed sebag90 closed 3 years ago

sebag90 commented 3 years ago

The for-loop at line 247 is bounded by the length of a list that gets modified inside the loop, so the index can run past the end of the shortened list. Python raised a "list index out of range" error on line 248.

FIX (coming): add an if statement: if ri < len(results)
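The failure mode described above can be sketched in isolation. This is not the library's code: the function names, the sample parts, and the length-based removal rule are all made up for illustration. Only the index variable `ri`, the list name `results`, and the `ri < len(results)` guard come from the report.

```python
# Hypothetical stand-in for the post-correction loop; the removal rule
# (drop parts shorter than min_len) is invented for this sketch.

def drop_short_parts_buggy(results, min_len=3):
    # range(len(results)) is evaluated once, before any deletion,
    # so ri can exceed the last valid index after an item is removed.
    for ri in range(len(results)):
        if len(results[ri]) < min_len:
            del results[ri]
    return results


def drop_short_parts_guarded(results, min_len=3):
    # The proposed guard: skip indices that no longer exist.
    # This avoids the IndexError, but the element that slides into
    # position ri after a deletion is never examined.
    for ri in range(len(results)):
        if ri < len(results) and len(results[ri]) < min_len:
            del results[ri]
    return results


def drop_short_parts_rebuild(results, min_len=3):
    # Alternative: build a new list instead of mutating during iteration.
    return [part for part in results if len(part) >= min_len]
```

With the made-up input `["Pflanzen", "en", "art"]`, the first function raises `IndexError` and the guarded one returns `["Pflanzen", "art"]`. With `["ab", "cd", "Pflanze"]`, the guarded version leaves `"cd"` in place because it is skipped after the first deletion, while the rebuild version returns `["Pflanze"]`. Whether this kind of skipping matters in the real loop depends on code not shown in this thread.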

Steps to reproduce: use the GNU ASPELL German list as the dictionary and process the word "Pflanzenart" with only_nouns=True.

repodiac commented 3 years ago

I can see your point; however, I would like to reproduce it (see PR comments) to check the exact behaviour for myself. If you have another compound where it gives an "index out of range" error, please provide it :)

sebag90 commented 3 years ago

I'm sorry, I tried again and noticed that I made a mistake. Try "pflanzenart" in all lowercase; for me it breaks the code at line 248. This is not the case for other compounds such as "Autobahn"/"autobahn", "Straßenreinigung"/"straßenreinigung", "tierart"/"Tierart". Let me know if that works :)

repodiac commented 3 years ago

Got it, thank you. I spent quite some time getting to the core of the issue: your fix did prevent the Index Out Of Range error, but it had some other side effects. Lowercase compound words were never intended to be supported, so I took a different approach: I now change the first letter to uppercase, and the behaviour is much as expected.
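As a rough sketch of the "change the first letter to uppercase" idea, assuming it is applied to the input word before splitting (the helper name `capitalize_first` is hypothetical and not taken from the library):

```python
def capitalize_first(word: str) -> str:
    # Upper-case only the first character and leave the rest untouched.
    # Slicing keeps this safe for the empty string.
    return word[:1].upper() + word[1:]
```

For example, `capitalize_first("pflanzenart")` gives `"Pflanzenart"`. Note that Python's built-in `str.capitalize()` would also lowercase the remaining characters, which may or may not be wanted here.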

Thank you for filing the issue and pointing this out! I also added some other improvements for handling plurals, btw. There is a new release, 0.1.1, now. I will thus close the PR.