languagetool-org / languagetool

Style and Grammar Checker for 25+ Languages
https://languagetool.org
GNU Lesser General Public License v2.1

[de] wrong suggestion for "Dampfschiffahrtskapitän" #1369

Closed: tiff closed this issue 5 years ago

tiff commented 5 years ago

This isn't a very common word, but I wanted to show the capabilities of LanguageTool to someone and used this word for the demo:

Dampfschiffahrtskapitän

The correct word would be "Dampfschifffahrtskapitän".

[Screenshot, 2019-01-20 19:41:46]

janschreiber commented 5 years ago

Also see #725. Daniel improved the suggestion mechanism in summer 2017 (in response to my request back then), and this was a huge step forward. The gist of the solution was to take the suggestions for compounds from a static, finite but large list of words that "make sense" to humans. It worked out well. But if the most likely suggestion is not in the list ("Dampfschifffahrtskapitän" with three f's in this case), an algorithm builds the compounds for suggestions on the fly, and it fails miserably most of the time. For unknown words that are correct or almost correct, we often suggest utter nonsense. For example, I got the suggestions "Aluminiumwitwenkabel, Aluminiumkatzenkabel" for "Aluminiumlitzenkabel" today. These suggestions are of course valid compounds, but what do widows have to do with aluminum cables? They are semantically weird.

danielnaber commented 5 years ago

I did some analysis: "Dampf, schiff, ahrts, kapitän" is one of several possible splits, but "ahrts" doesn't get the suggestion "fahrts" (in CompoundAwareHunspellRule#getCandidates()). We use the standard suggestion algorithm there, and it isn't prepared to handle compound-internal parts that carry the infix "s" (Fugen-s). Maybe we can find a hack to improve this.
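
To make the problem concrete, here is a minimal Java sketch of one way such a workaround might look. It is not LanguageTool's code and not the fix that was eventually committed. The class `InfixSSketch`, the tiny dictionary, and the toy `suggest` method (which only finds dictionary words one inserted character away) are all assumptions made for illustration. The idea shown is that when a non-final compound part gets no suggestion, the trailing "s" is stripped, suggestions are fetched for the stem, and the "s" is re-attached.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only: hypothetical names and data, not LanguageTool's implementation.
class InfixSSketch {

  // Hypothetical toy dictionary standing in for the real speller (lowercase entries).
  static final Set<String> DICT =
      new HashSet<>(Arrays.asList("dampf", "schiff", "fahrt", "kapitän"));

  // Toy stand-in for the standard suggestion algorithm: returns dictionary
  // words that can be reached by inserting exactly one character.
  static List<String> suggest(String word) {
    List<String> result = new ArrayList<>();
    for (String candidate : DICT) {
      if (candidate.length() == word.length() + 1 && isOneInsertion(word, candidate)) {
        result.add(candidate);
      }
    }
    return result;
  }

  // True if deleting one character from 'longer' yields 'shorter'.
  static boolean isOneInsertion(String shorter, String longer) {
    for (int i = 0; i < longer.length(); i++) {
      if ((longer.substring(0, i) + longer.substring(i + 1)).equals(shorter)) {
        return true;
      }
    }
    return false;
  }

  // For a non-final part: try it as is; if nothing is found and it ends in "s",
  // retry with the stem and re-attach the infix "s".
  static List<String> suggestForPart(String part) {
    List<String> result = new ArrayList<>(suggest(part));
    if (result.isEmpty() && part.length() > 1 && part.endsWith("s")) {
      for (String s : suggest(part.substring(0, part.length() - 1))) {
        result.add(s + "s");
      }
    }
    return result;
  }

  // Rebuilds a compound from lowercase parts, repairing unknown parts with
  // their first suggestion. Returns null if any part cannot be repaired.
  static String repair(List<String> parts) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < parts.size(); i++) {
      String part = parts.get(i);
      if (DICT.contains(part)) {
        sb.append(part);
        continue;
      }
      boolean isLast = i == parts.size() - 1;
      List<String> suggestions = isLast ? suggest(part) : suggestForPart(part);
      if (suggestions.isEmpty()) {
        return null;
      }
      sb.append(suggestions.get(0));
    }
    if (sb.length() == 0) {
      return null;
    }
    // German nouns are capitalized, so uppercase the first letter.
    return Character.toUpperCase(sb.charAt(0)) + sb.substring(1);
  }
}
```

With this toy setup, `InfixSSketch.suggest("ahrts")` finds nothing, while `InfixSSketch.suggestForPart("ahrts")` yields "fahrts", so `InfixSSketch.repair(Arrays.asList("dampf", "schiff", "ahrts", "kapitän"))` produces "Dampfschifffahrtskapitän". A real speller would return many more candidates, so a production version would need to rank or filter them.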

danielnaber commented 5 years ago

Fixed with a rather specific hack (though not one that just hard-codes this word).