languagetool-org / languagetool

Style and Grammar Checker for 25+ Languages
https://languagetool.org
GNU Lesser General Public License v2.1

[en] spelling suggestions for words with hyphen #3707

Open jaumeortola opened 4 years ago

jaumeortola commented 4 years ago

We don't provide good spelling suggestions for words joined by a hyphen (e.g. Paris-Lonton).

This is related to word tokenization. In other languages, a hyphenated word that is not in the dictionary is split into separate tokens, and each part gets its own suggestions.

I'm not sure if a change in tokenization like this will affect other issues.
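
To make the idea concrete, here is a minimal Python sketch of per-part suggestions for an unknown hyphenated word. It is not LanguageTool's implementation: the toy `DICTIONARY`, the helper names `suggest` and `suggest_hyphenated`, and the use of `difflib` for fuzzy matching are all assumptions made for illustration.

```python
import difflib

# Toy word list standing in for a real spelling dictionary (assumption).
DICTIONARY = {"Paris", "London", "Berlin", "well", "known"}


def suggest(word, n=3):
    """Return up to n close dictionary matches for a single word."""
    return difflib.get_close_matches(word, sorted(DICTIONARY), n=n, cutoff=0.7)


def suggest_hyphenated(word):
    """Correct each part of an unknown hyphenated word separately.

    Returns a list with one recombined suggestion, or an empty list when
    the word is already known or some part has no close match.
    """
    if word in DICTIONARY:
        return []
    fixed = []
    for part in word.split("-"):
        if part in DICTIONARY:
            fixed.append(part)  # this part is spelled correctly
            continue
        candidates = suggest(part, n=1)
        if not candidates:
            return []  # give up if any part cannot be corrected
        fixed.append(candidates[0])
    return ["-".join(fixed)]


print(suggest_hyphenated("Paris-Lonton"))
```

Treating the parts independently is what lets "Lonton" be matched against "London" even though "Paris-London" as a whole is not a dictionary entry.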

jaumeortola commented 3 years ago

Another example.

[image]

Should we implement this change, @udomai? Obviously, with sufficient testing.

udomai commented 3 years ago

Sounds good! Let's do it! We'll need a prohibited.txt to be able to exclude certain pairings if they are most likely to be a typo.

We'll need to keep an eye on how this impacts rules concerning hyphenated verb forms like "faisons-le", right?

jaumeortola commented 3 years ago

> verb forms like "faisons-le",

My proposal is for English, not French.

In French it is already done this way: <token>Paris</token><token>-</token><token>London</token> and <token>faisons</token><token>-le</token>.
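
The token layout quoted above can be imitated with a short Python sketch. This is only an illustration, not the actual French tokenizer: the `CLITICS` set is a hypothetical, incomplete list, and `tokenize_hyphenated` is a made-up helper name.

```python
# Hypothetical, incomplete set of French clitics that keep the hyphen attached.
CLITICS = {
    "je", "tu", "il", "elle", "on", "nous", "vous", "ils", "elles",
    "le", "la", "les", "lui", "leur", "y", "en", "moi", "toi",
}


def tokenize_hyphenated(word):
    """Split a hyphenated word into tokens.

    A clitic keeps its leading hyphen ("-le"); any other part is preceded
    by a standalone "-" token.
    """
    first, *rest = word.split("-")
    tokens = [first]
    for part in rest:
        if part.lower() in CLITICS:
            tokens.append("-" + part)
        else:
            tokens.extend(["-", part])
    return tokens


print(tokenize_hyphenated("Paris-London"))
print(tokenize_hyphenated("faisons-le"))
```

With tokens shaped like this, a speller can check "Paris" and "London" on their own, while rules for verb forms still see "-le" as a single unit.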

jaumeortola commented 3 years ago

There are two methods to resolve this problem: