languagetool-org / languagetool

Style and Grammar Checker for 25+ Languages
https://languagetool.org
GNU Lesser General Public License v2.1

Do not give a spelling warning for words that are written with IPA symbols #1621

Open MikeUnwalla opened 5 years ago

MikeUnwalla commented 5 years ago

Related suggestion: https://github.com/languagetool-org/languagetool/issues/1615

The International Phonetic Alphabet (IPA) is an academic standard: https://en.wikipedia.org/wiki/International_Phonetic_Alphabet

LT gives spelling warnings for words that are written with the IPA. Example in English: [image]

Currently, writers who use the IPA must either deselect the standard spelling rule or get many false positives.

Do not give the standard 'Possible spelling mistake' message for words that use the IPA. Instead, have a rule/message that says "This word seems to use the IPA. Make sure that the word is correct".

ghost commented 5 years ago

You could add the characters to disambiguation so that they are ignored by spelling, and add a rule that warns when they appear in a word. I am not sure if such a specific exception is wise. If so, might want to add all math symbols as well.

MikeUnwalla commented 5 years ago

Hi @baarsrj , thanks for your suggestion. That would work.

Just to clarify: I did not mean to ignore a word if it contains an IPA character. I meant to ignore a word if all of its characters are from the IPA. To prevent false negatives, require a minimum word length (say, 3 characters). Or possibly, for short words, skip the spelling check only if the word consists entirely of IPA characters and none of those characters are in the character set of the default language.
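To make the proposal concrete, here is a rough Python sketch of the heuristic. It is not LanguageTool code: the names `is_ipa_char` and `should_skip_spelling` are hypothetical, and the IPA character set is an illustrative subset built from a few Unicode blocks. One extra assumption on my part: most lowercase Latin letters are themselves IPA symbols, so for longer words the sketch also requires at least one character outside the default alphabet. Otherwise every lowercase English word would count as IPA.

```python
import string

# Illustrative subset only (an assumption, not the full IPA chart):
# "IPA Extensions", "Spacing Modifier Letters" and
# "Combining Diacritical Marks", plus a few IPA letters from other blocks.
IPA_RANGES = [(0x0250, 0x02AF), (0x02B0, 0x02FF), (0x0300, 0x036F)]
IPA_EXTRA = set("æçðøħŋœβθχ")
IPA_LATIN = set(string.ascii_lowercase)  # plain Latin letters are IPA symbols too

# Stand-in for "the character set of the default language" (English here).
ENGLISH_ALPHABET = set(string.ascii_letters)


def is_ipa_char(ch):
    """Return True if ch belongs to the illustrative IPA subset above."""
    cp = ord(ch)
    return (
        ch in IPA_LATIN
        or ch in IPA_EXTRA
        or any(lo <= cp <= hi for lo, hi in IPA_RANGES)
    )


def should_skip_spelling(word, default_alphabet=ENGLISH_ALPHABET, min_length=3):
    """Decide whether the normal spelling rule should skip this word."""
    # Every character must come from the IPA set.
    if not word or not all(is_ipa_char(ch) for ch in word):
        return False
    outside = [ch for ch in word if ch not in default_alphabet]
    if len(word) >= min_length:
        # Assumption: at least one character must be outside the default alphabet.
        return bool(outside)
    # Short words: no character may belong to the default alphabet.
    return len(outside) == len(word)
```

With these assumptions, `should_skip_spelling("ˈfəʊnɪm")` is true and `should_skip_spelling("hello")` is false. The two-character word `"tə"` still gets a normal spelling check because `t` is in the English alphabet.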

"If so, might want to add all math symbols as well." Yes. I would like an 'intelligent' spelling checker.

ghost commented 5 years ago

Might also have to find out whether the IPA is only one standard. I have heard about some IPA extensions as well. If implemented like you suggested, I see no objection to having this for all languages. I am, however, not sure whether there might be a different standard for languages that are not derived from Latin.

MikeUnwalla commented 5 years ago

For all languages was what I meant. Sorry for not being clear.

I assumed that there was only one IPA. Didn't know about the extensions.

ghost commented 5 years ago

Another thought: Why not ignore all strings that consist entirely of characters outside the character set of the detected language? So e.g. Cyrillic words in a Latin text? Or Greek in Latin?