languagetool-org / languagetool

Style and Grammar Checker for 25+ Languages
https://languagetool.org
GNU Lesser General Public License v2.1

[en] replace.txt for British/American words #10770

Open jaumeortola opened 1 month ago

jaumeortola commented 1 month ago

(Related to https://github.com/languagetool-org/languagetool/issues/10721)

There is a file that suggests replacements for American -> British words in the en-GB variant: https://github.com/languagetool-org/languagetool/blob/master/languagetool-language-modules/en/src/main/resources/org/languagetool/rules/en/en-GB/replace.txt

automaker=carmaker
garbage=rubbish

But it works only partially: it triggers only for words that the en-GB speller accepts (garbage, but not automaker).

Possible solutions:

  1. Add the American words to the British spelling dictionary.
  2. Give higher priority to the rule EN_GB_SIMPLE_REPLACE_ (prio=-50) so that it is not hidden by MORFOLOGIK_RULE_EN_GB (prio=-10).
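To make option 2 concrete, here is a minimal sketch of the masking effect, assuming that when two rules flag the same word only the match with the highest priority is shown. The class `PrioritySketch` and the method `visibleRule` are hypothetical and are not part of LanguageTool; the rule IDs and priorities are the ones quoted above.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not LanguageTool code: it only models the assumption
// "among rules that flag the same word, the highest priority wins".
class PrioritySketch {

    // Returns the ID of the rule whose match would remain visible.
    // Rules missing from the map are assumed to have priority 0.
    static String visibleRule(List<String> overlappingRuleIds,
                              Map<String, Integer> priorities) {
        return overlappingRuleIds.stream()
                .max(Comparator.comparingInt(
                        (String id) -> priorities.getOrDefault(id, 0)))
                .orElseThrow();
    }
}
```

With the current values (-50 for the replace rule, -10 for the speller), a word such as "automaker" that both rules flag would surface only the spelling match. A word such as "garbage", which the speller accepts, is flagged by the replace rule alone and is therefore unaffected. Any replace-rule priority above -10 would reverse the outcome for "automaker" in this model.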

@MikeUnwalla @AzadehSafakish @evan-defran-lt

jaumeortola commented 1 month ago

The replacement rule doesn't include inflected forms (movies -> films). We need to add them.

MikeUnwalla commented 1 month ago

replace.txt is a crude solution because it does not account for POS. Refer to https://github.com/languagetool-org/languagetool/issues/10344.

replace.txt does not know about semantics. For example, 'garbage' is correct BrE (https://www.ldoceonline.com/dictionary/garbage). We use 'rubbish' (not 'garbage') when the word refers to 'stuff to throw away'. There is a similar problem with 'stroller', which is a person who is strolling.

I suggest that you review the contents of replace.txt and remove all entries that can be used in BrE (garbage, stroller, movie, ...).

Your option 2 (higher priority for EN_GB_SIMPLE_REPLACE) is a safer option than adding the AmE words to the BrE spelling dictionary.