languagetool-org / languagetool

Style and Grammar Checker for 25+ Languages
https://languagetool.org
GNU Lesser General Public License v2.1

[en] Wrong tokenization: oughtn't #4554

Open MikeUnwalla opened 3 years ago

MikeUnwalla commented 3 years ago

image

jaumeortola commented 3 years ago

I missed it. I don't know why.

(are|is|were|was|do|does|did|have|has|had|wo|would|ca|could|sha|should|must|ai)(n['’]t) (.+)(['’]m|['’]re|['’]ll|['’]ve|['’]d|['’]s)

To add: oughtn't, mightn't, needn't... Others: e'er, o'er, ne'er, ol', 'twas, 'n' (Should we add these?)

Are there other missing contractions?
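To illustrate the gap, here is a minimal Python sketch of the "base + n't" split that the first half of the pattern above describes. It is not LanguageTool's actual tokenizer code: the helper names (`build_pattern`, `split_negative_contraction`) and the `EXTRA` list are hypothetical, and the trailing part of the quoted pattern (`'m`, `'re`, `'ll`, ...) is left out.

```python
import re

# Base forms copied from the first group of the pattern quoted above.
BASES = ["are", "is", "were", "was", "do", "does", "did", "have", "has", "had",
         "wo", "would", "ca", "could", "sha", "should", "must", "ai"]

# Hypothetical additions, taken from the "To add" list in this comment.
EXTRA = ["ought", "might", "need"]


def build_pattern(bases):
    """Compile a whole-token pattern: (base)(n't), straight or curly apostrophe."""
    alternation = "|".join(sorted(bases, key=len, reverse=True))
    return re.compile(rf"^({alternation})(n['’]t)$", re.IGNORECASE)


def split_negative_contraction(token, pattern):
    """Return [base, "n't"] if the token matches, else the token unchanged."""
    match = pattern.match(token)
    return [match.group(1), match.group(2)] if match else [token]


OLD = build_pattern(BASES)
NEW = build_pattern(BASES + EXTRA)

for word in ["won't", "oughtn't", "mightn't", "needn't"]:
    print(word, split_negative_contraction(word, OLD), split_negative_contraction(word, NEW))
```

With only the original bases, "oughtn't" is returned unsplit; once "ought" is in the alternation it splits into "ought" and "n't", the same way "won't" splits into "wo" and "n't".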

MikeUnwalla commented 3 years ago

@jaumeortola,

> e'er, o'er, ne'er, ol', 'twas, 'n' (Should we add these?)

Yes, why not? (Is there a reason not to add them?)

jaumeortola commented 3 years ago

Done. Please check the tags I used for these contractions: https://github.com/languagetool-org/languagetool/commit/9cd662dd73547eff8defabfaf27b7b07edd4c353

One minor issue: when adding entries like "e'er ever RB" (form, lemma, POS tag), we have to add an exception to the synthesizer so that "e'er" doesn't appear as a synthesized suggestion.
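A rough Python sketch of why the exception is needed, assuming the synthesizer is essentially the tagger mapping inverted (lemma + tag to forms). Everything here is hypothetical illustration: `TAGGER_ENTRIES`, `SYNTH_EXCEPTIONS`, `build_synthesizer`, and `synthesize` are made-up names, the "ever ever RB" entry is assumed, and none of this is LanguageTool's real synthesizer API or file format.

```python
from collections import defaultdict

# Entries in "form lemma tag" order, like the "e'er ever RB" example above.
# The plain "ever" entry is an assumption made for this illustration.
TAGGER_ENTRIES = [
    ("ever", "ever", "RB"),
    ("e'er", "ever", "RB"),
]

# Hypothetical exception list: forms the tagger should recognize
# but the synthesizer should never propose.
SYNTH_EXCEPTIONS = {"e'er"}


def build_synthesizer(entries, exceptions=frozenset()):
    """Invert the tagger entries into a (lemma, tag) -> [forms] table."""
    table = defaultdict(list)
    for form, lemma, tag in entries:
        if form not in exceptions:
            table[(lemma, tag)].append(form)
    return dict(table)


def synthesize(table, lemma, tag):
    """Return every form the table can generate for this lemma and tag."""
    return table.get((lemma, tag), [])


naive = build_synthesizer(TAGGER_ENTRIES)
filtered = build_synthesizer(TAGGER_ENTRIES, SYNTH_EXCEPTIONS)

print(synthesize(naive, "ever", "RB"))     # includes the poetic form
print(synthesize(filtered, "ever", "RB"))  # poetic form excluded
```

Without the exception, asking for the adverb form of "ever" can return "e'er" alongside "ever"; with it, the contraction is still tagged on input but is not offered as a suggestion.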

MikeUnwalla commented 3 years ago

@jaumeortola, the added tags are good.

I found another missing modal (mayn't): https://www.lexico.com/definition/mayn't

Tex2002ans commented 3 years ago

Here are all the other words that end in 't:

amn't = "am not" (Scottish/Irish) https://www.merriam-webster.com/dictionary/amn't https://www.lexico.com/definition/amn't

daren't = "dare not" https://www.merriam-webster.com/dictionary/daren%27t https://www.collinsdictionary.com/dictionary/english/darent

dasn't / dassn't = "dare not" (Dialectal [Mostly "Northern US"?]) https://www.merriam-webster.com/dictionary/dasn't https://www.collinsdictionary.com/dictionary/english/dassnt

hain't = ain't (British English [Collins says "archaic" or "dialect"]) https://www.merriam-webster.com/dictionary/hain't https://www.collinsdictionary.com/dictionary/english/haint

mayn't = "may not" https://www.merriam-webster.com/dictionary/mayn't https://www.lexico.com/definition/mayn't

mightn't = "might not" https://www.merriam-webster.com/dictionary/mightn't https://www.lexico.com/definition/mightn't

mustn't = "must not" https://www.merriam-webster.com/dictionary/mustn't https://www.lexico.com/definition/mustn't

needn't = "need not" https://www.merriam-webster.com/dictionary/needn't https://www.lexico.com/definition/needn't

usedn't = "used not" (Chiefly British) https://www.merriam-webster.com/dictionary/usedn't https://www.collinsdictionary.com/dictionary/english/usednt

usen't = "used not" (Chiefly British) https://www.merriam-webster.com/dictionary/usen't https://www.collinsdictionary.com/dictionary/english/usent


These words also existed in SCOWL, but are probably too rare (I couldn't find M-W or Lexico entries):

size 70:

size 80:

size 95: