languagetool-org / languagetool

Style and Grammar Checker for 25+ Languages
GNU Lesser General Public License v2.1

[en] For British English, the LT spell checker does not find upper-case CENTER, but it finds lower-case #10721

Open MikeUnwalla opened 1 month ago

MikeUnwalla commented 1 month ago

Snapshot 2024-07-07. Spell checker finds COLOR and VAPOR, but not CENTER. Test sentences:

1.u) THE COLOR IS WRONG; IT'S WHITE LIKE VAPOR.
1.l) The color is wrong; it's white like vapor.
2.u) MOVE THE CONTROL STICK BACK TO THE CENTER, AND THEN MOVE IT FORWARD AGAIN.
2.l) Move the control stick back to the center, and then move it forward again.
3.u) WITH THE NOSE WHEEL AT 30 DEG. TO THE CENTER LINE, SET THE LEVER TO "UP".
3.l) With the nose wheel at 30 deg. to the center line, set the lever to "UP".
4.u) ADJUST THE CLINOMETER UNTIL THE BUBBLE IS IN THE CENTER.
4.l) Adjust the clinometer until the bubble is in the center.
5.u) ALIGN THE PISTON WITH THE CENTER OF THE SLEEVE.
5.l) Align the piston with the center of the sleeve.
6.u) SET THE CONTROLS TO THE CENTER POSITION.
6.l) Set the controls to the center position.
7.u) IF THERE ARE CRACKS IN THE CENTER PLY, REPLACE THE PANEL.
7.l) If there are cracks in the center ply, replace the panel.
8.u) REMOVE THE BOLT THAT IS FARTHEST FROM THE CENTER.
8.l) Remove the bolt that is farthest from the center.
9.u) SEND THE DEFECTIVE COVER, WITH THE OIL SAMPLES, TO THE REPAIR CENTER.
9.l) Send the defective cover, with the oil samples, to the repair center.
10.u) WHEN YOU LIFT THE AIRCRAFT ON JACKS, KEEP THE CENTER OF GRAVITY BETWEEN THESE LIMITS:
10.l) When you lift the aircraft on jacks, keep the center of gravity between these limits:
11.u) THE MODIFICATIONS CAN CHANGE THE CENTER OF GRAVITY COORDINATES.
11.l) The modifications can change the center of gravity coordinates.
12.u) THE ROD MUST TOUCH THE CENTER OF THE STRIP.
12.l) The rod must touch the center of the strip.
13.u) THE CENTER OF GRAVITY MOVES IN RELATION TO THE LOADS ON THE WINGS.
13.l) The center of gravity moves in relation to the loads on the wings.
14.u) THE TOTAL QUANTITY OF FUEL IN THE CENTER TANK IS 5000 LB.
14.l) The total quantity of fuel in the center tank is 5000 lb.
15.u) MAKE SURE THAT THE RATE OF MOVEMENT OF FUEL FROM THE WING TANKS TO THE CENTER TANK IS EQUAL.
15.l) Make sure that the rate of movement of fuel from the wing tanks to the center tank is equal.
16.u) MAKE SURE THAT THE RATE OF SUPPLY OF FUEL FROM THE WING TANKS TO THE CENTER TANK IS EQUAL.
16.l) Make sure that the rate of supply of fuel from the wing tanks to the center tank is equal.
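
One way to repeat this upper-case versus lower-case comparison outside the GUI is through LanguageTool's HTTP API. The sketch below is only an illustration. It assumes a LanguageTool server is reachable at `http://localhost:8081` (an assumption, not something stated in this thread), and `check`, `flagged_words`, and `case_gaps` are hypothetical helper names. It relies on the `matches` array in the `/v2/check` response, where each match has an `offset`, a `length`, and a `rule.id`.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a LanguageTool HTTP server is running locally; change as needed.
LT_URL = "http://localhost:8081/v2/check"
# Spelling rule ID for British English, as named later in this thread.
SPELL_RULE = "MORFOLOGIK_RULE_EN_GB"


def check(text, url=LT_URL):
    """POST the text to the LT HTTP API and return the parsed JSON response."""
    data = urllib.parse.urlencode({"text": text, "language": "en-GB"}).encode()
    with urllib.request.urlopen(url, data=data) as resp:
        return json.load(resp)


def flagged_words(text, response, rule_id=SPELL_RULE):
    """Return the substrings of text that the given rule flagged.

    Offsets are used as plain string indices, which is fine for ASCII
    test sentences like the ones above.
    """
    return [
        text[m["offset"]:m["offset"] + m["length"]]
        for m in response.get("matches", [])
        if m.get("rule", {}).get("id") == rule_id
    ]


def case_gaps(sentence, checker=check):
    """Words flagged in the sentence as written but not in its upper-case form."""
    normal_hits = {w.lower() for w in flagged_words(sentence, checker(sentence))}
    upper = sentence.upper()
    upper_hits = {w.lower() for w in flagged_words(upper, checker(upper))}
    return sorted(normal_hits - upper_hits)
```

Calling `case_gaps("Set the controls to the center position.")` against a running server returns the words that the spelling rule flags only in the lower-case sentence. The `checker` parameter makes it possible to substitute a stub when no server is available.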

Same results on


jaumeortola commented 1 month ago

Hello @MikeUnwalla. We are preparing new English dictionaries, and this issue is already solved with the new dictionaries.

I am about to make the new dictionaries available in this branch, but I am not merging yet. We want to run more tests: I would appreciate it if you could test it, especially in British English.


MikeUnwalla commented 1 month ago

@jaumeortola, I am happy to do some tests, but I do not know what I must do.

Do I make a new local branch for it, and then use Maven to build a GUI? (And then do the tests.)

jaumeortola commented 1 month ago

The branch is ready now.

You can do this:

To go back to the master branch:

The compiled version of LT will be in: languagetool/languagetool-standalone/target/LanguageTool-6.5-SNAPSHOT/LanguageTool-6.5-SNAPSHOT or (compressed) in: languagetool/languagetool-standalone/target/

If this doesn't work, I can send you the ZIP file (236M).

MikeUnwalla commented 1 month ago

@jaumeortola, I do not use git. I have GitHub desktop.

Can you put the zip file on a website and tell me the URL? Then, I will download it. Thanks.

jaumeortola commented 1 month ago

@MikeUnwalla I have sent a link to your public email address in GitHub.

MikeUnwalla commented 1 month ago

@jaumeortola, in addition to false positives for BrE spelling, there are missing postags. I suggest that I make a pull request for those missing postags. As part of the pull request, do you want me to add BrE false positive words to spelling_en-GB.txt?

MikeUnwalla commented 1 month ago

@jaumeortola, Comments about LT branch with new English dictionaries. MORFOLOGIK_RULE_EN_GB, REMOVED only, A to C only. Sorry, now I don't have time to do more.

I did not check the dictionaries for all the removed spellings, only the ones that I was not fully sure about.

ac: probably an error. Ac=abbreviation for actinium. AC=abbreviation for alternating current. 'ac' has no POS
Aisne-Marne: missing POS
analyze, analyzed, analyzers, analyzes, analyzing: Not correct BrE, 'analyze' is AmE. In BrE, we use analyse always.
automaker, automakers: AmE
aux: Correct BrE, but missing POS, was JJ
backcountry: probably AmE only
baddest: Not standard BrE, [15]. Make a style rule.
bads: probably only AmE. Not sure that this is a noun in standard BrE
Balrogs eat: missing POS for Balrogs if it is a proper noun
beachfront: missing POS NN, which is in current LT.
Beatlemania: possible incorrect POS. Probably should be NN:U, not NNP
blowed: not standard BrE. Only in idiom 'be blowed if'
breathalyzed, breathalyzer: AmE only. Also capitalized, is a trademark. Possibly add NNP
brights: AmE
burglarize, burglarized, burglarizing: AmE, for BrE use burgle
bursted: not BrE
busing: Correct BrE, but missing POS VBG is in current LT
buss: Not BrE, missing POS NN is in current LT.
Carian: Missing NNP is in current LT
catalyze, catalyzed, catalyzer: AmE
centimeter, centimeters: AmE only
checkered: AmE only?
Cherubims: Not a standard word in AmE or BrE. Remove NNS. The plural of cherub is cherubim.
chiefest: Not sure that the adjective is gradable in BrE. No reference.
com: No POS. Probably not correct in BrE as a stand-alone word.
couldst: Archaic in BrE and AmE
countertop, countertops: Mainly AmE
cruelest: Not BrE
curst, Curst: archaic in BrE for 'cursed'. Possibly not NNP

MikeUnwalla commented 1 month ago

@jaumeortola, False positives and missing POS for words in branch with new BrE spelling. (From my data, not from the regression results.)

VERBS:
autolyse, autolysed, autolyses [No spelling error, but missing VBZ], autolysing
blub, blubbed, blubs, blubbing
dialyse, dialysed, dialyses [No spelling error, but missing VBZ], dialysing
hydrolyse, hydrolysed, hydrolyses [No spelling error, but missing VBZ], hydrolysing
photolyse, photolysed, photolyses [No spelling error, but missing VBZ], photolysing
plasmolyse, plasmolysed, plasmolyses [No spelling error, but missing VBZ], plasmolysing
sulphate, sulphated, sulphates, sulphating

NOUNS: anatomiser anatomisers annexe artefact artefacts callipers canoniser canonisers colonisationist colonisationists demythologiser demythologisers depolariser depolarisers electrolyser electrolysers epitomiser epitomisers euphemiser euphemisers familiariser familiarisers finaliser finalisers flipchart flipcharts hydrolyser hydrolysers phoney phoneys privatiser privatisers cognisance dialysability dialysation electrolysation haemophilia hydrolysation journalling septicaemia sulphonation

callisthenics [spelling only, POS is OK]

ADJECTIVES: duff [No spelling error, but missing JJ] duffer [No spelling error, but missing JJR] duffest photolysable

INTERJECTION: wakey-wakey

MikeUnwalla commented 1 month ago

BRE MISSING POS AND SPELLING ERRORS: guestimate guestimates guestimated guestimating (variant of guesstimate) recognisance [not sure whether this is NN:U or NN:UN. recognizance is NN:U only] recognisances

BRE AND AME MISSING POS AND SPELLING ERRORS: crystallizer crystallizers ropy

jaumeortola commented 1 month ago

Thank you, @MikeUnwalla. I have been adding most of your suggestions to the dictionary. I will post here only the pending cases. Sometimes I don't know what the best solution is: a change in the dictionaries or a new rule...

MikeUnwalla commented 1 month ago


For 'ac' and 'com', I didn't see a definition in the British dictionaries. Probably AmE only.

baddest: yes, a style rule would be good.

bad/bads: a style/semantics rule would be good.

blowed: a style rule would be good.

chiefest: OK. I didn't look on Wiktionary.

For archaic terms, the LT team must have a general principle. Either all archaic terms must be in the dictionary (and have POS), or all archaic terms must not be in the dictionary.

MikeUnwalla commented 1 month ago

And one more missing POS that I just found (AmE and BrE): left/RB