marcoagpinto / aoo-mozilla-en-dict

English Dictionaries Project (AOO+Mozilla+others)

Errors in FLAGS "3", "5" and "N" #30

Closed marcoagpinto closed 5 years ago

marcoagpinto commented 5 years ago

@Ding-adong

Two flags were fixed this past weekend:

2019-02-01
- Improved flag "5" thanks to the GitHub user Ding-adong: some "swomen's" and "women's" entries were missing.
- Fixed flag "3": ists, ists, ist's → ist, ists, ist's

There is also an issue with flag "N" when converting some words to -ZATION/-SATION.

The issue is in the -ise forms.

I know I am a lazy arse, but I would like to ask if you could take a look at the flag and try to spot the error (legacy).

If you are unable to find it, I will try to do it myself during the week.

Thanks!

Ding-adong commented 5 years ago

I can't find anything untoward. N and n both repeat -sation when q already does the job, so stick with q for more words. N and n sometimes do the same thing, but n tends to produce an extra word. I use n for -ation, -tion and -ication. N is the lesser of the two, and I can't see the point of it at the moment.

marcoagpinto commented 5 years ago

@Ding-adong

This is what I mean: ize_20190108

-ise_20190108

Should I change the rule number for -ise to become similar to -ize, or will it create a bug?

I haven't assessed the risk yet.

Thanks!

marcoagpinto commented 5 years ago

EDIT: replace it with: SFX n e ation [^ckt]e

(from the flag "n") ?

marcoagpinto commented 5 years ago

EDIT: And maybe add a "z" to the rule?: SFX N e ation [^cktz]e
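To make the effect of the extra "z" concrete, here is a minimal Python sketch of how one SFX rule of the form "SFX flag strip add condition" could be applied to a word. It is an approximation, not Hunspell's own code: the helper name apply_sfx is hypothetical, and the condition is treated as an ordinary regular expression anchored at the end of the word.

```python
import re

def apply_sfx(word, strip, add, condition):
    """Apply one Hunspell-style SFX rule (hypothetical helper).

    Returns the derived word, or None when the rule does not apply.
    """
    # Assumption: the condition behaves like a regex matched at the word end.
    if not re.search(f"(?:{condition})$", word):
        return None
    if strip == "0":  # "0" means nothing is stripped
        return word + add
    if not word.endswith(strip):
        return None
    return word[: len(word) - len(strip)] + add

# SFX n e ation [^ckt]e
print(apply_sfx("optimise", "e", "ation", "[^ckt]e"))   # optimisation
print(apply_sfx("organize", "e", "ation", "[^ckt]e"))   # organization
# SFX N e ation [^cktz]e : the added "z" stops the rule firing on -ize words
print(apply_sfx("organize", "e", "ation", "[^cktz]e"))  # None
```

With the "z" in the excluded set, -ise words still take the rule while -ize words are left to whichever other rule handles them.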

marcoagpinto commented 5 years ago

Here are the results of the tests:

BEFORE CHANGING:
Total wordlist: 216 741
Duplicates: 8 261

AFTER CHANGING:
Total wordlist: 216 896
Duplicates: 8 246

The number of duplicates goes down, but the number of words goes up.
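For reference, the two figures could be computed from an expanded wordlist roughly as below. This is only a sketch: how PTG actually defines "Total wordlist" and "Duplicates" is not stated in the thread, so the definition used here (total entries minus distinct entries) is an assumption, and wordlist_stats is a hypothetical helper.

```python
def wordlist_stats(words):
    """Return (total, duplicates) for a list of generated words.

    Assumption: a "duplicate" is any repeat of a word already seen.
    """
    total = len(words)
    duplicates = total - len(set(words))
    return total, duplicates

sample = ["optimise", "optimisation", "optimisation", "optimises"]
print(wordlist_stats(sample))  # (4, 1)

# Differences between the figures reported above.
before = (216_741, 8_261)
after = (216_896, 8_246)
print(after[0] - before[0])  # 155 more words
print(after[1] - before[1])  # -15, i.e. 15 fewer duplicates
```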

marcoagpinto commented 5 years ago

Hey hey hey!

I believe I have fixed it: SFX N e ation [^o]se

This is how I should change it.

I am now making a DIFF with Tortoise SVN.

marcoagpinto commented 5 years ago

Well, after checking the DIFF, the previous change creates wrong words.

:-(

Maybe I should not change it.
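The thread does not list the wrong words, so the following is only one plausible illustration. Assuming flag N also contains an -ion rule with the same [^o]se condition (as in the table posted later in this thread), every matching rule fires, and a word such as "immerse" would gain an unwanted -ation form. The expand helper is hypothetical and treats conditions as end-anchored regular expressions.

```python
import re

# Assumed pair of rules sharing the same condition.
RULES = [
    ("e", "ion", "[^o]se"),    # SFX N e ion [^o]se
    ("e", "ation", "[^o]se"),  # SFX N e ation [^o]se (the attempted fix)
]

def expand(word, rules):
    """Return every form produced by the rules whose condition matches."""
    forms = []
    for strip, add, condition in rules:
        if re.search(f"(?:{condition})$", word):
            forms.append(word[: len(word) - len(strip)] + add)
    return forms

print(expand("immerse", RULES))   # ['immersion', 'immersation']
print(expand("optimise", RULES))  # ['optimision', 'optimisation']
print(expand("dispose", RULES))   # [] because "o" is excluded
```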

Ding-adong commented 5 years ago

I see your problem. I don't use ize or zation, zable etc.

I remember this issue from the past. optimise/q failed and I had to use n. However, since optimization/M was there, I changed the z to s and didn't bother adding n to optimise. Simply don't use N with optimise.

However, -sation rightfully belongs to q; when I was a newbie I wasted time searching other letters to get the correct spelling.

I moved SFX q e ation [^ckt]e and SFX q e ations [^ckt]e from n and blocked SFX q e isation [^l]e plus plural.

It worked.

Testing for any effects on other words. Let you know later.

Ding-adong commented 5 years ago

What does the position column of the wordlist mean?

marcoagpinto commented 5 years ago

What does the position column of the wordlist mean?

In PTG?

It means the position where the flags are found in the .aff below (optimised .aff while editing the .dic).

Double-click on the word and it will jump to that position (Windows only so far, I believe, since, if I remember correctly, the Linux API didn't work as planned).

Ding-adong commented 5 years ago

I have created a fork and uploaded 3 files. Please download and check them. Compare the new aff file with yours and you will see the changes I had to make to fix your problems. Thus q will be the main letter for -sation and - for zation. There is no need to go to n or N, so no messing around and no duplicated words. I also added a line for the apostrophe, as there are too many sation/M and zation/M entries. If you like it, then expand on it.

I have added a plural line to N, as too many words carry both n and N, which creates duplicates. If you like it, more plurals could be moved from n to N, making the dictionary less reliant on n.

marcoagpinto commented 5 years ago

I am doing it the hard way, changing FLAG "N".

So far I have managed to produce an identical wordlist using:

SFX N Y 30
SFX N b ption b
SFX N d sion d
SFX N be ption be
SFX N e tion ce
SFX N de sion de
SFX N ke cation ke
SFX N e ption ume
SFX N e mation [^u]me

SFX N e ion cise
SFX N e ion [tfl]use
SFX N e ion [rsn]se
SFX N e ion ulse
SFX N e ion [vl]ise

SFX N e ition ose
SFX N e ation [iou]te
SFX N e ion [^iou]te
SFX N e ation [^bcdkmst]e
SFX N el ulsion el
SFX N 0 lation [aiou]l
SFX N 0 ation [^aeiou]l
SFX N 0 mation [aeiou]m
SFX N 0 ation [^aeiou]m
SFX N er ration er
SFX N 0 ation [^e]r
SFX N 0 ion [sx]
SFX N t ssion mit
SFX N 0 ion [^m]it
SFX N 0 ation [^i]t
SFX N y ication y
SFX N 0 ation [^bdelmrstxy]
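In a Hunspell .aff file the header line of an affix class ends with the number of rules that follow, so a quick sanity check for the "30" in the header of the block above could look like this. It is a minimal sketch; check_sfx_count is a hypothetical helper, and the rules are copied from the comment above.

```python
AFF_BLOCK = """\
SFX N Y 30
SFX N b ption b
SFX N d sion d
SFX N be ption be
SFX N e tion ce
SFX N de sion de
SFX N ke cation ke
SFX N e ption ume
SFX N e mation [^u]me

SFX N e ion cise
SFX N e ion [tfl]use
SFX N e ion [rsn]se
SFX N e ion ulse
SFX N e ion [vl]ise

SFX N e ition ose
SFX N e ation [iou]te
SFX N e ion [^iou]te
SFX N e ation [^bcdkmst]e
SFX N el ulsion el
SFX N 0 lation [aiou]l
SFX N 0 ation [^aeiou]l
SFX N 0 mation [aeiou]m
SFX N 0 ation [^aeiou]m
SFX N er ration er
SFX N 0 ation [^e]r
SFX N 0 ion [sx]
SFX N t ssion mit
SFX N 0 ion [^m]it
SFX N 0 ation [^i]t
SFX N y ication y
SFX N 0 ation [^bdelmrstxy]
"""

def check_sfx_count(block):
    """Return (declared, actual) rule counts for one SFX class."""
    rows = [line.split() for line in block.splitlines() if line.strip()]
    header, rules = rows[0], rows[1:]
    declared = int(header[3])  # header: SFX flag cross_product count
    return declared, len(rules)

declared, actual = check_sfx_count(AFF_BLOCK)
print(declared, actual)
```

If the two numbers differ after adding or removing a rule, the header needs updating as well.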

Tomorrow or Friday I must implement/fix the "sation". I must go slowly to avoid bugs.

marcoagpinto commented 5 years ago

Fixed by adding: SFX N e ation [^uriolsna][^bcdkmztrlvgu]e

I checked the before and after wordlist with SHA-512, and they matched.
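A before/after check of this kind can be reproduced with Python's standard hashlib module. This is a sketch only: the tool actually used in the thread is not named, and the function names and file paths are hypothetical.

```python
import hashlib

def sha512_of_file(path, chunk_size=65536):
    """Return the SHA-512 hex digest of a file, read in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def wordlists_match(before_path, after_path):
    """True when both exported wordlists are byte-for-byte identical."""
    return sha512_of_file(before_path) == sha512_of_file(after_path)
```

Calling wordlists_match("before.txt", "after.txt") on two exported wordlists (hypothetical file names) returns True only if the rule change left the generated list unchanged.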

Ding-adong commented 5 years ago

I am looking at the N and n flags, making a list of the words that use them to see where the duplicates are. I am also looking at D and G, since d does the work of both D and G. Did you check my aff file? What do you think of the apostrophe idea?

No bugs yet. I used this: https://www.morewords.com/ends-with/ize/ (the "ize" can be changed to any other ending). Just copy all the words into Subtitle Edit or Notepad++, use a regex to convert them, then compare the two files, ize and ise.

I extracted a wordlist using PTG with your latest dic and the original aff. Then I replaced the aff, converted ize to ise, and used WinMerge to compare. I checked until I hit a spelling error: SFX q e ation [^ckt]e becomes SFX q e ation [^cktv]e; repeat until the next error, and it becomes SFX q e ation [^cktvm]e; and so on, adding r, l, n, d until it is error free. Well, I hope so.
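The convert-and-compare step described above could be scripted instead of done by hand in an editor and WinMerge. The sketch below is an assumption-laden approximation: the suffix list in the regex is illustrative rather than complete, and a blanket conversion will also mangle words such as "size" or "prize", so the output still needs reviewing by eye.

```python
import re

# Illustrative set of endings; not an exhaustive list.
IZE = re.compile(r"iz(e[sd]?|ing|ations?|ers?|able)$")

def ize_to_ise(word):
    """Rewrite a trailing -ize family ending to its -ise spelling."""
    return IZE.sub(lambda m: "is" + m.group(1), word)

def compare(ize_words, ise_words):
    """Return (missing_from_ise, missing_from_ize) after conversion."""
    converted = {ize_to_ise(w) for w in ize_words}
    ise_set = set(ise_words)
    return sorted(converted - ise_set), sorted(ise_set - converted)

ize_list = ["optimize", "optimization", "realize"]
ise_list = ["optimise", "optimisation", "realise", "realisation"]
print(compare(ize_list, ise_list))  # ([], ['realisation'])
```

Each word in the second list points at a form generated on the -ise side with no -ize counterpart, which is where the affix rules differ.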

marcoagpinto commented 5 years ago

Buaaaaaa!!!!!!!

"optimise" still doesn't work with the flag "N".

I have been trying to fix it.

marcoagpinto commented 5 years ago

Ahhhhh...

I seem to have fixed it:
SFX N e ation [^uriolsna][^bcdkmztrlvgu]e
SFX N e ation [^crbav][mi][^bcdkmztrlvgun]e
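Why the second rule is needed can be checked with a small Python sketch. The conditions are treated as end-anchored regular expressions, which only approximates Hunspell's matching, and expand is a hypothetical helper.

```python
import re

RULES = [
    ("e", "ation", "[^uriolsna][^bcdkmztrlvgu]e"),
    ("e", "ation", "[^crbav][mi][^bcdkmztrlvgun]e"),
]

def expand(word, rules):
    """Return the forms produced by every rule whose condition matches."""
    forms = []
    for strip, add, condition in rules:
        if re.search(f"(?:{condition})$", word):
            forms.append(word[: len(word) - len(strip)] + add)
    return forms

# First rule alone: the "i" in "-ise" is in the excluded set, so no match.
print(expand("optimise", RULES[:1]))  # []
# Both rules: "mise" satisfies the second condition.
print(expand("optimise", RULES))      # ['optimisation']
```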

Ding-adong commented 5 years ago

Buaaaaaa!!!!!!!

"optimise" still doesn't work with the flag "N".

I have been trying to fix it.

To become what? optimisation?

I have already given you the solution by using q. Forget about N n for words ending in ize, ise.

marcoagpinto commented 5 years ago

Buaaaaaa!!!!!!! "optimise" still doesn't work with the flag "N". I have been trying to fix it.

To become what? optimisation?

I have already given you the solution by using q. Forget about N n for words ending in ize, ise.

I have fixed it already.

I am trying to keep the rules close to their creator.

optimisation=verb

So, "N" is needed, and I have fixed it.

Ding-adong commented 5 years ago

Jesus wept. The creator? Where? Do it better by not holding onto the past. I just went through N bit by bit and spotted more errors, duplications and unnecessary work. Forget the verb/noun distinction. It is easier to see words and spell them by grouping them under the relevant flags: optimise, see -ise, go to q, where the -ise words are. Messing around looking elsewhere is inefficient.

Ding-adong commented 5 years ago

SFX N Y 60
absorb      SFX N b ption b                absorption
absorb      SFX N b ptions b               absorptions
apprehend   SFX N d sion d                 apprehension
apprehend   SFX N d sions d                apprehensions
inscribe    SFX N be ption be              inscription
inscribe    SFX N be ptions be             inscriptions
adduce      SFX N e tion ce                adduction
adduce      SFX N e tions ce               adductions
abrade      SFX N de sion de               abrasion
abrade      SFX N de sions de              abrasions
evoke       SFX N ke cation ke             evocation
evoke       SFX N ke cations ke            evocations
assume      SFX N e ption ume              assumption
assume      SFX N e ptions ume             assumptions
inflame     SFX N e mation [^u]me          inflammation
inflame     SFX N e mations [^u]me         inflammations
immerse     SFX N e ion [^o]se             immersion
immerse     SFX N e ions [^o]se            immersions
dispose     SFX N e ition ose              disposition
dispose     SFX N e itions ose             dispositions
discrete    SFX N e ation [iou]te          discretion
discrete    SFX N e ations [iou]te         discretions
abbreviate  SFX N e ion [^iou]te           abbreviation
abbreviate  SFX N e ions [^iou]te          abbreviations
abjure      SFX N e ation [^bcdkmst]e      abjuration
abjure      SFX N e ations [^bcdkmst]e     abjurations
compel      SFX N el ulsion el             compulsion
compel      SFX N el ulsions el            compulsions
distil      SFX N 0 lation [aiou]l         distillation (UK spelling)
distil      SFX N 0 lations [aiou]l        distillations (UK spelling)
distill     SFX N 0 ation [^aeiou]l        distillation (USA spelling)
distill     SFX N 0 ations [^aeiou]l       distillations (USA spelling)
sum         SFX N 0 mation [aeiou]m        summation
sum         SFX N 0 mations [aeiou]m       summations
affirm      SFX N 0 ation [^aeiou]m        affirmation
affirm      SFX N 0 ations [^aeiou]m       affirmations
arbiter     SFX N er ration er             arbitration (remove N and put it under arbitrate)
arbiter     SFX N er rations ers           arbitrations (remove N and put it under arbitrate)
colour      SFX N 0 ation [^e]r            colouration
colour      SFX N 0 ations [^e]r           colourations
complex     SFX N 0 ion [sx]               complexion
complex     SFX N 0 ions [sx]              complexions
admit       SFX N t ssion mit              admission
admit       SFX N t ssions mit             admissions
audit       SFX N 0 ion [^m]it             audition
audit       SFX N 0 ions [^m]it            auditions
attest      SFX N 0 ation [^i]t            attestation
attest      SFX N 0 ations [^i]t           attestations
apply       SFX N y ication y              application
apply       SFX N y ications y             applications
ossify      SFX N y ication [^p]y          ossification
ossify      SFX N y ications [^p]y         ossifications
assign      SFX N 0 ation [^bdelmrstxy]    assignation
assign      SFX N 0 ations [^bdelmrstxy]   assignations
quota       SFX N 0 tion a                 quotation
quota       SFX N 0 tions a                quotations
occupy      SFX N y ation py               occupation
occupy      SFX N y ations py              occupations
origin      SFX N 0 ation [^aelry]         origination
origin      SFX N 0 ations [^aelry]        originations

The SFX n rules have all gone into N. The SFX X rules have all gone into N.

In your dictionary you have words that are given a plural when they shouldn't be: words whose one form is both singular and plural. For example, accommodation in GB spelling; the USA uses accommodation(s). It cannot go into N for GB spelling. Can I suggest using either n or X for words that are both singular and plural? This would avoid creating false plurals, is efficient and prevents duplication.