marcoagpinto / aoo-mozilla-en-dict

English Dictionaries Project (AOO+Mozilla+others)

Proofing Tool #36

Closed Ding-adong closed 5 years ago

Ding-adong commented 5 years ago

When I save, I want to save the .dic only and not the .aff. I update the .aff through Notepad++, and saving the .aff overwrites my new work. Now I have to tell Notepad++ not to reload and then save. Can you separate saving of the .dic and the .aff?

marcoagpinto commented 5 years ago

Can you separate saving of the .dic and the .aff?

No, sorry... it is just like saving a thesaurus, which saves the .dat plus the .idx.

You must change the .aff before loading it, or change it in the PTG editor.

Regarding the "L" and "I" issue, I tried to use Verdana, but on Ubuntu it was automatically replaced by a compatible font that reproduced the problem it was supposed to fix, and the letters came out the wrong size (first too big, then, after I changed the size, too small). So, back to Arial.

Ding-adong commented 5 years ago

Use another font. When editing a word, the Aff aid pulldown descriptions for t (UK s) and + (USA z) are wrong.

marcoagpinto commented 5 years ago

When editing a word, the Aff aid pulldown descriptions for t (UK s) and + (USA z) are wrong.

Fixed!

marcoagpinto commented 5 years ago

@Ding-adong

I have just released build 141 of PTG.

It fixes the decoding of twofold when we use suffixes+prefixes+suffixes.

Force a browser refresh if the site still shows 140.

Ding-adong commented 5 years ago

Can you explain what the above means? I think there is a bug, but I am not sure if you know about it. I compared two wordlists and suddenly words are missing, and I've never touched those words. When I looked into it using PT and clicked on edit a word, the prefix had changed itself. The list of words looks like this:

word suf
word suf
word suf
word pref
word suf - this word will not show up in the wordlist.

Normally word pref always comes after all the word suf lines.

Also, I was using Notepad++, doing some work, and suddenly the reload prompt for the PT saved file came up out of the blue. It looks like PT is doing something when I am not using it.

marcoagpinto commented 5 years ago

Can you explain what the above means? I think there is a bug, but I am not sure if you know about it. I compared two wordlists and suddenly words are missing, and I've never touched those words. When I looked into it using PT and clicked on edit a word, the prefix had changed itself. The list of words looks like this:

word suf
word suf
word suf
word pref
word suf - this word will not show up in the wordlist.

Normally word pref always comes after all the word suf lines.

Also, I was using Notepad++, doing some work, and suddenly the reload prompt for the PT saved file came up out of the blue. It looks like PT is doing something when I am not using it.

@Ding-adong I fixed the issue on twofold, which I found when you suggested the "ness's" and "nesses" rule:
word/sufs+prefs+sufs

Try your bug report with:
test/GDUS

Of course, it shows all suffixes and then applies the prefixes to them.

This is how Hunspell works: if you have a set of suffixes in a word, every prefix is applied to ALL suffixes in the word.

So, in the ListIconGadget I first show all suffixes and then prefixes applied to them.
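For readers following along, the listing order described above can be illustrated with a minimal Python sketch. This is not PTG's source and it is a deliberately simplified model of Hunspell: the `expand` function and its `(strip, add)` suffix pairs are assumptions made for illustration, and rule conditions and cross-product flags are ignored.

```python
def expand(stem, prefixes, suffixes):
    """List the stem, its suffixed forms, then every prefix applied to all of them.

    prefixes: strings to prepend (hypothetical, simplified model).
    suffixes: (strip, add) pairs; `strip` is removed from the end of the stem
              before `add` is appended. Rule conditions are ignored here.
    """
    base_forms = [stem]
    for strip, add in suffixes:
        if stem.endswith(strip):  # always true when strip is empty
            base_forms.append(stem[:len(stem) - len(strip)] + add)
    # Every prefix is applied to the stem and to ALL suffixed forms.
    prefixed = [prefix + form for prefix in prefixes for form in base_forms]
    return base_forms + prefixed


for word in expand("locate", ["dis"], [("", "d"), ("e", "ing")]):
    print(word)
```

With these made-up rules the suffixed forms come first and the prefixed forms follow, which matches the order described for the word list.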

PTG is not doing anything when not in use. It just loops in the event checker, which is the normal behaviour.

You can view the source code, which I supplied on the site, if you have doubts.

EDIT: After implementing the twofold fix, I tested with three spellers (GB+PT+HK) before and after the fix and the SHA-512 matched, so no words were lost.
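A before/after check of this kind can be reproduced with a few lines of Python. This is only a sketch of the general technique, not the procedure actually used for PTG; the function names and the example file names are hypothetical.

```python
import hashlib


def sha512_of_file(path, chunk_size=65536):
    """Return the SHA-512 hex digest of a file, read in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_wordlist(before_path, after_path):
    """True when both exported wordlists are byte-for-byte identical."""
    return sha512_of_file(before_path) == sha512_of_file(after_path)
```

For example, `same_wordlist("wordlist_build140.txt", "wordlist_build141.txt")` (hypothetical file names) returns `True` only when the two exports are identical.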

Ding-adong commented 5 years ago

I fixed the issue on twofold which I found when you suggested that "ness's" and "nesses" rule:
word/sufs+prefs+sufs

Try your bug report with:
test/GDUS

Of course, it shows all suffixes and then applies the prefixes to them.

That I understand. Yesterday it was:

word suf
word suf
word suf
word pref
word suf - this word will not show up in the wordlist.

Are you saying you have fixed the above? What is ListIconGadget?

PTG is not doing anything when not in use.

Are you sure? Why does Notepad++ say 'reload file yes/no' out of the blue? It can only do this if PT instructs Notepad++ to do so.

marcoagpinto commented 5 years ago

Are you saying you have fixed the above? What is ListIconGadget?

Well, here are the results using your twofold rule:

(before fix - PTG build 140) [screenshot: before]

(after fix - PTG build 141) [screenshot: after]

Do you understand now?

PTG is not doing anything when not in use. Are you sure? Why does Notepad++ say 'reload file yes/no' out of the blue? It can only do this if PT instructs Notepad++ to do so.

Well, I am a Notepad++ user too, and a few weeks ago a similar thing happened to me while I had a text file open in it. Maybe it is a Notepad++ bug?

Ding-adong commented 5 years ago

Yes, that is what I was trying to tell you. Thanks for the fix. E.g.:
locate/ASGFEnD - dislocated is missing from the wordlist.
locate/ASGFEDn - dislocated is shown.
"location","locate","Suffix","n","5: e ion/S [^iou]te" - is the bug caused by what comes after n, because of ion/S?

Ding-adong commented 5 years ago

When you are not too busy, another future fix: the extract duplicates file. The list of words, e.g. abbreviation:2:31,493+31,495, is hard to read. Can it be like abbreviation:2 :31,493 + 31,495? That's why the space was invented, to make reading easier, ha ha.

marcoagpinto commented 5 years ago

Yes, that is what I was trying to tell you. Thanks for the fix. E.g.:
locate/ASGFEnD - dislocated is missing from the wordlist.
locate/ASGFEDn - dislocated is shown.
"location","locate","Suffix","n","5: e ion/S [^iou]te" - is the bug caused by what comes after n, because of ion/S?

The bug happened because twofold calls the decoding function from inside itself (recursion), which can only happen once (hence twofold). I forgot to add a condition so that the prefixes are only decoded in the normal, non-recursive layer. It is hard to explain, but in simple words: it was decoding the prefixes when it reached the bottom of the function without checking whether it was inside the recursive call or not.

Now the prefix-decoding condition also requires AND RECURSIVITY=#FALSE, which means the prefixes are decoded only after the function has returned from the recursive call and set the flag back to #False.
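The shape of that guard can be shown with a small Python sketch. It is an illustration under stated assumptions, not PTG's code: the `decode` function, the `recursing` parameter and the rule tables are all hypothetical, suffix rules only append text, and conditions are ignored.

```python
def decode(stem, flags, suffixes, prefixes, recursing=False):
    """Expand a stem with its affix flags, one level of twofold suffixes deep.

    suffixes: flag -> list of (add, continuation_flags)   (hypothetical model)
    prefixes: flag -> text to prepend                      (hypothetical model)
    """
    forms = [stem]
    for flag in flags:
        for add, continuation in suffixes.get(flag, []):
            derived = stem + add
            forms.append(derived)
            if continuation and not recursing:
                # Twofold: recurse once to apply the second-level suffixes.
                inner = decode(derived, continuation, suffixes, prefixes,
                               recursing=True)
                forms.extend(inner[1:])  # inner[0] is `derived` itself
    # The guard: prefixes are decoded only in the outer, non-recursive call.
    if not recursing:
        base_forms = list(forms)
        for flag in flags:
            prefix = prefixes.get(flag)
            if prefix is not None:
                forms.extend(prefix + form for form in base_forms)
    return forms


suffix_rules = {"P": [("ness", "S")], "S": [("es", "")]}
prefix_rules = {"U": "un"}
print(decode("kind", "UP", suffix_rules, prefix_rules))
```

Without the `if not recursing` check, the inner call would also try to apply prefixes, which is the kind of mistake described above.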

It was a simple fix but it took years because no one ever reported it.

marcoagpinto commented 5 years ago

When you are not too busy, another future fix: the extract duplicates file. The list of words, e.g. abbreviation:2:31,493+31,495, is hard to read. Can it be like abbreviation:2 :31,493 + 31,495? That's why the space was invented, to make reading easier, ha ha.

Yes, in the next release :-)

And I need to code the damn "duplicates merge/delete in .dic".

I am a lazy arse, I know, but last week I spent a few days writing a report for my job (no one asked for it, I wrote the report just to help).

Ding-adong commented 5 years ago

When exporting the wordlist as CSV, can you export the numbers 1 to 9 as 01 to 09 so they sort properly in numerical order? Cheers.

marcoagpinto commented 5 years ago

When exporting the wordlist as CSV, can you export the numbers 1 to 9 as 01 to 09 so they sort properly in numerical order? Cheers.

Do you mean the rule number?

If so, should I export it as 001 to 009, to allow for rules up to 999?

Ding-adong commented 5 years ago

F column, e.g. "5: 0 ous [cemrstu]in" to "05: 0 ous [cemrstu]in"

Calm down, the highest number of rules is about 55. Each PFX/SFX starts at 1.
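The requested change amounts to zero-padding the number in front of the colon. A minimal Python sketch follows; `pad_rule_number` is a hypothetical helper, not part of PTG, and the default width of 2 assumes no affix has more than 99 rules.

```python
def pad_rule_number(cell, width=2):
    """Zero-pad the leading rule number of a cell such as '5: 0 ous [cemrstu]in'."""
    number, separator, rest = cell.partition(":")
    if not separator or not number.isdigit():
        return cell  # no leading "<digits>:" part, leave the cell unchanged
    return number.zfill(width) + separator + rest


print(pad_rule_number("5: 0 ous [cemrstu]in"))
```

Once padded, a plain text sort in Calc or Excel puts rule 5 before rule 10, because "05" sorts before "10".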

marcoagpinto commented 5 years ago

F column, e.g. "5: 0 ous [cemrstu]in" to "05: 0 ous [cemrstu]in"

Calm down, the highest number of rules is about 55. Each PFX/SFX starts at 1.

I will update PTG around the end of next week.

It is on my to-do list.

marcoagpinto commented 5 years ago

@Ding-adong I have just released PTG 3.0 - build 142 with the features you requested:
- Fix: Improved the Preferences window;
- "Show duplicates wordlist" export is now clearer to read and is exported in UNIX format with BOM;
- Export wordlist now exports in UNIX format with BOM;
- When extracting as CSV, the rule numbers now have leading zeros to make sorting easier in Calc/Excel;
- Added to the dictionary editor pop-up menu: 1 - "Copy all words"; 2 - "Copy all words & rules";
- Continued coding "Show/Merge/Delete duplicates .dic" ("Process" button still not coded).

Ding-adong commented 5 years ago

Nice one.

Ding-adong commented 5 years ago

Did you get my email?

marcoagpinto commented 5 years ago

Did you get my email?

What e-mail?

Ding-adong commented 5 years ago

I think replying to emails from GitHub doesn't work.

https://github.com/Ding-adong/aoo-mozilla-en-dict/blob/master/dicaff.zip

Ding-adong commented 5 years ago

I think I have found the bug. When I save and then go to Notepad++, the reload box appears while PT's green saving bar is halfway and stopped for 1-2 seconds. I click reload, go back to PT and the green bar finishes, then go back to Notepad++ and have to reload again. If I do it again but watch Notepad++ and wait until the green bar finishes, the reload box sort of flickers, then I reload and it only happens once. It looks like PT is saving the wordlist twice.

marcoagpinto commented 5 years ago

I think I have found the bug. When I save and then go to Notepad++, the reload box appears while PT's green saving bar is halfway and stopped for 1-2 seconds. I click reload, go back to PT and the green bar finishes, then go back to Notepad++ and have to reload again. If I do it again but watch Notepad++ and wait until the green bar finishes, the reload box sort of flickers, then I reload and it only happens once. It looks like PT is saving the wordlist twice.

It saves twice, yes: the .aff plus the .dic.

The .aff is saved in one go. I get the text from the EditorGadget, convert it to UNIX and save it.

And the .dic is saved line by line, not all at once. It gets each word's data from the array and saves them one by one.
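To make the two save paths concrete, here is a minimal Python sketch of the same idea. It is not PTG's implementation: `save_aff` and `save_dic` are hypothetical names, UTF-8 encoding is an assumption, and the count on the first line follows the usual Hunspell .dic layout.

```python
def save_aff(path, editor_text):
    """Write the whole .aff text in one go, converted to UNIX (LF) line endings."""
    unix_text = editor_text.replace("\r\n", "\n").replace("\r", "\n")
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        handle.write(unix_text)


def save_dic(path, entries):
    """Write the .dic one entry at a time, preceded by the entry count."""
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        handle.write(f"{len(entries)}\n")
        for entry in entries:  # one write per word, as described above
            handle.write(entry + "\n")
```

Because these are two separate files written one after the other, an editor that watches both for changes would see two modifications per save.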

Ding-adong commented 5 years ago

It looks like PT is saving wordlist twice.

Ding-adong commented 5 years ago

https://github.com/Ding-adong/aoo-mozilla-en-dict/blob/master/Dicaff.rar

Both dic and aff fine tuned and finally done.

Ding-adong commented 5 years ago

I have sorted out some minor errors with capitalised words and updated the file above. You can use this now. Ph.D. doesn't work. I do not think a full stop is allowed. Please check if it works on your end.

marcoagpinto commented 5 years ago

@Ding-adong Some British people were complaining about your "common US usage words", which I mass added in the past two or so months.

So, from now on, I will be more careful with what I add/change.

And I didn't sleep last night. I spent the night working on PTG, and the "merge/delete .dic duplicates" is almost ready.

However, it is almost the weekend, so I will probably only have the chance to work more on it next week.

Ding-adong commented 5 years ago

"common US usage words" - examples??? There are words that are commonly used in the UK even though they originated in the USA. The whole point of the dictionary is to let the spellchecker check words quickly without constantly stopping and asking about words like 'asshole'. If you remove them, then please let me know and I will simply put them in my personal dictionary. It isn't a problem, but I was just looking out for others.

Perhaps you should make a personal dictionary and put the common USA words like asshole|badass etc. in there. Users can either ignore it or copy and paste it into their own personal dictionary.

Anyway, my latest dic and aff files simply allow our words to be flagged. Nothing to do with adding USA words, haha.

marcoagpinto commented 5 years ago

Anyway, I have released build 150 today, and the font selection you suggested was implemented in build 147.