marcoagpinto / aoo-mozilla-en-dict

English Dictionaries Project (AOO+Mozilla+others)

Errors in FLAGS "N" and others. #37

Closed Ding-adong closed 5 years ago

Ding-adong commented 5 years ago

Doing more work today, I stumbled across some errors regarding flag N (#30). Sorry, bad news: it's still broken. I cannot add -ion to a word like advent (advention).

I added a new line, SFX N 0 ion t, to get advention, but the existing SFX N 0 ation [^i]t also fires and creates adventation, which I don't want. It needs more adjustment. Firstly, is there any point in having both N and n when they do almost the same thing and create duplicate words?

For the 3rd time, have you read https://www.systutorials.com/docs/linux/man/4-hunspell/#index ? or https://github.com/hunspell/hunspell/releases/download/v1.7.0/hunspell5.pdf ?

It explains other options. For example, the current P block:

SFX P Y 6
SFX P y iness [^aeiou]y
SFX P 0 ness [aeiou]y
SFX P 0 ness [^y]
SFX P y iness's [^aeiou]y
SFX P 0 ness's [aeiou]y
SFX P 0 ness's [^y]

The same pattern is repeated a few times. I changed it to:

SFX P Y 3
SFX P y iness/MS [^aeiou]y
SFX P 0 ness/MS [aeiou]y
SFX P 0 ness/MS [^y]

giving ness/ness's/nesses in 3 lines.
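
To make the /MS idea concrete, here is a minimal Python sketch of how the three-line P block could expand a word. It is only an illustration, not Hunspell's code: the function names are made up, and the continuation flags are assumed to behave in a simplified way (M appends 's, S appends es after -ness).

```python
import re

# Each rule is (strip, add, condition); "0" means "strip nothing", as in .aff files.
P_RULES = [
    ("y", "iness", "[^aeiou]y"),
    ("0", "ness", "[aeiou]y"),
    ("0", "ness", "[^y]"),
]


def apply_suffix_rules(word, rules):
    """Return the forms produced by every rule whose condition matches the word ending."""
    forms = []
    for strip, add, condition in rules:
        # Affix conditions are matched against the end of the word.
        if re.search(f"(?:{condition})$", word) is None:
            continue
        stem = word if strip == "0" else word[: -len(strip)]
        forms.append(stem + add)
    return forms


def expand_with_ms(word):
    """Apply P, then the assumed continuation flags M ('s) and S (es)."""
    expanded = []
    for base in apply_suffix_rules(word, P_RULES):
        expanded += [base, base + "'s", base + "es"]
    return expanded


print(expand_with_ms("happy"))
print(expand_with_ms("kind"))
```

Under these assumptions each matching rule yields the singular, the possessive and the plural, which is what the six-line block spells out by hand for the first two.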

Having looked into this for over a month, I think hunspell.dll needs some improvements. Which Hunspell file do you use, and which does OpenOffice use?

Ding-adong commented 5 years ago

Another one.

SFX n Y 28
SFX n 0 tion a
SFX n e tion ce
SFX n ke cation ke
SFX n e ation [iou]te
SFX n e ion [^iou]te
SFX n e ation [^ckt]e
SFX n el ulsion el
SFX n 0 lation [aiou]l
SFX n 0 ation [^aeiou]l
SFX n er ration er
SFX n 0 ation [^e]r
SFX n y ation py
SFX n y ication [^p]y
SFX n 0 ation [^aelry]
SFX n 0 tions a
SFX n e tions ce
SFX n ke cations ke
SFX n e ations [iou]te
SFX n e ions [^iou]te
SFX n e ations [^ckt]e
SFX n el ulsions el
SFX n 0 lations [aiou]l
SFX n 0 ations [^aeiou]l
SFX n er rations er
SFX n 0 ations [^e]r
SFX n y ations py
SFX n y ications [^p]y
SFX n 0 ations [^aelry]

change to

SFX n Y 14
SFX n 0 tion/S a
SFX n e tion/S ce
SFX n ke cation/S ke
SFX n e ation/S [iou]te
SFX n e ion/S [^iou]te
SFX n e ation/S [^ckt]e
SFX n el ulsion/S el
SFX n 0 lation/S [aiou]l
SFX n 0 ation/S [^aeiou]l
SFX n er ration/S er
SFX n 0 ation/S [^e]r
SFX n y ation/S py
SFX n y ication/S [^p]y
SFX n 0 ation/S [^aelry]

saving 14 lines.
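
One way to check that the shorter block loses nothing is to expand both versions and compare the results. The following is a rough Python sketch of that check. It assumes that /S simply appends "s" to a form ending in -ion (the real S flag has more cases), and the helper names are hypothetical.

```python
import re

# The 14 singular rules from the n block, written as "strip add condition".
SINGULAR = """\
0 tion a
e tion ce
ke cation ke
e ation [iou]te
e ion [^iou]te
e ation [^ckt]e
el ulsion el
0 lation [aiou]l
0 ation [^aeiou]l
er ration er
0 ation [^e]r
y ation py
y ication [^p]y
0 ation [^aelry]
"""

BASE_RULES = [tuple(line.split()) for line in SINGULAR.splitlines() if line.strip()]
# The 28-rule version repeats every rule with an explicit plural ending.
LONG_FORM = BASE_RULES + [(strip, add + "s", cond) for strip, add, cond in BASE_RULES]


def apply_rules(word, rules):
    """Return the set of forms produced by rules whose condition matches the word ending."""
    forms = set()
    for strip, add, cond in rules:
        if re.search(f"(?:{cond})$", word):
            stem = word if strip == "0" else word[: -len(strip)]
            forms.add(stem + add)
    return forms


def expand_long(word):
    return apply_rules(word, LONG_FORM)


def expand_short(word):
    singular = apply_rules(word, BASE_RULES)
    # Assumed behaviour of the /S continuation flag on "-ion" words: append "s".
    return singular | {form + "s" for form in singular}


for word in ["create", "apply", "excite", "ponder"]:
    assert expand_long(word) == expand_short(word)
    print(word, sorted(expand_short(word)))
```

The comparison only covers the words you feed it, so a real check would run over every n-flagged entry in the .dic.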

marcoagpinto commented 5 years ago

@Ding-adong

I am "gone" for the weekend.

Flag "n" adds both the singular and the plural, while "N" adds only the singular.

This is the way I have been using it.

At the weekend I can't dedicate time to the .aff ... the most I can do is add a few words or so.

On Monday I will be back at it.

Ding-adong commented 5 years ago

Hope you had a good weekend. I have sorted N by changing SFX N 0 ation [^i]t to SFX N 0 ation [^v].[^i]t
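
The effect of the tightened condition can be checked with a small Python sketch. It treats the affix condition as a regular expression anchored at the end of the word, which holds for simple conditions like these two; the function name is made up for illustration.

```python
import re

OLD_CONDITION = "[^i]t"
NEW_CONDITION = "[^v].[^i]t"


def condition_matches(word, condition):
    """True if the affix condition matches the end of the word."""
    return re.search(f"(?:{condition})$", word) is not None


for word in ["advent", "present"]:
    print(word, condition_matches(word, OLD_CONDITION), condition_matches(word, NEW_CONDITION))
```

With the new condition, advent no longer qualifies for -ation while present still does. Words such as invent and prevent are excluded as well, and any word shorter than four letters can no longer match.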

marcoagpinto commented 5 years ago

Hello!

I am not a big fan of twofold rules (rules that call subrules).

EDIT: On second thought, I will implement it today.

EDIT2: There is a bug in PTG when using twofold rules which removed ~51 words (ones using prefixes) from the .dic. I found it by making a diff. I will try to fix it soon.

Ding-adong commented 5 years ago

Don't do the SFX P. There is a limit in Hunspell: 1 prefix and two suffixes. I thought /MS would stay within the two-suffix limit, but I had an example of /YP, then P with /MS, making four suffixes, and it wouldn't work. Don't make any changes, as I have already done many fixes in the .aff file and am still inputting the -ous words. It is taking longer than expected; I'll be about 3 more days.
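
As a rough illustration of why chained continuation flags can run into that limit, here is a toy Python model. It is not how Hunspell works internally (Hunspell strips affixes during lookup rather than generating forms), and the Y, P and M rules below are simplified assumptions made for the example, with each rule carrying its own continuation flags.

```python
import re

# Hypothetical rule table: flag -> list of (strip, add, condition, continuation flags).
RULES = {
    "Y": [("0", "ly", ".", "P")],
    "P": [("y", "iness", "[^aeiou]y", "M"), ("0", "ness", "[^y]", "M")],
    "M": [("0", "'s", ".", "")],
}


def expand(word, flags, max_suffixes=2):
    """Generate forms, following continuation flags up to max_suffixes levels deep."""
    results = [word]

    def walk(current, current_flags, depth):
        if depth == max_suffixes:
            return  # the limit: no further suffix may be stacked
        for flag in current_flags:
            for strip, add, cond, continuation in RULES.get(flag, []):
                if not re.search(f"(?:{cond})$", current):
                    continue
                stem = current if strip == "0" else current[: -len(strip)]
                derived = stem + add
                results.append(derived)
                walk(derived, continuation, depth + 1)

    walk(word, flags, 0)
    return results


print(expand("kind", "Y"))                  # limited to two stacked suffixes
print(expand("kind", "Y", max_suffixes=3))  # what a deeper chain would add
```

In this model the third stacked suffix (the possessive on top of -ly and -ness) is only reachable when the limit is raised, which is the kind of form that silently goes missing.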

marcoagpinto commented 5 years ago

@Ding-adong I have just fixed PTG. It now correctly processes entries such as: affected/EPY

I noticed that it added "es" forms alongside all the "'s" forms, in cases like:

aimlessness's
aimlessnesses <- added word

This increased the wordlist by around 1800 words, which covers more cases.

Can you confirm (as a native speaker) whether all "ness" words have "nesses" as their plural? This is an important addition to the wordlist.

What is that limit you talked about? Can you unmunch the GB speller in Ubuntu (using the official Hunspell package) with the twofold rule and check in a diff whether the only difference is the "nesses"?

If you can't use Ubuntu, tomorrow I will ask some people on IRC how to do it, as I last did it in 2013 or so and can't remember how.

Thank you for all your help!

marcoagpinto commented 5 years ago

@Ding-adong This is what I mean by fixing it in PTG minutes ago:

(before:) [screenshot: before]

(after:) [screenshot: after]

Ding-adong commented 5 years ago

All -ness plurals have -es on the end. It actually saves space. So far I've added maybe hundreds of new missing words and saved even more lines by being efficient. The .dic is now 300 lines shorter than at the start, and that is with more words. So far so good, eh? Remember with plurals: just because you haven't seen anyone use one, it doesn't mean it is wrong.

Remember, don't change the .dic, otherwise comparison will be more difficult. I moved the prefix letters to the end of the word's flags, i.e. /YPE. It is very hard to read and track when the list is randomised and not in a logical order; a logical order is easy to read quickly.

A list so far. Put the letters in this order and it looks and reads better. Affix order for easy reading: almost alphabetical, and shortest to longest words.

WYP MSDG Dh Gk GJk Ww1 Qstq uk 89+- usa YR 7l 36j ZY R3 Vvu RN(or)n Oo

marcoagpinto commented 5 years ago

It explains other options. For example, the current P block:

SFX P Y 6
SFX P y iness [^aeiou]y
SFX P 0 ness [aeiou]y
SFX P 0 ness [^y]
SFX P y iness's [^aeiou]y
SFX P 0 ness's [aeiou]y
SFX P 0 ness's [^y]

The same pattern is repeated a few times. I changed it to:

SFX P Y 3
SFX P y iness/MS [^aeiou]y
SFX P 0 ness/MS [aeiou]y
SFX P 0 ness/MS [^y]

giving ness/ness's/nesses in 3 lines.

Done and tested with Thunderbird!

marcoagpinto commented 5 years ago

Having looked into this for over a month, I think hunspell.dll needs some improvements. Which Hunspell file do you use, and which does OpenOffice use?

I don't use any Hunspell file. I coded the engine myself.

Ding-adong commented 5 years ago

Don't do the SFX P. There is a limit in hunspell, 1 prefix and two suffixes. It won't work properly and I have kept the original intact. Where is your engine then? Is it the same as hunspell or what?

marcoagpinto commented 5 years ago

Don't do the SFX P. There is a limit in hunspell, 1 prefix and two suffixes. It won't work properly and I have kept the original intact.

Well, it worked okay with Thunderbird, so it should also work with others?

Where is your engine then? Is it the same as hunspell or what?

I coded the engine myself based on how Hunspell works (people explained to me how things worked and based on that I coded it myself).

That is how I do my coding: I base myself on theory and implement self created algorithms.

marcoagpinto commented 5 years ago

Damn... better not to risk it, because some software may still run an old version of Hunspell, so I am going to revert the flag and just add the plural rule to it.

Ding-adong commented 5 years ago

You do realise that this .dic and .aff are used by other software, not just browsers but also subtitle editors, etc. Others use Hunspell, which is backward compatible. Why didn't you use Hunspell? It's well coded and has a small file size. It looks like a lot of hassle creating an engine for no reason.

marcoagpinto commented 5 years ago

This is how I work. I code things myself based on the explanation of how they work.

:-)

How could I have coded the PhD project (software) without doing it this way?

Maybe I am silly, but I always get good results.

Ding-adong commented 5 years ago

Fair enough. It's your time. Does your engine follow the same rules as hunspell or is there any differences? If so like what?

marcoagpinto commented 5 years ago

It is supposed to work exactly like Hunspell, except that it won't work 100% correctly with some language-specific rules, such as those for Dutch.

The Dutch speller has Hunspell commands created just for it, and without documentation and examples I can't code them into PTG.

Another reason why I could never use the Hunspell DLL directly is that I don't know how to call it from inside PureBasic.

The RegExp library is built into PureBasic, so there were commands to use it... but there are no Hunspell commands in PureBasic... there was no other way of achieving it... I had to do it the hard way...

Ding-adong commented 5 years ago

Interesting. Some links that may help:

https://www.purebasic.fr/english/viewtopic.php?t=43739
https://www.purebasic.fr/english/viewtopic.php?f=12&t=49236&start=0

marcoagpinto commented 5 years ago

Last night I improved the "P" flag to do plurals (-nesses):

SFX P Y 9
SFX P y iness [^aeiou]y 
SFX P 0 ness [aeiou]y 
SFX P 0 ness [^y] 
SFX P y inesses [^aeiou]y 
SFX P 0 nesses [aeiou]y 
SFX P 0 nesses [^y] 
SFX P y iness's [^aeiou]y 
SFX P 0 ness's [aeiou]y 
SFX P 0 ness's [^y]

Ding-adong commented 5 years ago

I told you I had already done it, 4 days ago.

Ding-adong commented 5 years ago

What is the fundamental difference between R and r? I understand that words with a given ending mostly take -er, for example, but sometimes -ier, thus needing two flags. Only rule number 8 is used; rules 1 to 7 are duplicates of flag R. Rule 8 is used when you don't want to double the final consonant: dim, for example, can give both dimer and dimmer.

Might as well put the word ending -or in flag r.

Ding-adong commented 5 years ago

I cannot use N or n to add -ation to words ending in -er, such as ponder. It assumes every -er is converted to re.
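
The behaviour can be reproduced with the two -r rules quoted earlier in this thread for flag n (SFX n er ration er and SFX n 0 ation [^e]r). This is a minimal Python sketch with a made-up helper name, not the actual engine:

```python
import re

# The two rules of flag n that can apply to words ending in "r", as (strip, add, condition).
R_RULES = [
    ("er", "ration", "er"),
    ("0", "ation", "[^e]r"),
]


def derive(word, rules):
    """Return the forms produced by every rule whose condition matches the word ending."""
    forms = []
    for strip, add, cond in rules:
        if re.search(f"(?:{cond})$", word):
            stem = word if strip == "0" else word[: -len(strip)]
            forms.append(stem + add)
    return forms


for word in ["register", "administer", "ponder"]:
    print(word, derive(word, R_RULES))
```

The -er rule suits register and administer but turns ponder into pondration, so a word like ponder would need a different rule or flag to keep its e.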

marcoagpinto commented 5 years ago

I have been improving the rules of the .aff myself as I clean the .dic.

It will go slowly but it is happening.