gwitwer / foma

Automatically exported from code.google.com/p/foma

Special cases do not need e to é conversion. How to get that done? #7

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
I have a question. The cases For and Tem are special: before these endings, a 
word-final e should not turn into é.

 # rege+For            regéként    <-- should be regeként
 # rege+Tem            regékor     <-- should be regekor
 # rege+Posss3+For     regéjéként  <-- should be regéjeként

I tried 
 # define Etoee e -> é || _ "^" [ \0 & \{+Tem} & \{+For} ] ; # \0: not zero
and also
 # define Etoee e -> é || _ "^" [ \0 & \{ként} & \{kor} ] ; # \0: not zero

and similarly
 # define HarmRuleC C -> á // BackVowel \Vowel*  _ %^  [ \0 ] .o.
                 C -> é // FrontVowel \Vowel* _ %^  [ \0 ] .o.
                 C -> a // BackVowel \Vowel*  _ %^ [ 0 | {+For} ] .o. 
                 C -> e // FrontVowel \Vowel* _ %^ [ 0 | {+For} ] ;
and also
 # define HarmRuleC C -> á // BackVowel \Vowel*  _ %^  [ \0 ] .o.
                 C -> é // FrontVowel \Vowel* _ %^  [ \0 ] .o.
                 C -> a // BackVowel \Vowel*  _ %^ [ 0 | {ként} ] .o. 
                 C -> e // FrontVowel \Vowel* _ %^ [ 0 | {ként} ] ;
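For reference, the intended behaviour of `HarmRuleC` can be emulated on plain strings. This Python sketch is not foma, and it rests on several assumptions. `C` is a suffix archiphoneme written directly before `^`. The nearest vowel to its left decides back/front, standing in for `BackVowel \Vowel*`. The vowel sets below only approximate the attachment's `BackVowel`/`FrontVowel` definitions. The blocking endings are ként and kor. Under these assumptions, C comes out long (á/é) before any other non-empty ending and short (a/e) otherwise. The inputs are illustrative lower-side forms.

```python
# Emulation of the intended HarmRuleC on lower-side strings such as "regé^jC^ként".
# Assumed (not from the attachment): these vowel sets and the blocking endings.
BACK = set("aáoóuú")
FRONT = set("eéiíöőüű")
SHORT_BEFORE = {"ként", "kor"}  # endings after which C stays short


def harm_rule_c(form: str) -> str:
    """Realize each C that stands directly before a "^" boundary."""
    out = []
    for i, ch in enumerate(form):
        rest = form[i + 1:]
        if ch != "C" or not rest.startswith("^"):
            out.append(ch)  # the rules only rewrite C before %^
            continue
        # Harmony: nearest vowel to the left of C (BackVowel \Vowel* _).
        left = next((v for v in reversed(form[:i]) if v in BACK | FRONT), None)
        if left is None:
            out.append(ch)  # no harmony context matches, C is left as is
            continue
        # Length: the ending runs up to the next "^" or the end of the string.
        ending = rest[1:].split("^")[0]
        long_vowel = ending != "" and ending not in SHORT_BEFORE
        if left in BACK:
            out.append("á" if long_vowel else "a")
        else:
            out.append("é" if long_vowel else "e")
    return "".join(out)


print(harm_rule_c("regé^jC^ként"))  # regé^je^ként
print(harm_rule_c("regé^jC^t"))     # regé^jé^t
print(harm_rule_c("hajó^jC^t"))     # hajó^já^t
```

In the attempts above, the first two rules already fire before ként, because `\0` matches its first symbol. They are also composed before the short-vowel rules, so C is lengthened before the exception gets a chance. The sketch avoids this by deciding on the whole ending at once.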

But none of this works, and I cannot find a solution. Please help.

Attached is a lexc/foma pair, a small test program written specifically for this.

Thanks in advance.

Original issue reported on code.google.com by eleonor...@gmx.net on 3 Jan 2012 at 5:57

Attachments:

GoogleCodeExporter commented 8 years ago
I found a solution:
define Etoee e -> é || _ "^" [ \0 & \k ] ; # \0: not zero kor & ként excluded

and 
define HarmRuleC C -> á // BackVowel \Vowel*  _ %^  [ \0 & \k ] .o. #  ként excluded
                 C -> é // FrontVowel \Vowel* _ %^  [ \0 & \k  ] .o.
                 C -> a // BackVowel \Vowel*  _ %^ [ 0 ] .o. 
                 C -> e // FrontVowel \Vowel* _ %^ [ 0 ] ;

That works because both special-case endings start with 'k'.
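Why the `\k` version works can be checked with a small emulation. In foma, `\k` is term complement, meaning any single symbol other than `k`. So `_ "^" \k` only fires when the boundary is followed by at least one symbol and that symbol is not `k`. An empty ending, ként and kor are therefore all skipped. The Python sketch below reproduces that check on strings, one character per symbol. `kX` is a made-up ending, used only to show that any k-initial ending would be blocked as well.

```python
def e_to_ee_k_approx(form: str) -> str:
    r"""Emulate `e -> é || _ "^" \k` with one character per symbol."""
    out = []
    for i, ch in enumerate(form):
        after = form[i + 1:i + 3]  # the boundary and the symbol right after it
        if ch == "e" and len(after) == 2 and after[0] == "^" and after[1] != "k":
            out.append("é")
        else:
            out.append(ch)
    return "".join(out)


for lower in ["rege^t", "rege^ként", "rege^kor", "rege^", "rege^kX"]:
    print(lower, "->", e_to_ee_k_approx(lower))
```

The last input is the case this shortcut gets wrong by design: any ending that starts with k blocks the rule.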

Is there no way to say:
I want to exclude case '+For' and case '+Tem' from a rule?

Original comment by eleonor...@gmx.net on 4 Jan 2012 at 11:29

GoogleCodeExporter commented 8 years ago
A brief comment: usually, if a rule is phonologically conditioned, it's a good 
idea to capture it with a rewrite rule, like you've done. 

On the other hand, if you're dealing with an exception, sometimes it's easier 
to mark it so in the lexicon, and have the rules bypass the exception. For 
example, in this instance, you could have marked those words where e does not 
alternate with é as, say, E in the lexicon. That is, something like regE 
instead of rege. Then the rule won't affect that word, and you can place a rule 
like `E -> e` after the other rules.  Note that generally, it's most convenient 
to place the `E` only on the lower side (because you want the original form on 
the lexical side), so the entry should read something like:

{{{
rege:regE 
}}}
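The effect of the `regE` marking can be sketched as a three-step cascade in Python. This is an emulation, not foma. The alternation rule mentions only `e`, so a lexically marked `E` passes through untouched. An `E -> e` step placed after it then restores the spelling. The simplified e -> é condition (e before `^` plus a non-empty ending) and the final boundary cleanup are assumptions for illustration. The sketch also assumes no tags remain on the lower side.

```python
import re


def e_to_ee(form: str) -> str:
    # Simplified alternation: e -> é before "^" when a non-empty ending follows.
    # It mentions only lowercase e, so a lexically marked E is never rewritten.
    return re.sub(r"e(?=\^[^^])", "é", form)


def restore_e(form: str) -> str:
    # Placed after the other rules, like `E -> e`.
    return form.replace("E", "e")


def cleanup(form: str) -> str:
    # Hypothetical final step: delete the morpheme boundary.
    return form.replace("^", "")


def grammar(lower: str) -> str:
    # Order matters: alternation first, then E -> e, then cleanup.
    return cleanup(restore_e(e_to_ee(lower)))


print(grammar("rege^t"))  # regét  (ordinary e alternates)
print(grammar("regE^t"))  # reget  (marked E is left alone)
```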

A minor detail: in `[ \0 & \k ]` the `\0` part is redundant, since `\0` already matches any single symbol, and intersecting it with `\k` changes nothing.

Original comment by mans.hul...@gmail.com on 4 Jan 2012 at 2:18

GoogleCodeExporter commented 8 years ago
The word rege is NOT an exception, but completely regular. The two endings 
(+Tem and +For) are the exceptions.

 # regét     - Acc
 # regéhez   - All
 # and so on...
 BUT
 # regekor   - Tem
 # regeként  - For
 # rege      - Nom

I cannot find out how to express this as a proper regular expression: with no 
ending, or with the ending "ként" or "kor", there should be no e->é; all other 
endings need it.

This is ok:
define Etoee e -> é || _ "^"  [ \k ]  ; # \0: not zero kor & ként excluded

but all my attempts to expand \k into \(ként) and \(kor), like:
define Etoee e -> é || _ "^"  [ \k \é \n \t | \k \o \r]  ; 
define Etoee e -> é || _ "^"  \[ k é n t | k o r ]  ; 
fail, since

 #apply down> rege+Noun+Acc
 #reget

comes out wrong (it should be regét).

How can I say: if there is no ending, or the ending is ként or kor, do not apply 
the e->é rule; otherwise, apply it?

I am worried that \k is a bit too imprecise.
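In foma, `\` complements single symbols only. So `[\k \é \n \t | \k \o \r]` demands at least three symbols after the boundary, each position excluding one letter, and it can never match a short ending such as the accusative. Excluding whole strings needs the language complement `~`. One possibility, which is untested and assumes that only the ending follows the boundary, is something along the lines of `e -> é || _ "^" ~[ [{ként} | {kor}] ?* | 0 ] .#. ;`. The Python sketch below emulates the requested condition exactly, morpheme by morpheme. `BLOCKING_ENDINGS` and the `kX` ending are illustrative assumptions.

```python
# Endings that must NOT trigger e -> é: none at all, ként (+For), kor (+Tem).
BLOCKING_ENDINGS = {"", "ként", "kor"}


def e_to_ee(form: str) -> str:
    """Lengthen a morpheme-final e unless the next morpheme is a blocking ending."""
    morphs = form.split("^")
    for i in range(len(morphs) - 1):
        if morphs[i].endswith("e") and morphs[i + 1] not in BLOCKING_ENDINGS:
            morphs[i] = morphs[i][:-1] + "é"
    return "^".join(morphs)  # keep the boundaries for later rules


for lower in ["rege^t", "rege^ként", "rege^kor", "rege", "rege^jC^ként", "rege^kX"]:
    print(lower, "->", e_to_ee(lower))
```

Unlike the `\k` shortcut, the hypothetical `rege^kX` is lengthened here. In `rege^jC^ként`, only the stem-final e is affected, which gives the regé- of regéjeként.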

Original comment by eleonor...@gmx.net on 5 Jan 2012 at 9:28

GoogleCodeExporter commented 8 years ago
I also tried adding +Abl, +Acc ... +Tem to each word and then conditioning the 
rule on +Abl etc., with no success.

Lexc:
LEXICON Case
+Abl:^tUl+Abl      #;
+Acc:^Gt+Acc       #;
...

.foma:

define Grammar Lexicon            .o.
               Etoee             ; #.o.   Here I stop

Etoee looks like this:
define Etoee e -> é || .#. \"^"+ _ "^"  ?*  [ "+" A b l | "+" A c c | "+" {Ade} |
                       "+" {All} | "+" {Cau} | "+" {Dat} | "+" {Del} | "+" {Ela} |
                       "+" {Fac} | "+" {For} | "+" {Ill} | "+" {Ine} | "+" {Ins} |
                       "+" {Nom} | "+" {Sub} | "+" {Sup} | "+" {Ter} ] ?* ;

I tried both the A b l form and the {Ade} form; neither works.

Results:
foma[1]: down
apply down> rege+Noun+Abl
rege^tUl+Abl
apply down> rege+Noun+Acc
rege^Gt+Acc
apply down> rege+Noun+Ade
rege^nDl+Ade
apply down> 

 No e->é anywhere :-(
foma[1]: lower-words
rege^ig+Ter
rege^Pn+Sup
rege^rF+Sub
rege^+Nom
rege^VFl+Ins
rege^bFn+Ine
rege^bF+Ill
rege^ként+For
rege^VD+Fac
rege^bUl+Ela
rege^rUl+Del
rege^nFk+Dat
rege^ért+Cau
rege^hIz+All
rege^nDl+Ade
rege^Gt+Acc
rege^tUl+Abl
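One plausible explanation, not confirmed in the thread, is a multichar-symbol mismatch. If `+Abl`, `+Acc`, ... are declared under `Multichar_Symbols` in the lexc file, the lower side contains each tag as a single symbol. In the regex, however, `"+" A b l` and `"+" {Ade}` denote sequences of four single-character symbols, so the context can never match. A single quoted symbol such as `"+Abl"` would then be needed instead. The English grammar in the same comment uses `"+" {PP}`, which would match if `+PP` was not declared. The Python sketch below imitates lexc-style longest-match tokenization over a hypothetical `MULTICHAR` inventory to show the difference.

```python
# Hypothetical Multichar_Symbols declaration (not taken from the attached files).
MULTICHAR = {"+Abl", "+Acc", "+Ade"}


def tokenize(s: str, multichar: set[str]) -> list[str]:
    """Split s into symbols, preferring the longest declared multichar symbol."""
    symbols, i = [], 0
    longest = max(map(len, multichar), default=1)
    while i < len(s):
        for size in range(min(longest, len(s) - i), 1, -1):
            if s[i:i + size] in multichar:
                symbols.append(s[i:i + size])
                i += size
                break
        else:
            symbols.append(s[i])  # undeclared: one character is one symbol
            i += 1
    return symbols


def contains(symbols: list[str], pattern: list[str]) -> bool:
    """True if pattern occurs as a contiguous run of symbols."""
    n = len(pattern)
    return any(symbols[j:j + n] == pattern for j in range(len(symbols) - n + 1))


lower = tokenize("rege^tUl+Abl", MULTICHAR)
print(lower)                                  # ['r', 'e', 'g', 'e', '^', 't', 'U', 'l', '+Abl']
print(contains(lower, ["+", "A", "b", "l"]))  # False: what "+" A b l denotes
print(contains(lower, ["+Abl"]))              # True: one multichar symbol
```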

What is strange is that I made the same modification to the English lexc/foma 
files before:
in lexc:
LEXICON Vinf
+V+PresPart:^ing+PP #;

in foma:
define ConsonantDoubling g -> g g ||  .#. \"^"+ _ "^" ?* [ "+" {PP} | e d ] ?*;
...
define CleanupPP [ "+" {PP} ] -> 0;

define Grammar Lexicon           .o. 
               ConsonantDoubling .o. 
...
               CleanupPP         .o.
               Cleanup;

regex Grammar;

That works perfectly well:
lower-words
beg
begs
begging
begged
begged
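The English cascade can be emulated step by step to see why it succeeds. `ConsonantDoubling` applies while `+PP` is still on the lower side, and `CleanupPP` deletes the tag afterwards. In this Python sketch, `Cleanup` is assumed to delete the `^` boundary, since its definition is elided in the post. The lower-side inputs (`beg^s`, `beg^ing+PP`, `beg^ed`) are assumed rather than taken from the attached lexc file. One character counts as one symbol.

```python
import re

# Contexts of ConsonantDoubling as Python regexes over the lower-side string.
LEFT = re.compile(r"[^^]+")               # .#. \"^"+
RIGHT = re.compile(r"\^.*(?:\+PP|ed).*")  # "^" ?* [ "+" {PP} | e d ] ?*


def consonant_doubling(s: str) -> str:
    # g -> g g, with both contexts checked on the input string (|| semantics).
    out = []
    for i, ch in enumerate(s):
        if ch == "g" and LEFT.fullmatch(s[:i]) and RIGHT.fullmatch(s[i + 1:]):
            out.append("gg")
        else:
            out.append(ch)
    return "".join(out)


def cleanup_pp(s: str) -> str:
    return s.replace("+PP", "")  # [ "+" {PP} ] -> 0


def cleanup(s: str) -> str:
    return s.replace("^", "")  # assumed: delete the morpheme boundary


def grammar(lower: str) -> str:
    return cleanup(cleanup_pp(consonant_doubling(lower)))


for lower in ["beg", "beg^s", "beg^ing+PP", "beg^ed"]:
    print(grammar(lower))
```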

I attach both the English and the Hungarian files here.

Original comment by eleonor...@gmx.net on 5 Jan 2012 at 7:56

Attachments:

GoogleCodeExporter commented 8 years ago
I have found a solution that looks quite good: I modified the English file step 
by step until it handled the Hungarian nouns as it should. We can close this 
issue.

Original comment by eleonor...@gmx.net on 6 Jan 2012 at 8:41

Attachments: