jimregan / foma

Automatically exported from code.google.com/p/foma

Question about upcase solution #34

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
I solved the problem of also recognizing capitalized words by using an 
uppercase converter like this:

define ToUpcase a -> A || .#. _ ,,
                á -> Á || .#. _ ,,
                b -> B || .#. _ ,,
                c -> C || .#. _ ,,
                d -> D || .#. _ ,,
                e -> E || .#. _ ,,
                é -> É || .#. _ ,,
                f -> F || .#. _ ,,
                g -> G || .#. _ ,,
                h -> H || .#. _ ,,
                i -> I || .#. _ ,,
                í -> Í || .#. _ ,,
                j -> J || .#. _ ,,
                k -> K || .#. _ ,,
                l -> L || .#. _ ,,
                m -> M || .#. _ ,,
                n -> N || .#. _ ,,
                o -> O || .#. _ ,,
                ó -> Ó || .#. _ ,,
                ö -> Ö || .#. _ ,,
                ő -> Ő || .#. _ ,,
                p -> P || .#. _ ,,
                q -> Q || .#. _ ,,
                r -> R || .#. _ ,,
                s -> S || .#. _ ,,
                t -> T || .#. _ ,,
                u -> U || .#. _ ,,
                ú -> Ú || .#. _ ,,
                ü -> Ü || .#. _ ,,
                ű -> Ű || .#. _ ,,
                v -> V || .#. _ ,,
                w -> W || .#. _ ,,
                x -> X || .#. _ ,,
                y -> Y || .#. _ ,,
                z -> Z || .#. _ ;

and by duplicating every grammar, so that each has a normal and an uppercase version:
define Grammar Lexicon           .o. 
               ConsonantDoubling .o. 
               EDeletion         .o. 
               EInsertion        .o. 
               YReplacement      .o. 
               KInsertion        .o. 
               Cleanup;

define Grammarup Lexicon           .o. 
               ToUpcase          .o. 
               ConsonantDoubling .o. 
               EDeletion         .o. 
               EInsertion        .o. 
               YReplacement      .o. 
               KInsertion        .o. 
               Cleanup;

regex Grammar | Grammarup;

Attached the complete project.

The approach has two disadvantages:
1. I have to duplicate every grammar.
2. Generating with down gives an extra output:
foma[1]: down
apply down> cat+N+Sg
cat
Cat
apply down> Peter+N+Sg
Peter

For cat+N+Sg I also get Cat, which is expected but unnecessary.

Is there a more elegant way to handle upper and lower case, or is my approach 
already the best one?

Thanks in advance.

Original issue reported on code.google.com by eleonor...@gmx.net on 29 Jul 2012 at 12:28

GoogleCodeExporter commented 8 years ago
Here's a simpler, although ultimately equivalent way of doing it. 

First we define a transducer that optionally uppercases the first letter:

define UpCase 
[a:A|b:B|c:C|d:D|e:E|f:F|g:G|h:H|i:I|j:J|k:K|l:L|m:M|n:N|o:O|p:P|q:Q|r:R|s:S|t:T|u:U|v:V|w:W|x:X|y:Y|z:Z] ?* | ?*;

We then compose it in as the last step, after Cleanup:

define Grammar Lexicon           .o. 
               ConsonantDoubling .o. 
               EDeletion         .o. 
               EInsertion        .o. 
               YReplacement      .o. 
               KInsertion        .o. 
               Cleanup           .o.
               UpCase ;
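
The UpCase definition above lists only the unaccented letters, while the ToUpcase rule in the question also covers á, é, í, ó, ö, ő, ú, ü and ű. A minimal sketch of an extended version follows, assuming the lexicon uses exactly that alphabet. The names FirstUp and UpCaseHu are made up for illustration and do not appear in the attached project.

```foma
# Sketch only: FirstUp and UpCaseHu are hypothetical names.
# FirstUp maps one lowercase letter (upper side) to its uppercase
# counterpart (lower side), including the accented letters from ToUpcase.
define FirstUp [ a:A | á:Á | b:B | c:C | d:D | e:E | é:É | f:F | g:G |
                 h:H | i:I | í:Í | j:J | k:K | l:L | m:M | n:N | o:O |
                 ó:Ó | ö:Ö | ő:Ő | p:P | q:Q | r:R | s:S | t:T | u:U |
                 ú:Ú | ü:Ü | ű:Ű | v:V | w:W | x:X | y:Y | z:Z ];

# Either uppercase the first letter and copy the rest, or copy everything.
define UpCaseHu FirstUp ?* | ?* ;
```

UpCaseHu would then take the place of UpCase at the end of the composition chain.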

As for the second question: there is no way to build a transducer that would 
map cat+N+Sg only to cat and where its inverse would also map Cat to cat+N+Sg.  
They are the same device, so to speak, and contain the same mappings regardless 
of the direction. The only way to get cat as the only output for cat+N+Sg is to 
define two separate transducers, one for generation (without the uppercasing 
composed in), and another one for parsing (with uppercasing). It is in fact 
fairly normal to maintain two such transducers for various reasons.
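
The two-transducer setup can be sketched with a toy lexicon. Everything here is illustrative: the two-word Lexicon stands in for the real grammar, and the names Generator and Parser are assumptions, not names from the attached project.

```foma
# Toy stand-in for the real lexicon and rule chain (assumption).
define Lexicon [ {cat} | {Peter} ] "+N":0 "+Sg":0 ;

# Optional uppercasing of the first letter, as defined earlier in the thread.
define UpCase [a:A|b:B|c:C|d:D|e:E|f:F|g:G|h:H|i:I|j:J|k:K|l:L|m:M|n:N|o:O|p:P|q:Q|r:R|s:S|t:T|u:U|v:V|w:W|x:X|y:Y|z:Z] ?* | ?* ;

# Generation: no uppercasing composed in, so cat+N+Sg has no Cat variant.
define Generator Lexicon ;

# Parsing: uppercasing composed in, so Cat can be analyzed as well as cat.
define Parser Generator .o. UpCase ;

# Generate with the generator ...
regex Generator ;
apply down cat+N+Sg

# ... and analyze with the parser.
regex Parser ;
apply up Cat
```

In a real project Generator would be the full composition ending in Cleanup, and only Parser would add UpCase, so the rules themselves are still written once.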

Original comment by mans.hul...@gmail.com on 29 Jul 2012 at 3:38

GoogleCodeExporter commented 8 years ago
Thanks for the answer and help.
The UpCase transducer makes a big difference for me: my foma code is currently 
8200 lines long, and having 8200 rather than 16400 lines matters for
* code maintenance
* adding new code
* compiling

Thanks also for the idea of separating generation and parsing. 
I am not yet far enough along to decide whether to do that.

Original comment by eleonor...@gmx.net on 30 Jul 2012 at 8:21