Open GoogleCodeExporter opened 8 years ago
I came up with a method, but it has one flaw: each word also gets compounded
with itself, which makes no sense. The idea is:
{{{
LEXICON Root
Noun1 ;
LEXICON Noun1
cat Noun2;
city Noun2;
fox Noun2;
panic Noun2;
try Noun2;
watch Noun2;
Noun2;
LEXICON Noun2
0:cat Ninf;
0:city Ninf;
0:fox Ninf;
0:panic Ninf;
0:try Ninf;
0:watch Ninf;
}}}
As a result I get:
{{{
catsnék
catnak
catt
cat
catcatsnék
catcatnak
catcatt
catcat
catwatchnak
catwatcht
catwatch
catwatchesnék
catpanicsnék
catpanicnak
catpanict
catpanic
catfoxnak
catfoxt
catfox
catfoxesnék
}}}
where catcat is nonsense.
Does anybody have an idea how to avoid getting the same word twice?
In reality Noun1 and Noun2 should contain the same word set,
around 50,000 words, and I am also thinking of a third and a fourth lexicon for
triple and quadruple compounds.
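
To pin down what result is wanted, here is a minimal sketch in plain Python (not lexc or foma). It only states the target set: every ordered pair of words except the pairs whose two parts are identical. The function name `compounds` and the short word list are made up for illustration, and endings are left out.

```python
from itertools import product

def compounds(first_parts, second_parts):
    """Concatenate every (first, second) pair, skipping pairs with identical parts."""
    return [a + b for a, b in product(first_parts, second_parts) if a != b]

# Small stand-in for the real word set (hypothetical sample).
words = ["cat", "fox", "panic", "watch"]
result = compounds(words, words)
print(result)
```

With n words in both lists this gives n*n - n compounds. For the roughly 50,000 words mentioned above that is about 2.5 billion, so listing them one by one is not practical and the exclusion has to happen inside the finite-state network.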
Original comment by eleonor...@gmx.net
on 28 Sep 2012 at 1:58
I have found a solution for filtering out identical elements. Maybe this could go
into the documentation.
{{{
!eq4.lexc: the first parts of the compound words; these words do not get
! any ending.
Multichar_Symbols +Noun
LEXICON Root
+Noun:0 Nouns ;
LEXICON Nouns
cat #;
dog #;
horse #;
!eq41.lexc: the second parts of the compound words; these words get all
! inflectional endings.
Multichar_Symbols +Noun +Def +Indef +Nom +Acc +Gen +Plur
+Prep+ +Art+ uN aN iN
LEXICON Root
+Noun:0 Nouns ;
LEXICON Nouns
cat AddNoun;
dog AddNoun;
horse AddNoun;
rat AddNoun;
nyuszi AddNoun;
LEXICON AddNoun
+Acc:#%^t #;
+Plur:#%^s #;
#
# eq4.foma: reads in the lexc files,
# adds delimiters, finds the compounds whose two parts are identical,
# and removes them by taking the difference
#
read lexc eq4.lexc
define Lexicon
read lexc eq41.lexc
define Lexicon2
# add delimiters
define Lex1 %< Lexicon %# %< Lexicon2 ;
# get identical words using _eq
define Dlex [_eq( Lex1 , %< , %#)];
# remove the delimiters >, < and #
define CleanupTags %> -> 0 ,,
%< -> 0 ,,
%# -> 0;
# Grammar: the difference, with the delimiters removed
define Grammar Lex1 - Dlex .o.
CleanupTags
;
regex Grammar;
}}}
Run result:
{{{
$ foma -l eq4.foma
...
foma[1]: lower-words
horsedog^s
horsedog^t
horsecat^s
horsecat^t
horserat^s
horserat^t
horsenyuszi^s
horsenyuszi^t
doghorse^s
doghorse^t
dogcat^s
dogcat^t
dograt^s
dograt^t
dognyuszi^s
dognyuszi^t
cathorse^s
cathorse^t
catdog^s
catdog^t
catrat^s
catrat^t
catnyuszi^s
catnyuszi^t
foma[1]:
}}}
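
To check the logic of the script by hand, here is a small model of the same three steps in plain Python: add the delimiters, collect the strings whose delimited parts are identical, then take the difference and strip the delimiters. This is only a sketch of the set operations, not foma itself. The word lists and endings are copied from the lower sides of the two lexc files, the helper `delimited_parts` is hypothetical, and the set comprehension only stands in for the role `_eq` plays in the script.

```python
import re
from itertools import product

FIRST = ["cat", "dog", "horse"]                    # lower side of eq4.lexc
SECOND = ["cat", "dog", "horse", "rat", "nyuszi"]  # stems in eq41.lexc
ENDINGS = ["#^t", "#^s"]                           # lower side of AddNoun

def delimited_parts(s):
    """Return the substrings enclosed between '<' and the next '#'."""
    return re.findall(r"<([^<#]*)#", s)

# Lex1: '<' first part '#' '<' second part with its ending
lex1 = {f"<{a}#<{b}{e}" for a, b, e in product(FIRST, SECOND, ENDINGS)}

# Dlex: the strings whose delimited parts are all identical
dlex = {s for s in lex1 if len(set(delimited_parts(s))) == 1}

# Grammar: the difference, with the delimiter symbols deleted
grammar = sorted(re.sub(r"[<>#]", "", s) for s in lex1 - dlex)
print(len(grammar))  # 24, the same count as the lower-words listing
```

The model also shows why the endings in AddNoun start with `#`: that symbol closes the second delimited part, so both halves of the compound can be compared.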
Original comment by eleonor...@gmx.net
on 7 Oct 2012 at 12:13
Original issue reported on code.google.com by
eleonor...@gmx.net
on 8 Jan 2012 at 9:48