sanskrit-lexicon / MWinflect

Generate declensions and conjugations based upon words in MW1899 dictionary.

stem_model intro: indeclineables #3

Open funderburkjim opened 6 years ago

funderburkjim commented 6 years ago

A particular program (stem_model.py) is used to interpret the inflection information for nouns, adjectives and indeclineables that is derived from certain meta-information present in the revised MW digitization.

This inflection information is present in the lexnorm-all2.txt file, whose format is described in #2.

Before describing the interpretation that stem_model does, it may be useful to look at a few examples of the lexnorm input.

funderburkjim commented 6 years ago

lexnorm-all2 samples

L key1 key2 lexnorm
2 akAra a-kAra m
5 a a LEXID-pron,STEM-idam
7 a a m
8 afRin a-fRin m:f:n
10 aMSa aMSa m
20 aMSakaraRa aMSa-karaRa n
21 aMSakalpanA aMSa-kalpanA f
39 aMSaka aMSaka m:f#ikA:n

The last field 'lexnorm' contains a normalization of the information marked within a <lex> tag of the digitization. Here are extracts from the digitization corresponding to some entries of the table.

L body <info lex=>
2 <s>a—kAra</s> ¦ <lex>m.</lex> the letter or sound <s>a</s>. <info lex="m"/>
5 <s>a</s> <hom>4</hom> ¦ the base of some pronouns and <ab>pronom.</ab> forms, in <s>asya</s>, <s>atra</s>, &c. <info lexcat="LEXID=pron,STEM=idam"/>
8 <s>a-fRin</s> ¦ <lex>mfn.</lex> free from debt, <ls>L.</ls> <info lex="m:f:n"/>
39 <hom>1.</hom> <s>aMSaka</s> ¦ <lex>mf(<s>ikA</s>)n.</lex> (<ab>ifc.</ab>) forming part. <info lex="m:f#ikA:n"/>
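
As a rough illustration of how a lexnorm value could be taken apart, the sketch below splits it on ':' into components and on '#' into a gender plus an optional explicit ending. The function name `parse_lexnorm` is hypothetical, and reading 'f#ikA' as "feminine in ikA" is an assumption drawn from the samples above (`mf(ikA)n.` becoming `m:f#ikA:n`); this is not code from stem_model.py. Values without ':' or '#', such as the pronoun entry at L=5, simply come back as a single component.

```python
def parse_lexnorm(lexnorm):
    """Split a lexnorm value such as 'm:f#ikA:n' into (gender, ending) pairs.

    Assumption: ':' separates components and '#' introduces an explicit
    ending for that component, as the sample records suggest.
    """
    parts = []
    for component in lexnorm.split(":"):
        # partition() returns an empty ending when there is no '#'
        gender, _, ending = component.partition("#")
        parts.append((gender, ending or None))
    return parts


print(parse_lexnorm("m"))          # [('m', None)]
print(parse_lexnorm("m:f#ikA:n"))  # [('m', None), ('f', 'ikA'), ('n', None)]
```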

Note:

funderburkjim commented 6 years ago

stem_model overview

The stem_model program aims to assign appropriate (stem,model) pairs for each record in lexnorm-all2.

It does this in several submodules, each of which deals with a restricted subset of the records. Each submodule scans all the records of lexnorm-all2, and for each record

This process will become clearer as we proceed through the submodules. But let's start with the very simplest submodule.

funderburkjim commented 6 years ago

'pure' indeclineables

One of the simplest lexnorm fields is the one which has only the one component 'ind'. For example,

70283 ca ca ind

The 'model_ind' module of stem_model identifies just such cases. It outputs all such cases to a file named 'ind.txt'. The line corresponding to 'ca' in ind.txt is

ind ca 70283,ca
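
The selection step can be sketched roughly as follows. This is not the actual 'model_ind' code: the function signature, the tuple layout of the records, and the tab separator in the output are assumptions. Taking the stem from key2 is also an assumption, based on the dvi-barhAs example in this thread (for 'ca' the two keys are identical).

```python
def model_ind(records):
    """Yield ind.txt-style lines for records whose lexnorm is exactly 'ind'.

    records: iterable of (L, key1, key2, lexnorm) string tuples.
    Assumption: output fields are tab-separated: model, stem, 'L,key1'.
    """
    for L, key1, key2, lexnorm in records:
        if lexnorm == "ind":  # only the 'pure' indeclineables
            yield "\t".join(["ind", key2, f"{L},{key1}"])


sample = [
    ("10", "aMSa", "aMSa", "m"),
    ("70283", "ca", "ca", "ind"),
]
for line in model_ind(sample):
    print(line)
```

With the two sample records, only the 'ca' record is emitted; the masculine noun 'aMSa' is left for another submodule.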

This stem model file has 3 fields in each line:

ind.txt

This ind.txt file has the stems with model 'ind'.

Additional entries in ind.txt

When we later apply different submodules, there will be additional entries in ind.txt. For example, 98580 dvibarhAs dvi-barhAs n:ind.

MW text: dvi—barhās n. and ind., doubly close or thick or strong

dvi-barhAs can be an indeclineable, according to MW. But it can also be declined as a neuter noun. Since it has these two forms, the 'model_ind' submodule of stem_model skips it. Another submodule, not yet written, will have to handle it. When it does, then we will have another entry in ind.txt: ind dvi-barhAs 98580,dvibarhAs
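
The difference between the two cases can be illustrated with a small sketch. Both function names are hypothetical, and the second test is only a guess at how a later submodule might recognize mixed records such as 'n:ind'; it assumes ':' separates the components of a lexnorm value.

```python
def is_pure_ind(lexnorm):
    """True only when 'ind' is the sole component (the model_ind case)."""
    return lexnorm == "ind"


def has_ind_component(lexnorm):
    """True when 'ind' is one of several ':'-separated components.

    Hypothetical test for a later submodule; not part of model_ind.
    """
    return "ind" in lexnorm.split(":")


for value in ("ind", "n:ind", "m"):
    print(value, is_pure_ind(value), has_ind_component(value))
```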

funderburkjim commented 6 years ago

Nothing to decline

Since we have thus far just filtered out (some of) the indeclineables, and since there is no declension of indeclineables, there is nothing more to do here, at least for now.

funderburkjim commented 6 years ago

Why the stem-model approach?

This is the approach which seems most reasonable based upon my study of recent grammars.

The Panini approach, based upon my limited understanding of it (primarily gleaned from Scharf's gshell program), has different emphases.

The next stem_model submodule (model_m_a) and the associated declension module will start to flesh out the stem_model approach.

funderburkjim commented 6 years ago

outputs/nominals/ind.txt

The indeclineables are put into a format similar to that of normal declineables by the decline_file.py program:

# in inflect directory
python decline_file.py ../inputs/nominals/ind.txt ../outputs/nominals/ind.txt 

This is for anticipated convenience when later creating databases. A typical output is

model key2     key1
ind      a-kAle  akAle
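
For indeclineables the conversion amounts to dropping the L number from the last field. A minimal sketch, assuming tab-separated fields in both input and output (the function name `ind_output_row` is hypothetical and this is not the decline_file.py code):

```python
def ind_output_row(stem_model_line):
    """Convert an ind.txt line (model, stem, 'L,key1') to (model, key2, key1).

    Assumption: fields are tab-separated and the third field holds a
    single 'L,key1' reference.
    """
    model, key2, ref = stem_model_line.rstrip("\n").split("\t")
    _, key1 = ref.split(",", 1)  # discard the L number
    return "\t".join([model, key2, key1])


print(ind_output_row("ind\tdvi-barhAs\t98580,dvibarhAs"))
```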


The decline_file program and the general output format will be described in the next issue, which covers the m_a model.