stem_model m_a

funderburkjim commented 6 years ago

masculine nouns ending in 'a'

We derive this list from the lexnorm-all2 list with a simple filter: a) key1 ends in the short vowel 'a', and b) lexnorm is exactly 'm'.

This excludes many adjectives and other nominals ending in 'a', since these have more complex normalized lexnorm values, such as 'm:f:n' or 'm:f#ikA:n'.

There are 49344 of these simple masculine nouns in 'a'. Their information is written to the file inputs/nominals/m_a.txt. For example, the following two records from lexnorm-all2 are merged into one record in m_a.txt:

579 akzara  a-kzara m
592.1   akzara  a-kzara m

becomes:
m_a a-kzara 579,akzara:592.1,akzara
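The filter and merge described above can be sketched as follows. This is only an illustration, not the script used in the repository: the function name `build_m_a_records` is hypothetical, the four whitespace-separated fields (record number, key1, key2, lexnorm) are inferred from the example, and grouping by key2 with tab-delimited output is an assumption.

```python
def build_m_a_records(lexnorm_lines):
    """Filter lexnorm-all2 style lines and merge them into m_a.txt style records.

    Assumed input fields per line: record number, key1, key2, lexnorm.
    Assumed merge rule: records sharing the same key2 become one record.
    """
    groups = {}  # key2 -> list of "number,key1" references, in input order
    for line in lexnorm_lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip lines that do not match the assumed layout
        number, key1, key2, lexnorm = parts
        # a) key1 ends in short 'a' (SLP1 is case-sensitive, so 'A' is excluded)
        # b) lexnorm is exactly 'm'
        if key1.endswith("a") and lexnorm == "m":
            groups.setdefault(key2, []).append(f"{number},{key1}")
    return [f"m_a\t{key2}\t{':'.join(refs)}" for key2, refs in groups.items()]


if __name__ == "__main__":
    sample = [
        "579\takzara\ta-kzara\tm",
        "592.1\takzara\ta-kzara\tm",
    ]
    for record in build_m_a_records(sample):
        print(record)
```

With the two akzara lines from the example, the sketch yields a single record whose third field is 579,akzara:592.1,akzara.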

funderburkjim commented 6 years ago

decline_file program

The decline_file program generates declensions based on the model and stem (the first two fields) of the records in m_a.txt (or in one of the other files in the inputs/nominals/ directory). The output is written to a file of the same name in the outputs/nominals/ directory; in this case, to outputs/nominals/m_a.txt.

The format of the output files generated by decline_file is a sequence of lines, each with 3 tab-delimited fields:

- model (copied from input file)
- stem (copied from input file; same as key2 in mw.txt)
- inflection (the declension table for this model and stem)

format of declension table

The declension table is represented as a string with 24 parts (separated by colon), representing the singular, dual, plural of 8 cases. Symbolically 1s:1d:1p:2s:2d:2p:3s:3d:3p:4s:4d:4p:5s:5d:5p:6s:6d:6p:7s:7d:7p:8s:8d:8p. The common English names for the 8 cases are 1 = Nominative, 2 = Accusative, 3 = Instrumental, 4 = Dative, 5 = Ablative, 6 = Genitive, 7 = Locative, 8 = Vocative.
- Missing values are represented by empty strings (such as the vocative for personal pronouns).
- Sometimes one or more of the 24 declension cells has alternate values; these are listed within the cell, separated by a forward slash ('/').
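As a rough sketch of how a consumer might read this format, the following splits the 24-part string into cells keyed by case and number, treating an empty cell as "no form" and a slash as a separator of alternates. The function name `parse_declension` and the dictionary layout are illustrative choices, not part of decline_file.

```python
CASES = ["Nominative", "Accusative", "Instrumental", "Dative",
         "Ablative", "Genitive", "Locative", "Vocative"]
NUMBERS = ["S", "D", "P"]  # singular, dual, plural


def parse_declension(table):
    """Return {(case, number): [forms]} for a colon-separated declension table."""
    cells = table.split(":")
    if len(cells) != 24:
        raise ValueError(f"expected 24 cells, found {len(cells)}")
    parsed = {}
    for index, cell in enumerate(cells):
        case = CASES[index // 3]      # every 3 cells start a new case
        number = NUMBERS[index % 3]   # order within a case is S, D, P
        # empty cell -> no forms; otherwise split alternates on '/'
        parsed[(case, number)] = cell.split("/") if cell else []
    return parsed


if __name__ == "__main__":
    # The symbolic table from the text: 1s:1d:1p: ... :8s:8d:8p
    symbolic = ":".join(f"{c}{n}" for c in range(1, 9) for n in "sdp")
    cells = parse_declension(symbolic)
    print(cells[("Instrumental", "D")])
```

Keying on (case, number) keeps the lookup independent of the cell order in the string.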

funderburkjim commented 6 years ago

example of declension table

For the line m_a kUpa 53937,kUpa, the output line is m_a kUpa kUpaH:kUpO:kUpAH:kUpam:kUpO:kUpAn:kUpena:kUpAByAm:kUpEH:kUpAya:kUpAByAm:kUpeByaH:kUpAt:kUpAByAm:kUpeByaH:kUpasya:kUpayoH:kUpAnAm:kUpe:kUpayoH:kUpezu:kUpa:kUpO:kUpAH

It is easier to compare the declension table when it is formatted as a table:

Case	S	D	P
Nominative	kUpaH	kUpO	kUpAH
Accusative	kUpam	kUpO	kUpAn
Instrumental	kUpena	kUpAByAm	kUpEH
Dative	kUpAya	kUpAByAm	kUpeByaH
Ablative	kUpAt	kUpAByAm	kUpeByaH
Genitive	kUpasya	kUpayoH	kUpAnAm
Locative	kUpe	kUpayoH	kUpezu
Vocative	kUpa	kUpO	kUpAH

This agrees with Deshpande, p. 35.
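The reformatting of an output line into the table above can be sketched as below. The helper name `format_output_line` is hypothetical, and the sketch assumes the three output fields are tab-delimited, as described for the output files.

```python
CASE_NAMES = ["Nominative", "Accusative", "Instrumental", "Dative",
              "Ablative", "Genitive", "Locative", "Vocative"]


def format_output_line(line):
    """Turn one 'model<TAB>stem<TAB>inflection' line into a tab-separated table."""
    model, stem, inflection = line.rstrip("\n").split("\t")
    cells = inflection.split(":")
    if len(cells) != 24:
        raise ValueError(f"expected 24 cells for {model} {stem}, found {len(cells)}")
    rows = ["Case\tS\tD\tP"]
    for i, case in enumerate(CASE_NAMES):
        # cells 3*i .. 3*i+2 hold singular, dual, plural for this case
        rows.append("\t".join([case] + cells[3 * i:3 * i + 3]))
    return "\n".join(rows)


if __name__ == "__main__":
    line = ("m_a\tkUpa\tkUpaH:kUpO:kUpAH:kUpam:kUpO:kUpAn:kUpena:kUpAByAm:kUpEH:"
            "kUpAya:kUpAByAm:kUpeByaH:kUpAt:kUpAByAm:kUpeByaH:kUpasya:kUpayoH:"
            "kUpAnAm:kUpe:kUpayoH:kUpezu:kUpa:kUpO:kUpAH")
    print(format_output_line(line))
```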

funderburkjim commented 6 years ago

Declension of rAma

The declension of rAma with model m_a is:

Case	S	D	P
Nominative	rAmaH	rAmO	rAmAH
Accusative	rAmam	rAmO	rAmAn
Instrumental	rAmeRa	rAmAByAm	rAmEH
Dative	rAmAya	rAmAByAm	rAmeByaH
Ablative	rAmAt	rAmAByAm	rAmeByaH
Genitive	rAmasya	rAmayoH	rAmARAm
Locative	rAme	rAmayoH	rAmezu
Vocative	rAma	rAmO	rAmAH

This agrees with Kale, Section 61, p. 35.
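The kUpa and rAma tables differ only in the retroflex R of rAmeRa and rAmARAm. A minimal sketch that reproduces both tables is given below; it is not the decline_file code. The ending list is read off the kUpa table, the n-to-R rule is a simplified textbook statement of the retroflexion rule (an assumption about what the real algorithm does), and the names `apply_nati` and `decline_m_a` are hypothetical. Hyphenated key2 stems such as a-kzara are not handled.

```python
VOWELS = set("aAiIuUfFxXeEoO")                 # SLP1 vowels
NATI_TRIGGERS = set("rfFz")                    # sounds that can retroflex a later n
# Sounds that may stand between trigger and n without blocking the change:
# vowels, y v h, gutturals, labials, anusvara.
NATI_TRANSPARENT = VOWELS | set("yvh") | set("kKgGN") | set("pPbBm") | {"M"}
# n changes only when followed by a vowel or n, m, y, v.
NATI_FOLLOWERS = VOWELS | set("nmyv")

# Endings replacing the final 'a' of the stem, in the order 1s:1d:1p: ... :8p.
M_A_ENDINGS = ("aH O AH am O An ena AByAm EH Aya AByAm eByaH "
               "At AByAm eByaH asya ayoH AnAm e ayoH ezu a O AH").split()


def apply_nati(word):
    """Change n to R where the simplified retroflexion rule applies."""
    out = []
    active = False  # True while a trigger is in effect
    for i, ch in enumerate(word):
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch == "n" and active and nxt in NATI_FOLLOWERS:
            out.append("R")
            active = False  # the new retroflex nasal ends the effect
        else:
            out.append(ch)
            if ch in NATI_TRIGGERS:
                active = True
            elif ch not in NATI_TRANSPARENT:
                active = False  # any other consonant blocks the change
    return "".join(out)


def decline_m_a(stem):
    """Return the 24-part declension string for a masculine stem ending in 'a'."""
    if not stem.endswith("a"):
        raise ValueError("m_a stems must end in short 'a'")
    base = stem[:-1]
    return ":".join(apply_nati(base + ending) for ending in M_A_ENDINGS)


if __name__ == "__main__":
    print(decline_m_a("kUpa"))
    print(decline_m_a("rAma"))
```

Keeping the endings in one list and the sandhi rule in a separate function means the same endings serve both stems, with the retroflexion applied as a post-processing step.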

funderburkjim commented 6 years ago

decline_checks.txt

This file shows declension tables checked against various sources, such as the two shown above. As it grows, it can be used as a reference for detecting differences when the algorithms are changed.

If others find the need, I could develop a web application that shows these declensions for a model and key2 chosen by the user. I'll probably do this eventually, once the algorithms are stable. One feature would be to let the user choose how Sanskrit is represented. Since the internals of the algorithms use the SLP1 spelling of Sanskrit words, it is easiest to show outputs, such as the tables above, also in SLP1.

gasyoun commented 6 years ago

easiest to show outputs, such as the tables above, also in SLP1.

If our readers are bots - it will work best.

drdhaval2785 commented 6 years ago

https://github.com/sanskrit-coders/indic_transliteration python package seems to support transliteration to and from SLP1 very well. So let us use it, if possible. So that custom transliteration code can be kept to bare minimum.

gasyoun commented 6 years ago

So that custom transliteration code can be kept to bare minimum.

Exactly. Not even @SergeA can read it well, what to speak of other humans...

funderburkjim commented 6 years ago

Are you suggesting that all the inputs/outputs that I'm creating should be duplicated, so that there are not only slp1 spellings but also IAST spellings?

funderburkjim commented 6 years ago

re the indic_transliteration package.

Does this package support accents?

drdhaval2785 commented 6 years ago

Are you suggesting that all the inputs/outputs that I'm creating should be duplicated, so that there are not only slp1 spellings but also IAST spellings?

No. Internals can remain SLP1. I am just suggesting the repository so that you can generate output for display to the reader in an encoding of their choice, or accept input in different encodings.
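A minimal sketch of that arrangement, assuming the indic_transliteration package is installed and using its `sanscript.transliterate` function: the stored data stays in SLP1 and is converted only at display time. The wrapper name `display_form` and the set of target names are hypothetical, and whether accents survive the conversion is not established in this thread.

```python
try:
    from indic_transliteration import sanscript
except ImportError:  # the package is optional in this sketch
    sanscript = None


def display_form(slp1_text, target="slp1"):
    """Convert SLP1 text for display; internals are never modified."""
    if target == "slp1":
        return slp1_text  # no conversion needed, no dependency needed
    if sanscript is None:
        raise RuntimeError("indic_transliteration is not installed")
    schemes = {"iast": sanscript.IAST, "devanagari": sanscript.DEVANAGARI}
    if target not in schemes:
        raise ValueError(f"unsupported target encoding: {target}")
    return sanscript.transliterate(slp1_text, sanscript.SLP1, schemes[target])


if __name__ == "__main__":
    print(display_form("rAmeRa"))
    if sanscript is not None:
        print(display_form("rAmeRa", "iast"))
```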

drdhaval2785 commented 6 years ago

Does this package support accents?

I guess not. Any specific requirements for accents, @funderburkjim?

gasyoun commented 6 years ago

Internals can remain SLP1

Right.

sanskrit-lexicon / MWinflect

stem_model m_a #4