sanskrit-lexicon / MWinflect

Generate declensions and conjugations based upon words in MW1899 dictionary.

m_a declension algorithm #5

Open funderburkjim opened 6 years ago

funderburkjim commented 6 years ago

"A beginning is a very delicate time. " ... Dune, by Frank Herbert

Since the declension algorithm for masculine nouns ending in 'a' is the first one, it makes sense to discuss the details at length.

Kale begins his discussion of declension with:

Declension consists in adding the case terminations to the crude form or base.

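There are three key parts here: the case terminations (endings), the crude form (base), and the process of adding (joining) the two.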

funderburkjim commented 6 years ago

endings for m_a model

The endings I use for the m_a model are:

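| Case | S | D | P |
|---|---|---|---|
| Nominative | aH | O | AH |
| Accusative | am | O | An |
| Instrumental | ena | AByAm | EH |
| Dative | Aya | AByAm | eByaH |
| Ablative | At | AByAm | eByaH |
| Genitive | asya | ayoH | AnAm |
| Locative | e | ayoH | ezu |
| Vocative | a | O | AH |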

the base for m_a model

Our declensions start with the headword spellings provided by Monier Williams. For masculine nouns ending in 'a', these spellings all end in 'a', for example 'kUpa' and 'rAma'.

The declension algorithm constructs the base by removing the final 'a'. So, the base for 'kUpa' is 'kUp', the base for 'rAma' is 'rAm', etc.
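As a minimal sketch of this step (the function name `m_a_base` is hypothetical and not taken from the MWinflect code), the base could be computed like this:

```python
def m_a_base(headword: str) -> str:
    """Return the base of a masculine a-stem headword (SLP1) by dropping the final 'a'."""
    if not headword.endswith("a"):
        raise ValueError(f"not an a-stem headword: {headword!r}")
    return headword[:-1]

print(m_a_base("kUpa"))  # kUp
print(m_a_base("rAma"))  # rAm
```

The check on the final letter is an added safeguard for illustration; the discussion above only covers headwords that already end in 'a'.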

funderburkjim commented 6 years ago

joining base and endings for kUpa

For kUpa, the joining is the simplest possible: string concatenation of base and ending.

For the Nominative singular, kUp joined to aH gives kUpaH.
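A rough sketch of concatenation-only joining, using the m_a endings listed earlier. The names `M_A_ENDINGS` and `decline_m_a_concat` are made up for this example and are not from the project code:

```python
CASES = ["Nominative", "Accusative", "Instrumental", "Dative",
         "Ablative", "Genitive", "Locative", "Vocative"]

# m_a endings in SLP1, one (singular, dual, plural) tuple per case.
M_A_ENDINGS = [
    ("aH", "O", "AH"),
    ("am", "O", "An"),
    ("ena", "AByAm", "EH"),
    ("Aya", "AByAm", "eByaH"),
    ("At", "AByAm", "eByaH"),
    ("asya", "ayoH", "AnAm"),
    ("e", "ayoH", "ezu"),
    ("a", "O", "AH"),
]

def decline_m_a_concat(headword):
    """Join base and endings by plain string concatenation (no sandhi)."""
    base = headword[:-1]  # drop the final 'a'
    return [tuple(base + ending for ending in row) for row in M_A_ENDINGS]

for case, forms in zip(CASES, decline_m_a_concat("kUpa")):
    print(case, *forms)
```

For kUpa this prints the 24 forms kUpaH, kUpO, kUpAH, and so on. For a headword such as rAma the same function returns rAmena in the Instrumental singular, since no sandhi is applied at this stage.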

Using '+' to represent string concatenation, we can explain the declension table of kUpa:

| Case | S | D | P |
|---|---|---|---|
| Nominative | kUp + aH = kUpaH | kUp + O = kUpO | kUp + AH = kUpAH |
| Accusative | kUp + am = kUpam | kUp + O = kUpO | kUp + An = kUpAn |
| Instrumental | kUp + ena = kUpena | kUp + AByAm = kUpAByAm | kUp + EH = kUpEH |
| Dative | kUp + Aya = kUpAya | kUp + AByAm = kUpAByAm | kUp + eByaH = kUpeByaH |
| Ablative | kUp + At = kUpAt | kUp + AByAm = kUpAByAm | kUp + eByaH = kUpeByaH |
| Genitive | kUp + asya = kUpasya | kUp + ayoH = kUpayoH | kUp + AnAm = kUpAnAm |
| Locative | kUp + e = kUpe | kUp + ayoH = kUpayoH | kUp + ezu = kUpezu |
| Vocative | kUp + a = kUpa | kUp + O = kUpO | kUp + AH = kUpAH |

funderburkjim commented 6 years ago

Partially wrong declension of rAma

If we concatenate the base rAm to the m_a endings, we get this table, which is wrong in 3s and 6p (Instrumental singular and Genitive plural).

| Case | S | D | P |
|---|---|---|---|
| Nominative | rAm + aH = rAmaH | rAm + O = rAmO | rAm + AH = rAmAH |
| Accusative | rAm + am = rAmam | rAm + O = rAmO | rAm + An = rAmAn |
| Instrumental | rAm + ena = rAmena | rAm + AByAm = rAmAByAm | rAm + EH = rAmEH |
| Dative | rAm + Aya = rAmAya | rAm + AByAm = rAmAByAm | rAm + eByaH = rAmeByaH |
| Ablative | rAm + At = rAmAt | rAm + AByAm = rAmAByAm | rAm + eByaH = rAmeByaH |
| Genitive | rAm + asya = rAmasya | rAm + ayoH = rAmayoH | rAm + AnAm = rAmAnAm |
| Locative | rAm + e = rAme | rAm + ayoH = rAmayoH | rAm + ezu = rAmezu |
| Vocative | rAm + a = rAma | rAm + O = rAmO | rAm + AH = rAmAH |

funderburkjim commented 6 years ago

Correct declension

Joining the base to the ending by simple string concatenation gives incorrect results in 3s and 6p for the rAma declension. We must apply a sandhi rule to change the dental nasal 'n' of these two endings to the cerebral nasal 'R'. So joining is a two-step process: first concatenate the base and the ending, then apply the nR sandhi to the concatenated form.

The details of the nR sandhi are such that for the endings other than 3s and 6p, the rule makes no change in the result.

Thus, in all cells of the declension, we arrive at the correct result by applying these two steps in the joining process.

Using an arrow '->' to represent this two-step joining process, we can describe the correct declension algorithm as:

| Case | S | D | P |
|---|---|---|---|
| Nominative | rAm + aH -> rAmaH | rAm + O -> rAmO | rAm + AH -> rAmAH |
| Accusative | rAm + am -> rAmam | rAm + O -> rAmO | rAm + An -> rAmAn |
| Instrumental | rAm + ena -> rAmeRa | rAm + AByAm -> rAmAByAm | rAm + EH -> rAmEH |
| Dative | rAm + Aya -> rAmAya | rAm + AByAm -> rAmAByAm | rAm + eByaH -> rAmeByaH |
| Ablative | rAm + At -> rAmAt | rAm + AByAm -> rAmAByAm | rAm + eByaH -> rAmeByaH |
| Genitive | rAm + asya -> rAmasya | rAm + ayoH -> rAmayoH | rAm + AnAm -> rAmARAm |
| Locative | rAm + e -> rAme | rAm + ayoH -> rAmayoH | rAm + ezu -> rAmezu |
| Vocative | rAm + a -> rAma | rAm + O -> rAmO | rAm + AH -> rAmAH |

funderburkjim commented 6 years ago

Antoine's statement of nR sandhi

When, in the same word, n is preceded by f, F, r, or z and followed by a vowel or one of n, m, y or v, it is changed to R provided the intervening letters be not palatals (c C j J Y), cerebrals (w W q Q R), dentals (t T d D n), or one of the three letters s, l, or S.

This is one of the most complicated sandhi rules that come to mind.

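Applying this rule to the concatenated form rAmena: the n is preceded by r, the intervening letters A, m, e are not among the excluded letters, and the n is followed by the vowel a. So the n changes to R, giving rAmeRa.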

Similarly, rAmAnAm becomes rAmARAm.

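Among the other 22 declined forms, only the Accusative plural rAmAn contains an n, and that n is word-final (not followed by a vowel or by n, m, y, v), so the rule makes no change.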
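The rule as quoted from Antoine can be sketched as a single left-to-right scan. This is only an illustration that follows the quoted wording literally: any letter not in the exclusion list is treated as a permitted intervening letter, and the names `nr_sandhi` and `join` are hypothetical rather than taken from the MWinflect code.

```python
VOWELS = set("aAiIuUfFxXeEoO")          # SLP1 vowels
TRIGGERS = set("fFrz")                  # letters that can cause n -> R
FOLLOWERS = VOWELS | set("nmyv")        # letters that must follow the n
# Letters that block the change when they intervene (Antoine's list).
BLOCKERS = set("cCjJY") | set("wWqQR") | set("tTdDn") | set("slS")

def nr_sandhi(word):
    """Apply n -> R within one word, scanning left to right."""
    out = list(word)
    active = False  # True while a trigger has been seen with no blocker since
    for i, ch in enumerate(out):
        if ch == "n" and active and i + 1 < len(out) and out[i + 1] in FOLLOWERS:
            out[i] = "R"
            active = False      # R is a cerebral, so it blocks further change
        elif ch in TRIGGERS:
            active = True
        elif ch in BLOCKERS:    # an unchanged n is a dental and also blocks
            active = False
    return "".join(out)

def join(base, ending):
    """Two-step joining: concatenate, then apply the nR sandhi."""
    return nr_sandhi(base + ending)

print(join("rAm", "ena"))   # rAmeRa
print(join("rAm", "AnAm"))  # rAmARAm
print(join("rAm", "An"))    # rAmAn
print(join("kUp", "ena"))   # kUpena
```

The word-final n of rAmAn is left alone because nothing follows it, and kUpena is unchanged because the word contains no trigger letter.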

gasyoun commented 6 years ago

Jim, crystal clear as usual. I like the approach at https://sanskritstudio.wordpress.com/2014/01/22/sanskrit-internal-sandhi-retroflexion-of-n-to-n/ and Macdonell's scan:

[Image: scan from Macdonell]

But why reinvent the wheel? @drdhaval2785, Oliver Hellwig and Huet should have all the code needed ready.

funderburkjim commented 6 years ago

Macdonell's formulation

A difference from Antoine's description:

My algorithm does allow intervening 'h', 'v', in agreement with Macdonell.

My current algorithm also allows visarga and anusvara (H, M) among intervening letters, which neither Antoine's nor Macdonell's formulation does.

I forgot to mention the h, v, H, M variance from Antoine in my version of the nR algorithm.

See next comments for discussion of H,M.

funderburkjim commented 6 years ago

M as intervening letter.

I made a test version of the nR algorithm which excludes M and H as allowed intervening letters.

Using the 47000+ m_a words as a testbed, this test version gave different declensions for 27 words.

26 of these 27 have an intervening 'M'; interestingly, they all involve taraMga. When 'M' is allowed as intervening letter, the 3s form is taraMgeRa. When 'M' is not allowed as intervening letter, the 3s form is taraMgena.

Which is right?

Huet's Sanskrit grammarian program gives the masculine 3s of taraMga as taraṅgeṇa. His program has replaced the anusvara before 'g' with the homorganic guttural nasal 'N' (ṅ); since gutturals are allowed intervening letters, the 'n' in 'ena' is changed to cerebral 'ṇ'.

Conclusion: The algorithm should allow intervening 'M'.

funderburkjim commented 6 years ago

H as intervening letter.

In the test, only 1 word with an intervening visarga was found: duruHPa

Huet's 3s is duruḥphena, so in his algorithm the intervening visarga is NOT allowed.

I cannot quote a reason for my inclusion of H; based on the original code, probably somewhere I read that 'H' should be included among gutturals and/or labials.

Because of Huet's example and because I currently have no justification for H, I'll remove 'H' from among the allowed intervening letters in nR sandhi. Thus 'duruHPena' will be 3s form.
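The M and H experiments can be sketched with a sandhi function whose set of allowed intervening letters is a parameter. Everything here is an assumption made for illustration: the regex formulation, the baseline letter set, the helper names `make_nr_sandhi` and `differing_3s`, and the four-word list standing in for the 47000+ m_a words. Only the 3s form is compared, whereas the test described above compared whole declensions.

```python
import re

VOWELS = "aAiIuUfFxXeEoO"
# Assumed baseline of permitted intervening letters: vowels, h y v r, gutturals, labials.
BASE_ALLOWED = VOWELS + "hyvr" + "kKgGN" + "pPbBm"

def make_nr_sandhi(extra_allowed=""):
    """Build an n -> R function; extra_allowed adds intervening letters such as 'M' or 'H'."""
    pattern = re.compile(
        "([rfFz][" + BASE_ALLOWED + extra_allowed + "]*)n(?=[" + VOWELS + "nmyv])"
    )
    return lambda word: pattern.sub(r"\g<1>R", word)

def differing_3s(headwords, variant_a, variant_b):
    """Return (headword, form_a, form_b) wherever the two variants disagree on the 3s."""
    diffs = []
    for hw in headwords:
        form = hw[:-1] + "ena"  # base + Instrumental singular ending
        a, b = variant_a(form), variant_b(form)
        if a != b:
            diffs.append((hw, a, b))
    return diffs

words = ["kUpa", "rAma", "taraMga", "duruHPa"]  # tiny stand-in for the m_a word list
print(differing_3s(words, make_nr_sandhi("MH"), make_nr_sandhi("")))
# [('taraMga', 'taraMgeRa', 'taraMgena'), ('duruHPa', 'duruHPeRa', 'duruHPena')]

chosen = make_nr_sandhi("M")  # allow M but not H, as concluded above
print(chosen("taraMgena"), chosen("duruHPena"))  # taraMgeRa duruHPena
```

Keeping the letter set as a parameter makes it cheap to rerun the comparison whenever another authority suggests adding or removing an intervening letter.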

SergeA commented 6 years ago

> Huet's 3s is duruḥphena so in his algorithm the intervening visarga NOT allowed. I cannot quote a reason for my inclusion of H; based on the original code, probably somewhere I read that 'H' should be included among gutturals and/or labials.

I think you are right in this reasoning. While pronouncing visarga the tip of the tongue does not change its position, so the n>ṇ sandhi should be applicable. In theory. But it is a rare case and I don't know if visarga is mentioned by any authority.

I've tried to learn the rule from translations of the Paninian sutras, but I found these sutras too vague and incomplete. To my great surprise, Panini failed even to give the main rule (as it is in the picture provided above). He didn't mention the vowels ṛ ṝ as triggers. He didn't mention the letters that must follow n. As allowed intervening letters he mentioned (8.4.2):

- aṭ: a i u ṛ ḷ e o ai au h y v r
- ku: k kh g gh ṅ
- pu: p ph b bh m
- āṅ: the prefix ā, as in pary-ā-ṇaddham (I didn't catch this)
- nuṁ: the nasal augment, as one changed into anusvara in bṛṁhaṇa (bṛh>bṛnh>bṛh). Here commentators explain that for the correct application we should read this 'nuṁ' as if it were written 'anusvara'.

The rule functions samāna-pade, in the same word. But here the question arises of what is meant by 'word' and by 'same'. Panini gives many sutras for compound words with vana, pāna etc. and for different verbal roots.

gasyoun commented 6 years ago

> In the test, only 1 word with an intervening visarga was found: duruHPa

That's the best part about Sanskrit NLP - one can actually test. Otherwise, we can write vague rules, but I never was aware that it was a one-word question. So it's a theoretical question more than practical. But an important one.

Bucknell:

[Image: 78-n-cerebral, scan from Bucknell]

funderburkjim commented 6 years ago

These summaries from other sources are material additions to the development of the inflection algorithms. Keep 'em coming! 👍

funderburkjim commented 6 years ago

> theoretical question more than practical

I think I disagree. The reason is that in this project the aim is to find inflections of MW headwords.

But sometimes, for a given MW headword, the correct inflection requires that the inflection apply only to the last pada of the word. See discussion of #6.

For example, to solve the practical problem of 'what is the Instrumental singular of akzaramuKa?' we need to take into account that this is a compound akzara-muKa and that the 'r' in the first pada akzara does NOT play a role in joining the ending 'ena' to muKa, so the 3s of akzaramuKa is akzaramuKena. By contrast, if we considered the pada to be akzaramuKa, nR sandhi would make the 3s akzaramuKeRa. So inflection results depend on a preliminary evaluation of what counts as 'samāna-pade', in the same word. Our strategy is to take MW's implied compound structure as the means of determining the 'pada' to inflect; for this compound structure the algorithm currently uses the hyphenation of 'key2'.
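A small sketch of inflecting only the last pada. It assumes that key2 is available as a plain hyphen-separated SLP1 string such as 'akzara-muKa' (real key2 values may carry other markup), and it uses a regex form of the nR rule that is itself an assumption. The names `nr_sandhi` and `inflect_last_pada` are hypothetical.

```python
import re

# Assumed regex form of the nR rule: trigger, permitted intervening letters,
# then an n that is followed by a vowel or one of n, m, y, v.
_NR = re.compile(
    r"([rfFz][aAiIuUfFxXeEoOhyvrkKgGNpPbBmM]*)n(?=[aAiIuUfFxXeEoOnmyv])"
)

def nr_sandhi(word):
    return _NR.sub(r"\g<1>R", word)

def inflect_last_pada(key2, ending):
    """Join `ending` to the last pada of a hyphenated a-stem headword."""
    *front, last = key2.split("-")
    base = last[:-1]                       # drop the final 'a' of the last pada
    return "".join(front) + nr_sandhi(base + ending)

print(inflect_last_pada("akzara-muKa", "ena"))  # akzaramuKena
print(nr_sandhi("akzaramuK" + "ena"))           # akzaramuKeRa (whole word as one pada)
print(inflect_last_pada("rAma", "ena"))         # rAmeRa
```

Restricting the sandhi to the last pada is what keeps the 'r' and 'z' of akzara from reaching the ending.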

drdhaval2785 commented 6 years ago

Translation of the Paninian rules into a regex:

```python
import re

text = re.sub(r'([rfFz][aAiIuUfFxXeEoOhyvrkKgGNpPbBmM]*)[n]', r'\g<1>R', text)
```
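As a quick check, the substitution can be wrapped in a function (the wrapper name `nr_regex` is hypothetical) and run on forms discussed in this thread:

```python
import re

def nr_regex(text):
    # Same pattern as in the comment above, written with raw strings.
    return re.sub(r'([rfFz][aAiIuUfFxXeEoOhyvrkKgGNpPbBmM]*)[n]', r'\g<1>R', text)

for form in ["rAmena", "rAmAnAm", "kUpena", "taraMgena", "duruHPena", "rAmAn"]:
    print(form, "->", nr_regex(form))
# rAmena -> rAmeRa
# rAmAnAm -> rAmARAm
# kUpena -> kUpena
# taraMgena -> taraMgeRa
# duruHPena -> duruHPena
# rAmAn -> rAmAR
```

The pattern as written places no condition on the letter after the n, so the word-final n of rAmAn is also changed, whereas Antoine's statement would leave it alone. Appending a lookahead such as `(?=[aAiIuUfFxXeEoOnmyv])` after the n is one possible way to add that condition.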