m_a declension algorithm

funderburkjim commented 6 years ago

"A beginning is a very delicate time. " ... Dune, by Frank Herbert

Since the declension algorithm for masculine nouns ending in 'a' is the first, it makes sense to spend a lot of time discussing the details.

Kale begins his discussion of declension with:

Declension consists in adding the case terminations to the crude form or base.

There are three key parts here:

case terminations (called 'sup' in Sanskrit grammar) - I'll often call these 'endings'.
crude form or base
adding - the process of joining the base and ending.

funderburkjim commented 6 years ago

endings for m_a model

The endings used by me for the m_a model are:

Case	S	D	P
Nominative	aH	O	AH
Accusative	am	O	An
Instrumental	ena	AByAm	EH
Dative	Aya	AByAm	eByaH
Ablative	At	AByAm	eByaH
Genitive	asya	ayoH	AnAm
Locative	e	ayoH	ezu
Vocative	a	O	AH

the base for m_a model

Our declensions start with the headword spellings provided by Monier Williams. For masculine nouns ending in 'a', these spellings all end in 'a', such as 'kUpa' and 'rAma'.

The declension algorithm constructs the base by removing the final 'a'. So, the base for 'kUpa' is 'kUp', the base for 'rAma' is 'rAm', etc.

funderburkjim commented 6 years ago

joining base and endings for kUpa

For kUpa, the joining is the simplest possible: string concatenation of base and ending.

For the Nominative singular, kUp joined to aH gives kUpaH.

Using '+' to represent string concatenation, we can explain the declension table of kUpa:

Case	S	D	P
Nominative	kUp + aH = kUpaH	kUp + O = kUpO	kUp + AH = kUpAH
Accusative	kUp + am = kUpam	kUp + O = kUpO	kUp + An = kUpAn
Instrumental	kUp + ena = kUpena	kUp + AByAm = kUpAByAm	kUp + EH = kUpEH
Dative	kUp + Aya = kUpAya	kUp + AByAm = kUpAByAm	kUp + eByaH = kUpeByaH
Ablative	kUp + At = kUpAt	kUp + AByAm = kUpAByAm	kUp + eByaH = kUpeByaH
Genitive	kUp + asya = kUpasya	kUp + ayoH = kUpayoH	kUp + AnAm = kUpAnAm
Locative	kUp + e = kUpe	kUp + ayoH = kUpayoH	kUp + ezu = kUpezu
Vocative	kUp + a = kUpa	kUp + O = kUpO	kUp + AH = kUpAH
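The concatenation step can be sketched in Python as follows. This is only an illustration: the names `M_A_ENDINGS`, `m_a_base` and `decline_by_concatenation` are hypothetical and not taken from the MWinflect code, and the case numbers 1 to 8 simply follow the row order of the table.

```python
# Endings for the m_a model in SLP1, copied from the endings table.
# Keys are case numbers 1-8 (Nominative .. Vocative);
# each value is (singular, dual, plural).
M_A_ENDINGS = {
    1: ("aH", "O", "AH"),
    2: ("am", "O", "An"),
    3: ("ena", "AByAm", "EH"),
    4: ("Aya", "AByAm", "eByaH"),
    5: ("At", "AByAm", "eByaH"),
    6: ("asya", "ayoH", "AnAm"),
    7: ("e", "ayoH", "ezu"),
    8: ("a", "O", "AH"),
}


def m_a_base(headword):
    """Return the base of an m_a headword by dropping the final 'a'."""
    if not headword.endswith("a"):
        raise ValueError("m_a headword must end in 'a': %r" % headword)
    return headword[:-1]


def decline_by_concatenation(headword):
    """Join base and endings by plain string concatenation (no sandhi)."""
    base = m_a_base(headword)
    return {
        case: tuple(base + ending for ending in endings)
        for case, endings in M_A_ENDINGS.items()
    }


table = decline_by_concatenation("kUpa")
print(table[1])  # ('kUpaH', 'kUpO', 'kUpAH')
```

For kUpa this reproduces the table above. Applied to rAma it yields rAmena and rAmAnAm, which is the problem taken up in the next comment.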

funderburkjim commented 6 years ago

Partially wrong Declension of rAma

If we concatenate the base rAm to the m_a endings, we get this table, which is wrong in the 3s and 6p (Instrumental singular and Genitive plural).

Case	S	D	P
Nominative	rAm + aH = rAmaH	rAm + O = rAmO	rAm + AH = rAmAH
Accusative	rAm + am = rAmam	rAm + O = rAmO	rAm + An = rAmAn
Instrumental	rAm + ena = ~~rAmena~~	rAm + AByAm = rAmAByAm	rAm + EH = rAmEH
Dative	rAm + Aya = rAmAya	rAm + AByAm = rAmAByAm	rAm + eByaH = rAmeByaH
Ablative	rAm + At = rAmAt	rAm + AByAm = rAmAByAm	rAm + eByaH = rAmeByaH
Genitive	rAm + asya = rAmasya	rAm + ayoH = rAmayoH	rAm + AnAm = ~~rAmAnAm~~
Locative	rAm + e = rAme	rAm + ayoH = rAmayoH	rAm + ezu = rAmezu
Vocative	rAm + a = rAma	rAm + O = rAmO	rAm + AH = rAmAH

funderburkjim commented 6 years ago

Correct declension

Joining the base to the ending by simple string concatenation gives incorrect results in 3s and 6p for the rAma declension. We must apply a sandhi rule to change the dental nasal 'n' of these two endings to the cerebral nasal 'R'. So joining is a two-step process:

3s: rAm + ena -> rAmena, which becomes rAmeRa by nR sandhi.
6p: rAm + AnAm -> rAmAnAm, which becomes rAmARAm by nR sandhi.

The details of the nR sandhi are such that for the endings other than 3s and 6p, the rule makes no change in the result.

Thus, in all cells of the declension, we arrive at the correct result by applying two steps in the joining process:

concatenate base and ending, getting X
apply nR sandhi to X, getting Y, the end result

Using an arrow '->' to represent this two-step joining process, we can describe the correct declension algorithm as:

Case	S	D	P
Nominative	rAm + aH -> rAmaH	rAm + O -> rAmO	rAm + AH -> rAmAH
Accusative	rAm + am -> rAmam	rAm + O -> rAmO	rAm + An -> rAmAn
Instrumental	rAm + ena -> rAmeRa	rAm + AByAm -> rAmAByAm	rAm + EH -> rAmEH
Dative	rAm + Aya -> rAmAya	rAm + AByAm -> rAmAByAm	rAm + eByaH -> rAmeByaH
Ablative	rAm + At -> rAmAt	rAm + AByAm -> rAmAByAm	rAm + eByaH -> rAmeByaH
Genitive	rAm + asya -> rAmasya	rAm + ayoH -> rAmayoH	rAm + AnAm -> rAmARAm
Locative	rAm + e -> rAme	rAm + ayoH -> rAmayoH	rAm + ezu -> rAmezu
Vocative	rAm + a -> rAma	rAm + O -> rAmO	rAm + AH -> rAmAH

funderburkjim commented 6 years ago

Antoine's statement of nR sandhi

When, in the same word, n is preceded by f, F, r, or z and followed by a vowel or one of n, m, y or v, it is changed to R provided the intervening letters be not palatals (c C j J Y), cerebrals (w W q Q R), dentals (t T d D n), or one of the three letters s, l, or S.

This is one of the most complicated sandhi rules that come to mind.

Applying this rule to the concatenated form rAmena, we have

n preceded by r: rAmena
That n followed by a vowel : rAmena
the intervening letters between r and n are Ame : rAmena
None of these intervening letters is a palatal, cerebral, dental or s,l,S.
Thus the dental 'n' becomes cerebral 'R': rAmeRa

Similarly, rAmAnAm becomes rAmARAm.

Among the other 22 declined forms,

in case 2p (rAmAn) there is an n preceded by r, but that n is followed by nothing, so the sandhi does not apply (since n is not followed by a vowel or one of n, m, y or v). So the joining remains rAmAn.
in the other 21 declined forms, there is no n preceded by r; so the nR sandhi again makes no change to the concatenation of base and ending.
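The two-step joining can be sketched in Python. This is a literal reading of Antoine's statement as quoted above: only the listed letter classes block the change. The function names are hypothetical, and the "same word" condition is not handled here; the input is assumed to be a single pada.

```python
# SLP1 letter classes taken from Antoine's statement of the rule.
TRIGGERS = set("fFrz")                    # f, F, r, z
VOWELS = set("aAiIuUfFxXeEoO")
FOLLOWERS = VOWELS | set("nmyv")          # what must follow the n
BLOCKERS = (
    set("cCjJY")      # palatals
    | set("wWqQR")    # cerebrals
    | set("tTdDn")    # dentals
    | set("slS")      # the three letters s, l, S
)


def nr_sandhi_antoine(word):
    """Change n to R where the quoted rule applies (sketch only)."""
    letters = list(word)
    active = False  # True while a trigger has been seen and not yet blocked
    for i, ch in enumerate(letters):
        followed = i + 1 < len(letters) and letters[i + 1] in FOLLOWERS
        if ch == "n" and active and followed:
            letters[i] = "R"
            active = False        # the new R is a cerebral, so it blocks
        elif ch in TRIGGERS:
            active = True
        elif ch in BLOCKERS:
            active = False
    return "".join(letters)


def join(base, ending):
    """Step 1: concatenate. Step 2: apply nR sandhi to the result."""
    return nr_sandhi_antoine(base + ending)


print(join("rAm", "ena"))   # rAmeRa
print(join("rAm", "AnAm"))  # rAmARAm
print(join("rAm", "An"))    # rAmAn (final n is not followed by anything)
```

Because kUpa contains no trigger letter, `join("kUp", "ena")` returns kUpena unchanged, so the same two-step join serves both words.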

gasyoun commented 6 years ago

Jim, crystal clear as usual. I like the approach in https://sanskritstudio.wordpress.com/2014/01/22/sanskrit-internal-sandhi-retroflexion-of-n-to-n/ and Macdonell's scan:

[Image: scan of Macdonell's statement of the rule]

But why reinvent the wheel? @drdhaval2785, Oliver Hellwig and Huet should have all the code needed ready.

funderburkjim commented 6 years ago

Macdonell's formulation

A difference from Antoine's description:

The intervening letter can be 'h' or 'v'.

My algorithm does allow intervening 'h','v' in agreement with Macdonell.

My current algorithm also allows visarga and anusvara (H, M) among intervening letters, which neither Antoine's nor Macdonell's formulation does.

I forgot to mention the h,v,H,M variance from Antoine in my version of nR algorithm.

See next comments for discussion of H,M.

funderburkjim commented 6 years ago

M as intervening letter.

I made a test version of the nR algorithm which excludes M and H as allowed intervening letters.

Using the 47000+ m_a words as a testbed, this test version gave different declensions for 27 words.

26 of these 27 have an intervening 'M'; interestingly, they all involve taraMga. When 'M' is allowed as intervening letter, the 3s form is taraMgeRa. When 'M' is not allowed as intervening letter, the 3s form is taraMgena.

Which is right?

Using Huet's Sanskrit grammarian program, the declension of taraMga in masculine has the 3s as taraṅgeṇa. His program has replaced the anusvara before 'g' with the homorganic guttural nasal 'N' (ṅ); since gutturals are allowed intervening letters, the 'n' in 'ena' is changed to cerebral 'ṇ'.

Conclusion: The algorithm should allow intervening 'M'.

funderburkjim commented 6 years ago

H as intervening letter.

In the test, only 1 word with an intervening visarga was found: duruHPa

Huet's 3s is duruḥphena, so in his algorithm the intervening visarga is NOT allowed.

I cannot quote a reason for my inclusion of H; based on the original code, probably somewhere I read that 'H' should be included among gutturals and/or labials.

Because of Huet's example and because I currently have no justification for H, I'll remove 'H' from among the allowed intervening letters in nR sandhi. Thus 'duruHPena' will be the 3s form.
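The effect of allowing or excluding M and H can be illustrated with an allow-list variant of the rule. This is a sketch under assumptions, not the project's code: the base allow-list (vowels, gutturals, labials, and h, y, v) and the `extra_allowed` parameter are hypothetical choices made for the example.

```python
TRIGGERS = set("fFrz")
VOWELS = set("aAiIuUfFxXeEoO")
FOLLOWERS = VOWELS | set("nmyv")
# Assumed base allow-list: vowels, gutturals, labials, and h, y, v.
ALLOWED_BASE = VOWELS | set("kKgGN") | set("pPbBm") | set("hyv")


def nr_sandhi(word, extra_allowed=""):
    """nR sandhi where only allow-listed letters may intervene (sketch)."""
    allowed = ALLOWED_BASE | set(extra_allowed)
    out = list(word)
    active = False
    for i, ch in enumerate(out):
        if ch in TRIGGERS:
            active = True
        elif ch == "n" and active and i + 1 < len(out) and out[i + 1] in FOLLOWERS:
            out[i] = "R"
            active = False
        elif ch not in allowed:
            active = False   # any letter outside the allow-list blocks
    return "".join(out)


print(nr_sandhi("taraMgena", extra_allowed="M"))   # taraMgeRa
print(nr_sandhi("taraMgena"))                      # taraMgena
print(nr_sandhi("duruHPena", extra_allowed="M"))   # duruHPena
print(nr_sandhi("duruHPena", extra_allowed="MH"))  # duruHPeRa
```

With `extra_allowed="M"` the sketch agrees with the two conclusions reached here: taraMgeRa (M allowed) and duruHPena (H not allowed).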

SergeA commented 6 years ago

Huet's 3s is duruḥphena, so in his algorithm the intervening visarga is NOT allowed. I cannot quote a reason for my inclusion of H; based on the original code, probably somewhere I read that 'H' should be included among gutturals and/or labials.

I think you are right in this reasoning. While pronouncing visarga the tip of the tongue does not change its position, so the n>ṇ sandhi should be applicable. In theory. But it is a rare case and I don't know if visarga is mentioned by any authority.

I've tried to learn the rule from translations of Paninian sutras, but I found these sutras too vague and incomplete. To my great surprise Panini failed even to give the main rule (as it is in the picture provided above). He didn't mention the vowels ṛ ṝ as triggers. He didn't mention the letters that must follow n. Among the allowed intervening letters he mentioned (8.4.2):

aṭ - a i u ṛ ḷ e o ai au h y v r
ku - k kh g gh ṅ
pu - p ph b bh m
āṅ - the prefix ā, as in pary-ā-ṇaddham (I didn't catch this)
nuṁ - the nasal augment, as one changed into anusvara in bṛṁhaṇa (bṛh>bṛnh>bṛṁh) - here commentators explain that for the correct application we should read this 'nuṁ' as if it were written 'anusvara'.

The rule functions samāna-pade - in the same word. But here the question arises of what is meant by 'word' and by 'same'. Panini gives many sutras for compound words with vana, pāna etc. and for different verbal roots.

gasyoun commented 6 years ago

In the test, only 1 word with an intervening visarga was found: duruHPa

That's the best part about Sanskrit NLP - one can actually test. Otherwise, we can write vague rules, but I never was aware that it was a one-word question. So it's a theoretical question more than practical. But an important one.

Bucknell:

[Image: Bucknell, n-cerebral]

funderburkjim commented 6 years ago

These summaries from other sources are material additions to the development of the inflection algorithms. Keep 'em coming! 👍

funderburkjim commented 6 years ago

theoretical question more than practical

I think I disagree. The reason is that in this project the aim is to find inflections of MW headwords.

But sometimes, for a given MW headword, the correct inflection requires that the inflection apply only to the last pada of the word. See discussion of #6.

For example, to solve the practical problem of 'what is the Instrumental singular of akzaramuKa?' we need to take into account that this is a compound akzara-muKa and that the 'r' in the first pada akzara does NOT play a role in joining the ending 'ena' to muKa, so the 3s of akzaramuKa is akzaramuKena. By contrast, if we considered the pada to be akzaramuKa, nR sandhi would make the 3s akzaramuKeRa. So inflection results depend on a preliminary evaluation of what counts as 'samāna-pade - in the same word'. Our strategy is to take MW's implied compound structure as the means of determining the 'pada' to inflect; for this compound structure the algorithm currently uses the hyphenation of 'key2'.
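A short Python sketch of this strategy, assuming the key2 spelling marks pada boundaries with a hyphen as in 'akzara-muKa'. The function names and the regex-based sandhi helper are hypothetical stand-ins for illustration, not the MWinflect implementation.

```python
import re

# Stand-in nR sandhi: a trigger, a run of allowed intervening letters,
# then an n that is followed by a vowel or one of n, m, y, v.
_NR = re.compile(
    r"([rfFz][aAiIuUfFxXeEoOhyvrkKgGNpPbBmM]*)n(?=[aAiIuUfFxXeEoOnmyv])"
)


def nr_sandhi(word):
    return _NR.sub(r"\g<1>R", word)


def instrumental_singular(key2):
    """3s of an m_a headword, applying nR sandhi to the last pada only."""
    padas = key2.split("-")
    last = padas[-1]
    if not last.endswith("a"):
        raise ValueError("expected an m_a headword ending in 'a'")
    inflected = nr_sandhi(last[:-1] + "ena")
    # Earlier padas are carried over unchanged and cannot trigger the rule.
    return "".join(padas[:-1]) + inflected


print(instrumental_singular("akzara-muKa"))  # akzaramuKena
print(instrumental_singular("akzaramuKa"))   # akzaramuKeRa
```

The second call treats the whole headword as one pada and so lets the z of akzara reach the ending, which is the outcome the hyphenation is meant to prevent.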

drdhaval2785 commented 6 years ago

Translation of Paninian rules in regex:

import re
# A trigger (r, f, F, z), then any run of allowed intervening letters, then n -> R.
text = re.sub(r'([rfFz][aAiIuUfFxXeEoOhyvrkKgGNpPbBmM]*)[n]', r'\g<1>R', text)
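As written, the regex changes the n whatever follows it, so a word-final n as in rAmAn would also be changed. One possible adjustment, offered only as a sketch, adds a lookahead for the following letter taken from Antoine's statement (a vowel or one of n, m, y, v). The function names are hypothetical.

```python
import re

ALLOWED = "aAiIuUfFxXeEoOhyvrkKgGNpPbBmM"
FOLLOWERS = "aAiIuUfFxXeEoOnmyv"

# The regex as posted: no condition on what follows the n.
POSTED = re.compile(r"([rfFz][" + ALLOWED + r"]*)[n]")
# Variant with a lookahead: the n must be followed by a vowel or n, m, y, v.
WITH_LOOKAHEAD = re.compile(r"([rfFz][" + ALLOWED + r"]*)n(?=[" + FOLLOWERS + r"])")


def nr_posted(text):
    return POSTED.sub(r"\g<1>R", text)


def nr_with_lookahead(text):
    return WITH_LOOKAHEAD.sub(r"\g<1>R", text)


for form in ("rAmena", "rAmAnAm", "rAmAn"):
    print(form, nr_posted(form), nr_with_lookahead(form))
# rAmena rAmeRa rAmeRa
# rAmAnAm rAmARAm rAmARAm
# rAmAn rAmAR rAmAn
```

Only the last row differs: the lookahead keeps the 2p form rAmAn unchanged.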

sanskrit-lexicon / MWinflect