Open funderburkjim opened 6 years ago
The endings used by me for the m_a model are:
Case | S | D | P |
---|---|---|---|
Nominative | aH | O | AH |
Accusative | am | O | An |
Instrumental | ena | AByAm | EH |
Dative | Aya | AByAm | eByaH |
Ablative | At | AByAm | eByaH |
Genitive | asya | ayoH | AnAm |
Locative | e | ayoH | ezu |
Vocative | a | O | AH |
Our declensions start with the headword spellings provided by Monier Williams. For masculine nouns ending in 'a', these spellings all end in 'a' : such as 'kUpa', 'rAma'.
The declension algorithm constructs the base by removing the final 'a'. So, the base for 'kUpa' is 'kUp', the base for 'rAma' is 'rAm', etc.
For kUpa, the joining is the simplest possible: string concatenation of base and ending.
For the Nominative singular, 'kUp' joined to 'aH' gives 'kUpaH'.
Using '+' to represent string concatenation, we can explain the declension table of 'kUpa':
Case | S | D | P |
---|---|---|---|
Nominative | kUp + aH = kUpaH | kUp + O = kUpO | kUp + AH = kUpAH |
Accusative | kUp + am = kUpam | kUp + O = kUpO | kUp + An = kUpAn |
Instrumental | kUp + ena = kUpena | kUp + AByAm = kUpAByAm | kUp + EH = kUpEH |
Dative | kUp + Aya = kUpAya | kUp + AByAm = kUpAByAm | kUp + eByaH = kUpeByaH |
Ablative | kUp + At = kUpAt | kUp + AByAm = kUpAByAm | kUp + eByaH = kUpeByaH |
Genitive | kUp + asya = kUpasya | kUp + ayoH = kUpayoH | kUp + AnAm = kUpAnAm |
Locative | kUp + e = kUpe | kUp + ayoH = kUpayoH | kUp + ezu = kUpezu |
Vocative | kUp + a = kUpa | kUp + O = kUpO | kUp + AH = kUpAH |
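The concatenation step described above can be sketched in a few lines of Python. This is only an illustration: the names `M_A_ENDINGS` and `decline_by_concatenation` are hypothetical and are not taken from the project's code.

```python
# Endings of the m_a model in SLP1, ordered (singular, dual, plural).
M_A_ENDINGS = {
    "Nominative":   ("aH", "O", "AH"),
    "Accusative":   ("am", "O", "An"),
    "Instrumental": ("ena", "AByAm", "EH"),
    "Dative":       ("Aya", "AByAm", "eByaH"),
    "Ablative":     ("At", "AByAm", "eByaH"),
    "Genitive":     ("asya", "ayoH", "AnAm"),
    "Locative":     ("e", "ayoH", "ezu"),
    "Vocative":     ("a", "O", "AH"),
}


def decline_by_concatenation(headword):
    """Return {case: (S, D, P)} using plain concatenation of base and ending."""
    if not headword.endswith("a"):
        raise ValueError("the m_a model expects a headword ending in 'a'")
    base = headword[:-1]  # 'kUpa' -> 'kUp'
    return {
        case: tuple(base + ending for ending in endings)
        for case, endings in M_A_ENDINGS.items()
    }


table = decline_by_concatenation("kUpa")
print(table["Nominative"])  # ('kUpaH', 'kUpO', 'kUpAH')
```

No sandhi is applied here, so the sketch is only adequate for words like 'kUpa' where plain concatenation is already correct.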
If we concatenate the base 'rAm' to the m_a endings, we get this table, which is wrong in the 3s and 6p (Instrumental singular and Genitive plural).
Case | S | D | P |
---|---|---|---|
Nominative | rAm + aH = rAmaH | rAm + O = rAmO | rAm + AH = rAmAH |
Accusative | rAm + am = rAmam | rAm + O = rAmO | rAm + An = rAmAn |
Instrumental | rAm + ena = rAmena (wrong) | rAm + AByAm = rAmAByAm | rAm + EH = rAmEH |
Dative | rAm + Aya = rAmAya | rAm + AByAm = rAmAByAm | rAm + eByaH = rAmeByaH |
Ablative | rAm + At = rAmAt | rAm + AByAm = rAmAByAm | rAm + eByaH = rAmeByaH |
Genitive | rAm + asya = rAmasya | rAm + ayoH = rAmayoH | rAm + AnAm = rAmAnAm (wrong) |
Locative | rAm + e = rAme | rAm + ayoH = rAmayoH | rAm + ezu = rAmezu |
Vocative | rAm + a = rAma | rAm + O = rAmO | rAm + AH = rAmAH |
Joining the base to the ending by simple string concatenation gives incorrect results in 3s and 6p for the rAma declension. We must apply a sandhi rule to change the dental nasal 'n' of these two endings to the cerebral nasal 'R'. So joining is a two-step process:
1. Concatenate the base and the ending.
2. Apply the nR sandhi to the concatenated string.

The details of the nR sandhi are such that for the endings other than 3s and 6p, the rule makes no change in the result.
Thus, in all cells of the declension, we arrive at the correct result by applying these two steps in the joining process.
Using an arrow '->' to represent this two-step joining process, we can describe the correct declension algorithm as:
Case | S | D | P |
---|---|---|---|
Nominative | rAm + aH -> rAmaH | rAm + O -> rAmO | rAm + AH -> rAmAH |
Accusative | rAm + am -> rAmam | rAm + O -> rAmO | rAm + An -> rAmAn |
Instrumental | rAm + ena -> rAmeRa | rAm + AByAm -> rAmAByAm | rAm + EH -> rAmEH |
Dative | rAm + Aya -> rAmAya | rAm + AByAm -> rAmAByAm | rAm + eByaH -> rAmeByaH |
Ablative | rAm + At -> rAmAt | rAm + AByAm -> rAmAByAm | rAm + eByaH -> rAmeByaH |
Genitive | rAm + asya -> rAmasya | rAm + ayoH -> rAmayoH | rAm + AnAm -> rAmARAm |
Locative | rAm + e -> rAme | rAm + ayoH -> rAmayoH | rAm + ezu -> rAmezu |
Vocative | rAm + a -> rAma | rAm + O -> rAmO | rAm + AH -> rAmAH |
When, in the same word, 'n' is preceded by 'f', 'F', 'r', or 'z' and followed by a vowel or one of 'n', 'm', 'y' or 'v', it is changed to 'R' provided the intervening letters be not palatals ('c C j J Y'), cerebrals ('w W q Q R'), dentals ('t T d D n'), or one of the three letters 's', 'l', or 'S'.
This is one of the most complicated sandhi rules that come to mind.
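A minimal Python sketch of the rule as worded above, together with the two-step joining. It reads the rule literally: any letter not named as blocking (palatals, cerebrals, dentals, 's', 'l', 'S') is treated as an allowed intervening letter, which is an assumption of this sketch and not necessarily what the project's algorithm does. The names `apply_nR` and `join` are hypothetical.

```python
TRIGGERS = set("fFrz")                 # letters that can trigger n -> R
VOWELS = set("aAiIuUfFxXeEoO")         # SLP1 vowels
FOLLOWERS = VOWELS | set("nmyv")       # what must come right after 'n'
# Palatals, cerebrals, dentals, and the three letters s, l, S.
BLOCKERS = set("cCjJY" "wWqQR" "tTdDn" "slS")


def apply_nR(word):
    """Apply n -> R retroflexion to a single SLP1 word (literal reading)."""
    chars = list(word)
    armed = False  # True once a trigger is seen and nothing has blocked it
    for i, ch in enumerate(chars):
        if ch in TRIGGERS:
            armed = True
        elif ch == "n" and armed:
            nxt = chars[i + 1] if i + 1 < len(chars) else ""
            if nxt in FOLLOWERS:
                chars[i] = "R"
            # Changed or not, this letter now blocks: 'n' is a dental
            # and 'R' is a cerebral.
            armed = False
        elif ch in BLOCKERS:
            armed = False
    return "".join(chars)


def join(base, ending):
    """Two-step joining: concatenate, then apply nR sandhi."""
    return apply_nR(base + ending)


for ending in ("ena", "AnAm", "An"):
    print("rAm +", ending, "->", join("rAm", ending))
```

Keeping the sandhi in its own function means every cell of the table goes through the same `join`, and the cells that do not need the change simply pass through unaltered.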
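Applying this rule to the concatenated form 'rAmena', we have:
- 'n' is preceded by 'r'.
- 'n' is followed by a vowel ('a').
- The intervening letters between 'r' and 'n' are 'Ame'.
- None of the intervening letters is a palatal, cerebral, dental, or one of 's', 'l', 'S'.

So the rule applies and 'rAmena' becomes 'rAmeRa'. Similarly, 'rAmAnAm' becomes 'rAmARAm'.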
Among the other 22 declined forms:
- The 2p 'rAmAn' has 'n' preceded by 'r', but that 'n' is followed by nothing, so the sandhi does not apply (since 'n' is not followed by a vowel or one of 'n', 'm', 'y' or 'v'). So the joining remains 'rAmAn'.
- The remaining forms have no 'n' preceded by 'r'; so the nR sandhi again makes no change to the concatenation of base and ending.

Jim, crystal clear as usual. I like the https://sanskritstudio.wordpress.com/2014/01/22/sanskrit-internal-sandhi-retroflexion-of-n-to-n/ approach and Macdonell's scan:
But why reinvent the wheel? @drdhaval2785, Oliver Hellwig and Huet should have all the code needed ready.
Macdonell's formulation
A difference from Antoine's description --
My algorithm does allow intervening 'h','v' in agreement with Macdonell.
My current algorithm also allows visarga and anusvara (H, M) among intervening letters, which neither the Antoine nor the Macdonell formulation does.
I forgot to mention the h,v,H,M variance from Antoine in my version of nR algorithm.
See next comments for discussion of H,M.
I made a test version of the nR algorithm which excludes M,H as an allowed intervening letter.
Using the 47000+ m_a words as a testbed, this test version gave different declensions for 27 words.
26 of these 27 have an intervening 'M'; interestingly, they all involve 'taraMga'.
When 'M' is allowed as intervening letter, the 3s form is 'taraMgeRa'.
When 'M' is not allowed as intervening letter, the 3s form is 'taraMgena'.
Which is right?
Using Huet's Sanskrit grammarian program, the declension of 'taraMga' in masculine has the 3s as 'taraṅgeṇa'. His program has replaced the anusvara before 'g' with the homorganic guttural nasal 'N' (ṅ); since gutturals are allowed intervening letters, the 'n' in 'ena' is changed to cerebral 'ṇ'.
Conclusion: The algorithm should allow intervening 'M'.
In the test, only 1 word with an intervening visarga was found: 'duruHPa'.
Huet's 3s is 'duruḥphena', so in his algorithm the intervening visarga is NOT allowed.
I cannot quote a reason for my inclusion of H; based on the original code, probably somewhere I read that 'H' should be included among gutturals and/or labials.
Because of Huet's example and because I currently have no justification for H, I'll remove 'H' from among the allowed intervening letters in nR sandhi. Thus 'duruHPena' will be the 3s form.
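The effect of allowing or excluding 'M' and 'H' can be tried out with a variant of the rule that works from a list of allowed intervening letters. This is a hedged sketch: the base set below (vowels, gutturals, labials, 'y', 'v', 'h') is assembled from the letters named in this thread and is an assumption, and `apply_nR` with its `extra_allowed` parameter is a hypothetical name, not the project's function.

```python
VOWELS = "aAiIuUfFxXeEoO"
# Assumed base set of letters allowed between the trigger and 'n':
# vowels, gutturals, labials, and y v h.
BASE_ALLOWED = set(VOWELS + "kKgGN" + "pPbBm" + "yvh")
FOLLOWERS = set(VOWELS + "nmyv")


def apply_nR(word, extra_allowed=""):
    """n -> R sandhi with a configurable set of extra intervening letters."""
    allowed = BASE_ALLOWED | set(extra_allowed)
    chars = list(word)
    armed = False  # True while a trigger is active
    for i, ch in enumerate(chars):
        if ch in "fFrz":
            armed = True
        elif ch == "n" and armed:
            if word[i + 1:i + 2] in FOLLOWERS:
                chars[i] = "R"
            armed = False
        elif ch not in allowed:
            armed = False  # a non-allowed letter cancels the trigger
    return "".join(chars)


# Compare the variants discussed above on the two test words.
for word in ("taraMgena", "duruHPena"):
    for extra in ("", "M", "MH"):
        print(word, repr(extra), apply_nR(word, extra))
```

With `extra_allowed="M"` the sketch yields 'taraMgeRa' and leaves 'duruHPena' unchanged, which matches the two conclusions reached above.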
> Huet's 3s is duruḥphena so in his algorithm the intervening visarga NOT allowed. I cannot quote a reason for my inclusion of H; based on the original code, probably somewhere I read that 'H' should be included among gutturals and/or labials.

I think you are right in this reasoning. While pronouncing visarga the tip of the tongue does not change its position, so the n>ṇ sandhi should be applicable. In theory. But it is a rare case and I don't know if visarga is mentioned by any authority.
I've tried to learn the rule from translations of the Paninian sutras, but I found these sutras too vague and incomplete. To my great surprise, Panini failed even to give the main rule (as it is in the picture provided above). He didn't mention the vowels ṛ ṝ as triggers. He didn't mention the letters that must follow n. As allowed intervening letters he mentioned (8.4.2):
- aṭ - a i u ṛ ḷ e o ai au h y v r
- ku - k kh g gh ṅ
- pu - p ph b bh m
- āṅ - the prefix ā, as in pary-ā-ṇaddham (I didn't catch this)
- nuṁ - the nasal augment, as one changed into anusvara in bṛṁhaṇa (bṛh>bṛnh>bṛṁh); here commentators explain that for the correct application we should read this 'nuṁ' as if it were written 'anusvara'.

The rule functions samāna-pade - in the same word. But here the question arises of what is meant by 'word' and by 'same'. Panini gives many sutras for compound words with vana, pāna etc. and for different verbal roots.
> In the test, only 1 word with an intervening visarga was found: duruHPa

That's the best part about Sanskrit NLP - one can actually test. Otherwise, we can write vague rules, but I never was aware that it was a one-word question. So it's a theoretical question more than practical. But an important one.
Bucknell:
These summaries from other sources are material additions to the development of the inflection algorithms. Keep 'em coming! 👍
> theoretical question more than practical

I think I disagree. The reason is that in this project the aim is to find inflections of MW headwords.
But sometimes, for a given MW headword, the correct inflection requires that the inflection apply only to the last pada of the word. See discussion of #6.
For example, to solve the practical problem of 'what is the Instrumental singular of akzaramuKa?' we need to take into account that this is a compound akzara-muKa and that the 'r' in the first pada akzara does NOT play a role in joining the ending 'ena' to muKa, so the 3s of akzaramuKa is akzaramuKena. By contrast, if we considered the pada to be akzaramuKa, nR sandhi would make the 3s akzaramuKeRa. So inflection results depend on a preliminary evaluation of what is 'samāna-pade - in the same word'. Our strategy is to take MW's implied compound structure as the means of determining the 'pada' to inflect; for this compound structure the algorithm currently uses the hyphenation of 'key2'.
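A small sketch of the pada-based strategy, assuming that 'key2' is available as a plain SLP1 string with '-' between padas (the value 'akzara-muKa' used here is illustrative). The function names are hypothetical, and the regex is only a compact stand-in for nR sandhi, not the project's implementation.

```python
import re

VOWELS = "aAiIuUfFxXeEoO"
# A trigger, then allowed intervening letters, then an 'n' that is
# followed by a vowel or one of n m y v.
NR_PATTERN = re.compile(
    r"([rfFz][" + VOWELS + r"hyvrkKgGNpPbBmM]*)n(?=[" + VOWELS + r"nmyv])"
)


def apply_nR(text):
    """Compact regex stand-in for nR sandhi within one pada."""
    return NR_PATTERN.sub(r"\g<1>R", text)


def instrumental_singular(key2):
    """Inflect only the last pada of a hyphenated key2 (m_a model, 3s)."""
    *front, last = key2.split("-")
    base = last[:-1]  # drop the final 'a' of the last pada
    # Earlier padas are copied unchanged; sandhi sees only the last pada.
    return "".join(front) + apply_nR(base + "ena")


print(instrumental_singular("akzara-muKa"))  # akzaramuKena
print(instrumental_singular("akzaramuKa"))   # akzaramuKeRa
```

Because the sandhi function never sees the earlier padas, the 'r' and 'z' of 'akzara' cannot trigger the change in 'muKena'.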
text = re.sub(r'([rfFz][aAiIuUfFxXeEoOhyvrkKgGNpPbBmM]*)n', r'\g<1>R', text)
Translation of Paninian rules in regex.
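To see how this regex behaves on the rAma forms, here is a small runnable comparison. The second function adds a lookahead for the "followed by a vowel or n, m, y, v" condition from the rule quoted earlier in the thread; that addition is an assumption of this sketch, not part of the posted regex, and both function names are hypothetical.

```python
import re

INTERVENING = "aAiIuUfFxXeEoOhyvrkKgGNpPbBmM"


def nR_as_posted(text):
    """The regex above, unchanged apart from being wrapped in a function."""
    return re.sub(r"([rfFz][" + INTERVENING + r"]*)n", r"\g<1>R", text)


def nR_with_follower(text):
    """Same pattern, but 'n' must be followed by a vowel or n, m, y, v."""
    pattern = r"([rfFz][" + INTERVENING + r"]*)n(?=[aAiIuUfFxXeEoOnmyv])"
    return re.sub(pattern, r"\g<1>R", text)


for form in ("rAmena", "rAmAnAm", "rAmAn"):
    print(form, nR_as_posted(form), nR_with_follower(form))
```

The two versions agree on 'rAmena' and 'rAmAnAm', but the posted regex also changes the word-final 'n' of 'rAmAn', while the lookahead version leaves it alone.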
"A beginning is a very delicate time. " ... Dune, by Frank Herbert
Since the declension algorithm for masculine nouns ending in 'a' is the first, it makes sense to spend a lot of time discussing the details.
Kale begins his discussion of declension with:
There are three key parts here: