Open funderburkjim opened 6 years ago
The endings used for the f_u declension algorithm are:
Case | S | D | P |
---|---|---|---|
Nominative | uH | U | avaH |
Accusative | um | U | UH |
Instrumental | vA | uByAm | uBiH |
Dative | vE/ave | uByAm | uByaH |
Ablative | vAH/oH | uByAm | uByaH |
Genitive | vAH/oH | voH | UnAm |
Locative | vAm/O | voH | uzu |
Vocative | o | U | avaH |
There are two choices for endings in the Dative singular, Ablative singular, Genitive singular, and Locative singular. Our notation for designating alternates is to write the alternates in order, with '/' separating the alternates.
This convention for showing alternates is also seen in the declension tables below.
We assume that the stem (last pada of key2) already ends in 'u'. The base then is formed by removing the final 'u'.
For example, the base for 'Denu' is Den.
The declension algorithm for feminine nouns ending in 'u' is procedurally the same as that for feminine nouns ending in 'i', with the exception of using the f_u endings.
Here is a summary of the procedure.
Note this is a generalization of the usual procedure (such as m_a declension), only taking into account the presence of alternate endings.
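As a rough illustration of the procedure (not the actual code behind the displays), here is a minimal Python sketch. The names `F_U_ENDINGS`, `join_alternates` and `decline_f_u` are made up for this example; the endings are copied from the table above, and nR sandhi is deliberately left out.

```python
# f_u endings in SLP1, copied from the endings table; '/' separates alternates.
# Each tuple is (Singular, Dual, Plural).
F_U_ENDINGS = {
    "Nominative":   ("uH", "U", "avaH"),
    "Accusative":   ("um", "U", "UH"),
    "Instrumental": ("vA", "uByAm", "uBiH"),
    "Dative":       ("vE/ave", "uByAm", "uByaH"),
    "Ablative":     ("vAH/oH", "uByAm", "uByaH"),
    "Genitive":     ("vAH/oH", "voH", "UnAm"),
    "Locative":     ("vAm/O", "voH", "uzu"),
    "Vocative":     ("o", "U", "avaH"),
}


def join_alternates(base, ending):
    """Attach each '/'-separated alternate ending to the base, keeping the order."""
    return "/".join(base + alt for alt in ending.split("/"))


def decline_f_u(stem):
    """Return {case: (S, D, P)} for a stem ending in short 'u' (no nR sandhi applied)."""
    if not stem.endswith("u"):
        raise ValueError("f_u declension expects a stem ending in 'u': %r" % stem)
    base = stem[:-1]  # e.g. 'Denu' -> 'Den'
    return {
        case: tuple(join_alternates(base, ending) for ending in endings)
        for case, endings in F_U_ENDINGS.items()
    }


table = decline_f_u("Denu")
print(table["Dative"])  # ('DenvE/Denave', 'DenuByAm', 'DenuByaH')
```

Keeping the alternates as one '/'-joined string per cell mirrors the notation of the tables; a list per cell would work equally well.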
Note that nR sandhi has no application in this example.
Case | S | D | P |
---|---|---|---|
Nominative | Den + uH = DenuH | Den + U = DenU | Den + avaH = DenavaH |
Accusative | Den + um = Denum | Den + U = DenU | Den + UH = DenUH |
Instrumental | Den + vA = DenvA | Den + uByAm = DenuByAm | Den + uBiH = DenuBiH |
Dative | Den + vE/ave = DenvE/Denave | Den + uByAm = DenuByAm | Den + uByaH = DenuByaH |
Ablative | Den + vAH/oH = DenvAH/DenoH | Den + uByAm = DenuByAm | Den + uByaH = DenuByaH |
Genitive | Den + vAH/oH = DenvAH/DenoH | Den + voH = DenvoH | Den + UnAm = DenUnAm |
Locative | Den + vAm/O = DenvAm/DenO | Den + voH = DenvoH | Den + uzu = Denuzu |
Vocative | Den + o = Deno | Den + U = DenU | Den + avaH = DenavaH |
Note nR sandhi is effective in the Genitive plural form.
Case | S | D | P |
---|---|---|---|
Nominative | dr + uH = druH | dr + U = drU | dr + avaH = dravaH |
Accusative | dr + um = drum | dr + U = drU | dr + UH = drUH |
Instrumental | dr + vA = drvA | dr + uByAm = druByAm | dr + uBiH = druBiH |
Dative | dr + vE/ave = drvE/drave | dr + uByAm = druByAm | dr + uByaH = druByaH |
Ablative | dr + vAH/oH = drvAH/droH | dr + uByAm = druByAm | dr + uByaH = druByaH |
Genitive | dr + vAH/oH = drvAH/droH | dr + voH = drvoH | dr + UnAm = drUnAm -> drURAm |
Locative | dr + vAm/O = drvAm/drO | dr + voH = drvoH | dr + uzu = druzu |
Vocative | dr + o = dro | dr + U = drU | dr + avaH = dravaH |
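For readers unfamiliar with the nR step, here is a simplified Python sketch of the n -> R change in SLP1. It is only an approximation of the traditional rule (a trigger r, f, F or z earlier in the word, only vowels, gutturals, labials, y, v, h or M in between, and a vowel or n, m, y, v after the n); the function name `nR_sandhi` and the character sets are assumptions for this example, not the code used for the displays.

```python
# Simplified n -> R (retroflexion) rule in SLP1. An approximation, not a full grammar.
TRIGGERS = set("fFrz")                      # f, F (vocalic r), r, z (retroflex sibilant)
VOWELS = set("aAiIuUfFxXeEoO")
# Sounds that may stand between the trigger and the n without blocking the change.
TRANSPARENT = VOWELS | set("kKgGN") | set("pPbBm") | set("yvhM")
# The n must be followed by one of these for the change to apply.
FOLLOWERS = VOWELS | set("nmyv")


def nR_sandhi(word):
    """Apply the simplified n -> R change to a single word form."""
    chars = list(word)
    active = False  # True while a trigger is in effect
    for i, c in enumerate(chars):
        if c in TRIGGERS:
            active = True
        elif c == "n":
            nxt = chars[i + 1] if i + 1 < len(chars) else ""
            if active and nxt in FOLLOWERS:
                chars[i] = "R"
            active = False  # a dental (or the new R) ends the trigger's reach
        elif c not in TRANSPARENT:
            active = False  # any other consonant blocks the change
    return "".join(chars)


print(nR_sandhi("drUnAm"))   # drURAm
print(nR_sandhi("DenUnAm"))  # DenUnAm
```

A cell with alternates (such as drvAH/droH) would need to be split on '/' and each form passed through separately.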
I haven't noticed any irregularities in Kale's coverage of feminine nouns ending in 'u'.
However, there are some brief indications of differences from our f_u model mentioned in several entries in MW. For instance:
kuru f. (ūs) a princess of the Kuru race, Pāṇ. 4-1, 66 & 176 (cf. kaurava, &c.) [L=52715]
This suggests that the nominative singular (at least) of feminine form of kuru is kurUH, rather than the kuruH of our algorithm. Perhaps @drdhaval2785 and/or @SergeA can help interpret Panini so we can alter the declensions as needed. Incidentally, Huet also has kuruH, same as our current algorithm.
P. 4.1.66 says: when denoting people, fem. -u > -ū (like f. kurū, brahmabandhū, jīvabandhū); but not after -y- (e.g. not in f. adhvaryuḥ).
denoting people
More correctly - jāti - denoting people by their birth.
Huet also has kuruH, same as our current algorithm
Oh, let me drop him a line.
P. 4.1.66
Well done, @SergeA
I haven't noticed any irregularities
Bucknell has a small list, but @drdhaval2785 noticed his list is 3-4 times bigger.
Also about adjectives: Whitney 344 b. With stems in u the case is quite different. While the feminine may, and in part does, end in u, like the masculine and neuter, a special feminine-stem is often made by lengthening the u to ū, or also by adding ī; and for some stems a feminine is formed in two of these three ways, or even in all the three: thus, kārū, -dipsū́, çundhyū́, cariṣṇū́, vacasyū́; -aṇvī, urvī́, gurvī, pūrvī́ (with a prolongation of u before r: compare 245 b), bahvī́, prabhvī́, raghvī́, sādhvī́, svādvī́;—pṛthú and pṛthvī́, vibhū́ and vibhvī́, mṛdú and mṛdvī́, laghu and laghvī, vásu and vásvī; babhrú and babhrū́, bībhatsú and bībhatsū́, bhīrú and bhīrū;—tanú and tanū́ and tanvī́, phalgú and phalgū́ and phalgvī, mádhu and madhū́ and mádhvī. There are also some feminine noun-stems in ū standing (usually with changed accent) beside masculines in u: thus, ágru m., agrū́ f.; kádru m., kadrū́ f.; gúggulu m., guggulū́ f.; jatu m., jatū́ f.; pṛ́dāku m., pṛdākū́ f. https://en.wikisource.org/wiki/Sanskrit_Grammar/Chapter_V#Nouns_and_Adjectives
What is the goal of current work with inflections? At a basic level there are a few paradigms that cover most of Sanskrit grammar and allow one to read Sanskrit texts. But a deeper investigation involves tons of special cases. Only a professional grammarian can follow all those peculiarities and options.
What is the goal of current work with inflections?
The goal is to produce inflections for the dictionary headwords (currently, restricting just to MW dictionary headwords).
My original work with this (done many years ago, in Elisp) approached the problem as one of making algorithms based on a careful reading of Antoine and Kale. This is the basis of the inflection displays at the Cologne web site. I've known for a long time that there are numerous errors within those; despite this I still find the displays quite useful. So the current work initially aims to use similar algorithms, to at least remove errors.
Regarding current treatment of irregularities, I'm mainly deferring, as they are so many and varied. When the obvious algorithms have been completed, I intend to then review once again the irregularities that Kale describes and in some way take those into account. That will be the time when I'll also take into account the comments that @SergeA and others are collecting.
@drdhaval2785 noticed his list is 3-4 times bigger.
What is that larger list?
special feminine-stem is often made by lengthening the u to ū etc
Several of these are mentioned in MW, and handled by the lexnorm data extracted from MW.
For instance 151110 BIru BIru m:f#U:f#u:n
lexnorm data
The interpretation here is that BIru has declensions in all three genders, and that in the feminine gender it may be inflected by either the 'f_u' model (like Denu) or the 'f_U' model (like vaDU).
In the f_u list of this issue, BIru does not occur. Why? Because its lexnorm information is not plain f.
It will turn up in a later 'f_u' list (as well as in m_u, f_U, and n_u) when the stem_model generation deals with the rest of the headwords ending in 'u'.
Some of the other examples you mention are also properly marked in lexnorm. But some of your examples may not be mentioned by MW, so we'll need to compare your examples to the final forms derived solely from MW.
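To make the reading of a lexnorm value concrete, here is a small Python sketch. It assumes, based only on the BIru example above, that ':' separates alternatives, that a bare gender uses the headword's own final vowel, and that '#X' overrides that vowel; the function name `models_from_lexnorm` is made up, and real lexnorm values may contain codes this sketch does not handle.

```python
def models_from_lexnorm(key1, lexnorm):
    """Turn a lexnorm value such as 'm:f#U:f#u:n' into declension model names.

    Assumed reading: 'm' -> masculine with the headword's final letter,
    'f#U' -> feminine declined on the 'U' model, and so on.
    """
    default_ending = key1[-1]  # e.g. 'u' for 'BIru'
    models = []
    for part in lexnorm.split(":"):
        gender, _, ending = part.partition("#")
        models.append("%s_%s" % (gender, ending or default_ending))
    return models


print(models_from_lexnorm("BIru", "m:f#U:f#u:n"))  # ['m_u', 'f_U', 'f_u', 'n_u']
```

Under this reading, a headword whose lexnorm is plain f yields a single model, which is why only those headwords are in the current f_u list.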
The goal is to produce inflections for the dictionary headwords (currently, restricting just to MW dictionary headwords).
The goal is too wide and the timing is bad. It's a huge task, practical appliance narrow, work with rare cases - endless
despite this I still find the displays quite useful.
Have you met anyone else who thinks the same?
Some of the other examples you mention are also properly marked in lexnorm.
I'm fascinated with the lexnorm file because it will one day become the basis of my reverse dictionary of Sanskrit.
But the generated wordforms will be dirty. If it's to understand grammar better for yourself, I can understand that. Documenting it might help someone else as well. But the result will contain hundreds if not thousands of errors. And cleaning them will mean that Cologne dictionary cleanup will come to an end. Because this task is totally in a different direction.
Have you met anyone else who thinks the same?
Yes, I have.
practical appliance narrow,
I disagree. If you are reading a text and come across an inflected word that you can't parse, the inflected form displays can often get you over the obstacle, and identify the dictionary headword (or headwords) corresponding to your problematic inflected word. For instance, in Hitopadesha (which I'm currently studying), one of the first lines is sarvadravyezu vidyEva dravyamAhuranuttamam; after undoing the sandhi, we have sarvadravyezu vidyA eva dravyam AhuH anuttamam; 'sarvadravyezu' is clearly a compound 'sarva-dravyezu', and the other words are simple inflected forms, easily related to dictionary headwords, except AhuH. What the heck is that? The inflected form display provides an answer:
So AhuH is 3rd plural of one of these verbs, all of which have to do with 'say'. So we could finish translating: Of all things, knowledge indeed is the thing, they say, unsurpassed.
So being able to get to a dictionary entry via an inflected form is of immense value to any non-expert who wants to read a Sanskrit text.
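The lookup itself amounts to an inverted index from inflected form to headword. Here is a minimal Python sketch using the Denu forms from the table earlier in this issue; `build_index` and `lookup` are hypothetical names, and this is not a description of how the Cologne display is implemented.

```python
from collections import defaultdict

NUMBERS = ("S", "D", "P")

# Forms of Denu copied from the declension table in this issue (SLP1).
DENU_TABLE = {
    "Nominative":   ("DenuH", "DenU", "DenavaH"),
    "Accusative":   ("Denum", "DenU", "DenUH"),
    "Instrumental": ("DenvA", "DenuByAm", "DenuBiH"),
    "Dative":       ("DenvE/Denave", "DenuByAm", "DenuByaH"),
    "Ablative":     ("DenvAH/DenoH", "DenuByAm", "DenuByaH"),
    "Genitive":     ("DenvAH/DenoH", "DenvoH", "DenUnAm"),
    "Locative":     ("DenvAm/DenO", "DenvoH", "Denuzu"),
    "Vocative":     ("Deno", "DenU", "DenavaH"),
}


def build_index(tables):
    """Map each inflected form to a list of (headword, case, number) analyses."""
    index = defaultdict(list)
    for headword, table in tables.items():
        for case, cells in table.items():
            for number, cell in zip(NUMBERS, cells):
                for form in cell.split("/"):  # alternates share one cell
                    index[form].append((headword, case, number))
    return index


INDEX = build_index({"Denu": DENU_TABLE})


def lookup(form):
    """Return all known analyses of an inflected form (empty list if unknown)."""
    return INDEX.get(form, [])


print(lookup("DenoH"))  # [('Denu', 'Ablative', 'S'), ('Denu', 'Genitive', 'S')]
```

With tables for all headwords loaded, an ambiguous form simply returns several analyses, which is the behaviour wanted when you do not know in advance whether a word is a noun or a verb.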
will mean that Cologne dictionary cleanup will come to an end
The inflection generation actually has some feedback into cleaning up the digitization. For instance, I found the erroneous headword 'navaviMSaSati' in mw digitization, which should be 'navaviMSati'. There are several other less serious corrections/changes which have come to light.
Also, the complete correction of inflections need not, indeed cannot, be accomplished in its entirety. So I can attend to other approaches to dictionary correctness and utility once this first phase is past.
The only other web-accessible tool (that I know of) which provides functionality similar to the inflected form lookup is the stemmer function at Huet's site.
In the 'Search for atomic inflected forms' section, you can enter
and often get back useful information (e.g. try gacCati, verb)
However, AhuH, verb is not recognized. So in some ways, at least, the coverage of the inflected form display at Cologne is wider.
Also, you do not know a priori that AhuH is a verb; in fact my first guess would be that it is a noun (like guru).
So, the functionality of knowing inflected forms related to all the MW entries seems a worthy goal, and one which is worth improving.
If you are reading a text and come across an inflected word that you can't parse, the inflected form displays can often get you over the obstacle, and identify the dictionary headword (or headwords) corresponding to your problematic inflected word.
There is Huet for that.
So being able to get to a dictionary entry via an inflected form is of immense value to any non-expert who wants to read a Sanskrit text.
Sure, but making it work as well as you want will freeze all the previous work. That's why I feel bad about it somehow.
some feedback into cleaning up the digitization
That's the best excuse for me.
AhuH, verb is not recognized. So in some ways, at least, the coverage of the inflected form display at Cologne is wider.
Will ask Huet.
So, the functionality of knowing inflected forms related to all the MW entries seems a worthy goal, and one which is worth improving.
But even Huet has failed, and he has been in the business even longer than you are or will be.
The goal is to produce inflections for the dictionary headwords (currently, restricting just to MW dictionary headwords).
Yes, that's a very useful option. But this requires lots of work and a profound knowledge of grammar. I'd recommend using both Kale's and Whitney's grammars. Kale's approach is based on the traditional system of Panini. Whitney gives a western view (and also explains Vedic peculiarities).
However, AhuH, verb is not recognized.
Try āhur. ah is an irregular verb of which we have only a few Perfect forms. It is used as a synonym for the Present of brū, and in the Indian tradition it is given as an alternative form of the root brū. In the western view, however, it is not so.
āhur
Excellent! Will keep the r/H ending variation in mind as a way to tease out all the goodness in Gerard's site.
Whitney
Not sure why I don't use Whitney grammar as much, but will try to get to know it better.
feminine nouns ending in 'u'
This list is derived from lexnorm-all2 by the simple filter: a) key1 ends in short vowel 'u' b) lexnorm is precisely 'f'
There are 238 distinct such cases, listed in file nominals/inputs/f_u.txt.
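The filter can be sketched in a few lines of Python. This assumes each lexnorm-all2 record has four whitespace-separated fields (L, key1, key2, lexnorm), as the BIru line quoted earlier suggests; the function name `select_f_u` is made up, and the Denu and vaDU records below are invented for the demonstration, with placeholder L values.

```python
def select_f_u(lines):
    """Return distinct key1 values that end in short 'u' and whose lexnorm is exactly 'f'."""
    selected = []
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or unexpected lines (the field layout is an assumption)
        _L, key1, _key2, lexnorm = fields
        if key1.endswith("u") and lexnorm == "f":
            selected.append(key1)
    # Keep the first occurrence of each headword, preserving input order.
    return list(dict.fromkeys(selected))


sample = [
    "151110 BIru BIru m:f#U:f#u:n",  # from this thread: not plain 'f', so excluded
    "0 Denu Denu f",                 # hypothetical record, placeholder L value
    "0 vaDU vaDU f",                 # hypothetical record: long 'U', so excluded
    "0 Denu Denu f",                 # duplicate, counted once
]
print(select_f_u(sample))  # ['Denu']
```

Note that `endswith("u")` is case-sensitive, which matters in SLP1 because 'U' is the long vowel.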