stem model f_u and declension

funderburkjim commented 6 years ago

feminine nouns ending in 'u'

This list is derived from lexnorm-all2 by a simple filter: (a) key1 ends in the short vowel 'u', and (b) lexnorm is precisely 'f'.

There are 238 distinct such cases, listed in file nominals/inputs/f_u.txt.
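A minimal sketch of that filter. It assumes that each line of lexnorm-all2 has four whitespace-separated fields (L-number, key1, key2, lexnorm), as in the BIru line quoted later in this thread. The file layout and the function name select_f_u are assumptions, not the project's actual code. Because short 'u' and long 'U' are distinct letters in SLP1, a plain endswith('u') test is enough to exclude ū-stems.

```python
from typing import Iterable, List, Tuple


def select_f_u(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (L, key1, key2) for entries whose key1 ends in short 'u'
    and whose lexnorm field is exactly 'f'.

    Assumes (hypothetically) four whitespace-separated fields per line.
    """
    selected = []
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        lnum, key1, key2, lexnorm = fields
        # SLP1: 'u' is short u and 'U' is long u, so this excludes ū-stems
        if key1.endswith("u") and lexnorm == "f":
            selected.append((lnum, key1, key2))
    return selected
```

For example, the line `151110 BIru BIru m:f#U:f#u:n` is rejected because its lexnorm field is not plain `f`.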

funderburkjim commented 6 years ago

endings for f_u model

The endings used for the f_u declension algorithm are:

Case	S	D	P
Nominative	uH	U	avaH
Accusative	um	U	UH
Instrumental	vA	uByAm	uBiH
Dative	vE/ave	uByAm	uByaH
Ablative	vAH/oH	uByAm	uByaH
Genitive	vAH/oH	voH	UnAm
Locative	vAm/O	voH	uzu
Vocative	o	U	avaH

Alternate endings

There are two choices of ending in the Dative, Ablative, Genitive, and Locative singular. We designate alternates by writing them in order, separated by '/'.

This convention for showing alternates is also seen in the declension tables below.
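For use in code, the endings table and its '/' convention map naturally onto a dictionary from (case, number) to a list of alternate endings. A minimal sketch; the names F_U_TABLE, ENDINGS, and parse_cell are illustrative, not taken from the project.

```python
from typing import List

# f_u endings in SLP1, copied from the table above.
# Columns: singular, dual, plural; '/' separates alternates.
F_U_TABLE = {
    "Nominative":   ("uH", "U", "avaH"),
    "Accusative":   ("um", "U", "UH"),
    "Instrumental": ("vA", "uByAm", "uBiH"),
    "Dative":       ("vE/ave", "uByAm", "uByaH"),
    "Ablative":     ("vAH/oH", "uByAm", "uByaH"),
    "Genitive":     ("vAH/oH", "voH", "UnAm"),
    "Locative":     ("vAm/O", "voH", "uzu"),
    "Vocative":     ("o", "U", "avaH"),
}
NUMBERS = ("S", "D", "P")


def parse_cell(cell: str) -> List[str]:
    """Split a table cell into its alternates, preserving their order."""
    return cell.split("/")


# (case, number) -> list of alternate endings; 8 cases x 3 numbers = 24 cells
ENDINGS = {
    (case, num): parse_cell(cell)
    for case, row in F_U_TABLE.items()
    for num, cell in zip(NUMBERS, row)
}

# Cells with more than one ending: the four singular cells named above.
with_alternates = sorted(k for k, v in ENDINGS.items() if len(v) > 1)
```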

funderburkjim commented 6 years ago

Base for f_u model

We assume that the stem (last pada of key2) already ends in 'u'. The base then is formed by removing the final 'u'.

For example, the base for 'Denu' is 'Den'.

funderburkjim commented 6 years ago

f_u declension algorithm

The declension algorithm for feminine nouns ending in 'u' is procedurally the same as that for feminine nouns ending in 'i', with the exception of using the f_u endings.

Here is a summary of the procedure.

Start with key2.
If there are '-' characters in key2, represent key2 as H-X, where X is the last 'pada' and H is the head, with any additional '-' characters removed. If there are no '-' characters in key2, set H = '' (empty string) and X = key2.
construct the base from X by removing the last character, which is 'u'; i.e., represent X as Yu, where Y is the base.
for each cell of the 24 declension cells, let E be one of the endings for that cell
- concatenate the base with the ending: Z = YE
- apply nR sandhi to Z, resulting in W.
- concatenate H with W. This is one of the declensions of key2 for the given cell.
- if there is more than one ending for the cell, join the alternates to get all the alternate declensions.

Note that this procedure generalizes the usual one (e.g., for m_a declension) only by taking alternate endings into account.
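The procedure above can be sketched in Python as follows. This is an illustrative reimplementation, not the project's code: the function names are hypothetical, and nr_sandhi is a simplified version of the ṇatva rule (a preceding r, ṛ, ṝ, or ṣ changes n to ṇ, written R in SLP1, when only vowels, h, y, v, gutturals, labials, or anusvāra intervene and the n is followed by a vowel, m, y, v, or n). It ignores the exceptions a full implementation would need. As in the steps above, sandhi is applied to base + ending only, before the head is prefixed.

```python
from typing import Dict, List, Tuple

# f_u endings in SLP1 (columns: singular, dual, plural); '/' separates alternates.
F_U_TABLE = {
    "Nominative":   ("uH", "U", "avaH"),
    "Accusative":   ("um", "U", "UH"),
    "Instrumental": ("vA", "uByAm", "uBiH"),
    "Dative":       ("vE/ave", "uByAm", "uByaH"),
    "Ablative":     ("vAH/oH", "uByAm", "uByaH"),
    "Genitive":     ("vAH/oH", "voH", "UnAm"),
    "Locative":     ("vAm/O", "voH", "uzu"),
    "Vocative":     ("o", "U", "avaH"),
}
NUMBERS = ("S", "D", "P")

# Simplified n -> R rule (SLP1 letters).
TRIGGERS = set("rfFz")  # r, ṛ, ṝ, ṣ
VOWELS = set("aAiIuUfFxXeEoO")
# Sounds that may stand between a trigger and n without blocking the change.
TRANSPARENT = VOWELS | set("hyvkKgGNpPbBmM")
# An n is changed only if one of these follows it.
FOLLOWERS = VOWELS | set("nmyv")


def nr_sandhi(word: str) -> str:
    """Apply a simplified n -> R rule within one word."""
    chars = list(word)
    active = False  # True once a trigger has been seen and not yet blocked
    for i, c in enumerate(chars):
        if c in TRIGGERS:
            active = True
        elif c == "n" and active and i + 1 < len(chars) and chars[i + 1] in FOLLOWERS:
            chars[i] = "R"
            active = False  # R is retroflex, so it blocks any later n
        elif c not in TRANSPARENT:
            active = False  # any other sound, including an unchanged n, blocks
    return "".join(chars)


def decline_f_u(key2: str) -> Dict[Tuple[str, str], List[str]]:
    """Return {(case, number): [alternate forms]} for a feminine u-stem."""
    # Step 1: key2 = H-X; extra '-' characters in the head are dropped.
    parts = key2.split("-")
    head, x = "".join(parts[:-1]), parts[-1]
    # Step 2: X = Yu, where Y is the base.
    if not x.endswith("u"):
        raise ValueError(f"{key2!r}: last pada does not end in 'u'")
    base = x[:-1]
    # Step 3: for each of the 24 cells, Z = YE, W = nR(Z), form = H + W.
    forms: Dict[Tuple[str, str], List[str]] = {}
    for case, row in F_U_TABLE.items():
        for num, cell in zip(NUMBERS, row):
            forms[(case, num)] = [head + nr_sandhi(base + e) for e in cell.split("/")]
    return forms
```

With these definitions, `decline_f_u("Denu")[("Dative", "S")]` gives `['DenvE', 'Denave']` and `decline_f_u("dru")[("Genitive", "P")]` gives `['drURAm']`, matching the two example tables.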

funderburkjim commented 6 years ago

example Denu

Note that nR sandhi has no application in this example.

Case	S	D	P
Nominative	Den + uH = DenuH	Den + U = DenU	Den + avaH = DenavaH
Accusative	Den + um = Denum	Den + U = DenU	Den + UH = DenUH
Instrumental	Den + vA = DenvA	Den + uByAm = DenuByAm	Den + uBiH = DenuBiH
Dative	Den + vE/ave = DenvE/Denave	Den + uByAm = DenuByAm	Den + uByaH = DenuByaH
Ablative	Den + vAH/oH = DenvAH/DenoH	Den + uByAm = DenuByAm	Den + uByaH = DenuByaH
Genitive	Den + vAH/oH = DenvAH/DenoH	Den + voH = DenvoH	Den + UnAm = DenUnAm
Locative	Den + vAm/O = DenvAm/DenO	Den + voH = DenvoH	Den + uzu = Denuzu
Vocative	Den + o = Deno	Den + U = DenU	Den + avaH = DenavaH

funderburkjim commented 6 years ago

example dru

Note that nR sandhi applies in the Genitive plural form: the 'r' of the base changes the following 'n' to 'R' (SLP1 for ṇ).

Case	S	D	P
Nominative	dr + uH = druH	dr + U = drU	dr + avaH = dravaH
Accusative	dr + um = drum	dr + U = drU	dr + UH = drUH
Instrumental	dr + vA = drvA	dr + uByAm = druByAm	dr + uBiH = druBiH
Dative	dr + vE/ave = drvE/drave	dr + uByAm = druByAm	dr + uByaH = druByaH
Ablative	dr + vAH/oH = drvAH/droH	dr + uByAm = druByAm	dr + uByaH = druByaH
Genitive	dr + vAH/oH = drvAH/droH	dr + voH = drvoH	dr + UnAm = drUnAm -> drURAm
Locative	dr + vAm/O = drvAm/drO	dr + voH = drvoH	dr + uzu = druzu
Vocative	dr + o = dro	dr + U = drU	dr + avaH = dravaH

funderburkjim commented 6 years ago

Irregularities

I haven't noticed any irregularities in Kale's coverage of feminine nouns ending in 'u'.

However, several MW entries briefly indicate differences from our f_u model. For instance:

kuru f. (ūs) a princess of the Kuru race, Pāṇ. 4-1, 66 & 176  (cf. kaurava, &c.) [L=52715]

This suggests that the nominative singular (at least) of the feminine form of kuru is kurUH, rather than the kuruH of our algorithm. Perhaps @drdhaval2785 and/or @SergeA can help interpret Panini so we can alter the declensions as needed. Incidentally, Huet also has kuruH, same as our current algorithm.

SergeA commented 6 years ago

P. 4.1.66 says: when denoting people, the feminine changes -u > -ū (as in f. kurū, brahmabandhū, jīvabandhū), but not after -y- (e.g., not in f. adhvaryuḥ).

SergeA commented 6 years ago

denoting people

More precisely: jāti, i.e., denoting people by their birth.
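As a sketch of how the rule summarized above might feed the choice of stem model: pick the f_U model (lengthened ū) when the word denotes a jāti and the final u is not preceded by y, and the ordinary f_u model otherwise. Whether a headword denotes a jāti cannot be detected from its spelling, so here it is an explicit input (an assumption); the function name feminine_u_model is hypothetical.

```python
from typing import Tuple


def feminine_u_model(stem: str, denotes_jati: bool) -> Tuple[str, str]:
    """Return (model, feminine stem) for a short-u stem, per the summary
    of P. 4.1.66 above. `denotes_jati` must be supplied by the caller."""
    if not stem.endswith("u"):
        raise ValueError(f"{stem!r} does not end in short 'u'")
    # -u > -U for words denoting a jAti, but not after -y-
    if denotes_jati and not stem.endswith("yu"):
        return ("f_U", stem[:-1] + "U")
    return ("f_u", stem)
```

With `feminine_u_model("kuru", True)` this yields `('f_U', 'kurU')`, whose nominative singular would be kurUH as the MW note suggests, while `aDvaryu` stays in f_u.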

gasyoun commented 6 years ago

Huet also has kuruH, same as our current algorithm

Oh, let me drop him a line.

P. 4.1.66

Well done, @SergeA

I haven't noticed any irregularities

Bucknell has a small list, but @drdhaval2785 noticed his list is 3-4 times bigger.

SergeA commented 6 years ago

Also about adjectives: Whitney 344 b. With stems in u the case is quite different. While the feminine may, and in part does, end in u, like the masculine and neuter, a special feminine-stem is often made by lengthening the u to ū, or also by adding ī; and for some stems a feminine is formed in two of these three ways, or even in all the three: thus, kārū, -dipsū́, çundhyū́, cariṣṇū́, vacasyū́; -aṇvī, urvī́, gurvī, pūrvī́ (with a prolongation of u before r: compare 245 b), bahvī́, prabhvī́, raghvī́, sādhvī́, svādvī́;—pṛthú and pṛthvī́, vibhū́ and vibhvī́, mṛdú and mṛdvī́, laghu and laghvī, vásu and vásvī; babhrú and babhrū́, bībhatsú and bībhatsū́, bhīrú and bhīrū;—tanú and tanū́ and tanvī́, phalgú and phalgū́ and phalgvī, mádhu and madhū́ and mádhvī. There are also some feminine noun-stems in ū standing (usually with changed accent) beside masculines in u: thus, ágru m., agrū́ f.; kádru m., kadrū́ f.; gúggulu m., guggulū́ f.; jatu m., jatū́ f.; pṛ́dāku m., pṛdākū́ f. https://en.wikisource.org/wiki/Sanskrit_Grammar/Chapter_V#Nouns_and_Adjectives

What is the goal of current work with inflections? At a basic level there are a few paradigms, which cover most of Sanskrit grammar and allow one to read Sanskrit texts. But a thorough investigation involves tons of special cases. Only a professional grammarian can follow all those peculiarities and options.

funderburkjim commented 6 years ago

What is the goal of current work with inflections?

The goal is to produce inflections for the dictionary headwords (currently, restricting just to MW dictionary headwords).

My original work on this (done many years ago, in Elisp) approached the problem as one of making algorithms based on a careful reading of Antoine and Kale. This is the basis of the inflection displays at the Cologne web site. I've known for a long time that there are numerous errors within those; despite this I still find the displays quite useful. So the current work initially aims to use similar algorithms, while at least removing the errors.

Regarding the treatment of irregularities, I'm mainly deferring it for now, as they are so many and varied. When the obvious algorithms have been completed, I intend to review once again the irregularities that Kale describes and take them into account in some way. That is when I'll also take into account the comments that @SergeA and others are collecting.

funderburkjim commented 6 years ago

@drdhaval2785 noticed his list is 3-4 times bigger.

What is that larger list?

funderburkjim commented 6 years ago

special feminine-stem is often made by lengthening the u to ū etc

Several of these are mentioned in MW, and handled by the lexnorm data extracted from MW.

For instance, here is the lexnorm data for BIru:

151110 BIru BIru m:f#U:f#u:n

The interpretation here is that BIru has declensions in all three genders, and that in the feminine gender it may be inflected by either the 'f_u' model (like Denu) or the 'f_U' model (like vaDU).
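A minimal sketch of reading that lexnorm field. It assumes, from this one example, that ':' separates gender entries and that an optional '#X' after the gender gives the final vowel of the stem for that gender, while entries without '#' keep the headword's own final vowel. The model names follow the m_u / f_U / f_u / n_u naming used in this thread; the function stem_models is hypothetical.

```python
from typing import List


def stem_models(key1: str, lexnorm: str) -> List[str]:
    """Map a headword and its lexnorm field to stem-model names like 'f_U'."""
    final = key1[-1]  # final vowel of the headword, e.g. 'u'
    models = []
    for entry in lexnorm.split(":"):
        # 'f#U' -> gender 'f', vowel 'U'; 'm' -> gender 'm', vowel ''
        gender, _, vowel = entry.partition("#")
        models.append(f"{gender}_{vowel or final}")
    return models
```

Under these assumptions, `stem_models("BIru", "m:f#U:f#u:n")` gives `['m_u', 'f_U', 'f_u', 'n_u']`, which is the reading described above.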

In the f_u list of this issue, BIru does not occur. Why? Because its lexnorm information is not plain f.

It will turn up in a later 'f_u' list (as well as in m_u, f_U, and n_u) when the stem_model generation deals with the rest of the headwords ending in 'u'.

Some of the other examples you mention are also properly marked in lexnorm. But some of your examples may not be mentioned by MW, so we'll need to compare your examples to the final forms derived solely from MW.

gasyoun commented 6 years ago

The goal is to produce inflections for the dictionary headwords (currently, restricting just to MW dictionary headwords).

The goal is too wide and the timing is bad. It's a huge task, practical appliance narrow, and the work with rare cases is endless.

despite this I still find the displays quite useful.

Have you met anyone else who thinks the same?

Some of the other examples you mention are also properly marked in lexnorm.

I'm fascinated by the lexnorm file because one day it will become the basis of my reverse dictionary of Sanskrit. But the generated word forms will be dirty. If the aim is to better understand grammar for yourself, I can understand that, and documenting it might help someone else as well. But the result will contain hundreds if not thousands of errors. And cleaning them will mean that Cologne dictionary cleanup will come to an end, because this task goes in a totally different direction.

funderburkjim commented 6 years ago

Have you met anyone else who thinks the same?

Yes, I have.

funderburkjim commented 6 years ago

practical appliance narrow,

I disagree. If you are reading a text and come across an inflected word that you can't parse, the inflected form displays can often get you over the obstacle, and identify the dictionary headword (or headwords) corresponding to your problematic inflected word. For instance, in the Hitopadesha (which I'm currently studying), one of the first lines is sarvadravyezu vidyEva dravyamAhuranuttamam. After undoing the sandhi, we have sarvadravyezu vidyA eva dravyam AhuH anuttamam. 'sarvadravyezu' is clearly a compound, sarva-dravyezu, and the other words are simple inflected forms, easily related to dictionary headwords, except AhuH. What the heck is that? The inflected form display provides an answer:

So AhuH is the 3rd person plural of one of these verbs, all of which have to do with 'say'. We can then finish translating: Of all things, knowledge indeed is the thing, they say, unsurpassed.

So being able to get to a dictionary entry via an inflected form is of immense value to any non-expert who wants to read a Sanskrit text.
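The lookup described here amounts to inverting the generated declension tables, mapping each inflected form back to the headword(s) that produce it. A sketch, assuming the generated data is available as a mapping from headword to its forms; the data shape and the function name build_reverse_index are illustrative, and the Cologne display may be organized differently. Values are lists because one form can come from several headwords, as AhuH does.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def build_reverse_index(forms_by_headword: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Map each inflected form to the sorted list of headwords that generate it."""
    index = defaultdict(set)
    for headword, forms in forms_by_headword.items():
        for form in forms:
            index[form].add(headword)
    return {form: sorted(heads) for form, heads in index.items()}


# Usage with truncated form lists, for illustration only:
index = build_reverse_index({
    "Denu": ["DenuH", "DenU", "DenavaH"],
    "dru": ["druH", "drU", "dravaH"],
})
```

Here `index["DenU"]` is `['Denu']`.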

funderburkjim commented 6 years ago

will mean that Cologne dictionary cleanup will come to an end

The inflection generation actually has some feedback into cleaning up the digitization. For instance, I found the erroneous headword 'navaviMSaSati' in mw digitization, which should be 'navaviMSati'. There are several other less serious corrections/changes which have come to light.

Also, the correction of inflections need not, indeed cannot, be accomplished in its entirety. So I can attend to other approaches to dictionary correctness and utility once this first phase is past.

funderburkjim commented 6 years ago

Another remark regarding AhuH.

The only other web-accessible tool (that I know of) which provides functionality similar to the inflected form lookup is the stemmer function at Huet's site.

In the 'Search for atomic inflected forms' section, you can enter

- an inflected form
- a grammatical category (noun, verb, and several others)

and often get back useful information (e.g., try gacCati, verb).

However, AhuH, verb is not recognized. So in some ways, at least, the coverage of the inflected form display at Cologne is wider.

Also, you do not know a priori that AhuH is a verb; in fact, my first guess would be that it is a noun (like guru).

So, the functionality of knowing inflected forms related to all the MW entries seems a worthy goal, and one which is worth improving.

gasyoun commented 6 years ago

If you are reading a text and come across an inflected word that you can't parse, the inflected form displays can often get you over the obstacle, and identify the dictionary headword (or headwords) corresponding to your problematic inflected word.

There is Huet for that.

So being able to get to a dictionary entry via an inflected form is of immense value to any non-expert who wants to read a Sanskrit text.

Sure, but making it work as well as you want will freeze all the previous work. That's why I feel bad about it somehow.

some feedback into cleaning up the digitization

That's the best excuse for me.

AhuH, verb is not recognized. So in some ways, at least, the coverage of the inflected form display at Cologne is wider.

Will ask Huet.

So, the functionality of knowing inflected forms related to all the MW entries seems a worthy goal, and one which is worth improving.

But even Huet has failed, and he has been in the business longer than you have been or will be.

SergeA commented 6 years ago

The goal is to produce inflections for the dictionary headwords (currently, restricting just to MW dictionary headwords).

Yes, that's a very useful option. But it requires a lot of work and a profound knowledge of grammar. I'd recommend using both Kale's and Whitney's grammars. Kale's approach is based on the traditional Paninian system; Whitney gives a Western view (and also explains Vedic peculiarities).

However, AhuH, verb is not recognized.

Try āhur. ah is an irregular verb of which only a few Perfect forms exist. It is used as a synonym for the Present of brū, and in the Indian tradition it is given as an alternative form of the root brū. In the Western view, however, it is not.

funderburkjim commented 6 years ago

āhur

Excellent! Will keep the r/H ending variation in mind as a way to tease out all the goodness in Gerard's site.
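Because a final visarga (H in SLP1) can correspond to r or s depending on what follows, as with AhuH vs. āhur, one way to query a tool that expects a different final is to try each variant in turn. A hypothetical helper; the name and the choice of r and s as the only variants are assumptions.

```python
from typing import List


def final_visarga_variants(form: str) -> List[str]:
    """Return the form plus its r- and s-final variants if it ends in visarga 'H' (SLP1)."""
    if not form.endswith("H"):
        return [form]
    stem = form[:-1]
    return [form, stem + "r", stem + "s"]
```

For example, `final_visarga_variants("AhuH")` returns `['AhuH', 'Ahur', 'Ahus']`.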

Whitney

Not sure why I don't use Whitney grammar as much, but will try to get to know it better.

sanskrit-lexicon / MWinflect