MW feminine form interpretation questions

funderburkjim commented 9 years ago

Although the cases shown below probably are questions of interpretation, rather than errors requiring correction, I'm putting them in this Issue of the Corrections repository since they arise in the same work discussed in the previous issue.

To repeat the opening comment from the previous issue:

I've been reviewing some former work, which takes https://github.com/funderburkjim/MWlexnorm to the next step. The aim is to parse the declension information from MW records in such a way as to make explicit the declinable stems. This forms a basis for generating declension tables based on the information from MW. I'll detail this step more in the MWlexnorm repository. But in this issue, I'm raising questions that arise in a few cases.

gasyoun commented 9 years ago

Wonder how many feminine words are mentioned in MW overall. Some stats.

funderburkjim commented 9 years ago

In three cases, MW shows, essentially, I and is as the endings of the feminine. In one other case he shows the similar I and iH.

The question is how to interpret these markings when generating a declension for the words in question.

I am thinking of declensions in the way they are presented in modern grammars (as distinguished from the way they are presented in Panini's treatise, which I don't understand). In these modern grammars, the varieties of (regular) declension are presented by means of models.

For instance, there is a model (such as nadI) for feminine nouns ending in I. In the four cases below, I feel certain that the I designation implies a declension table like that of nadI, but instead of the stem nadI, we use the analogous stem, such as kalI or kavI, etc. So there is no question regarding interpretation of the I designation.

The question more specifically is how to interpret the is (or iH) indications.

There are two likely possibilities for the interpretation:

a. Interpret iH as the ending of the nominative singular form, and similarly interpret is as an (to my mind, awkward) alternate way to indicate the same iH ending of the nominative singular form. With this interpretation, the model would be the feminine noun mati, and we would decline kavi like mati, etc.
b. Interpret is as implying a model like arcis or ASis; this would imply using stems kalis, kavis, etc.

I think it is likely that option a (like mati) is the right interpretation, but would like confirmation from one of our experts.

Here are the cases:

Case 1: kali scan start of kali , next page, with f. form

Case 2: kavi scan

Case 3: suraBi scan

Case 4: anApi scan
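The two readings in options a and b can be sketched in code. This is only an illustration under assumptions: the function name `candidate_readings` is hypothetical, headwords are taken to be in SLP1 as in the comments above, and arcis stands in for the arcis/ASis model of option b.

```python
def candidate_readings(headword, marker):
    """Return the (stem, model) pair implied by each reading of 'is' / 'iH'.

    Hypothetical helper: it only restates options a and b from the comment
    above and does not decide between them.
    """
    if marker not in ("is", "iH"):
        raise ValueError(f"unexpected marker: {marker!r}")
    if not headword.endswith("i"):
        raise ValueError(f"expected a headword ending in 'i': {headword!r}")
    return {
        # Option a: the marker is the nominative singular ending, so the
        # stem is the headword itself, declined like mati.
        "a": (headword, "mati"),
        # Option b: the marker belongs to the stem, giving a stem in -is
        # declined like arcis.
        "b": (headword + "s", "arcis"),
    }


for word in ("kali", "kavi", "suraBi", "anApi"):
    print(word, candidate_readings(word, "is"))
```

Under option a the four headwords keep their stems; under option b they become kalis, kavis, suraBis and anApis.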

funderburkjim commented 9 years ago

Case 5: atistri scan

This case is superficially like the first four cases. However, the first form I is surely the nom. singular form of the irregular feminine noun strI.

My best guess for how to interpret the is indication is that it implies an alternate declension based on the mati model.

funderburkjim commented 9 years ago

There are 7 cases where a feminine form is marked with Us. My best guess is that in all of these cases the Us means that the nominative singular of the feminine form ends in UH, and that the declension should follow the form of vaDU (young woman), whose nominative singular is vaDUH.

Here are the 7 cases:

Case 1 asitajYu scan

Case 2 kamaRqalu scan

Case 3 kaSeru scan

Case 4 guggulu scan

Case 5 guNgu scan

Case 6 jatu scan

Case 7 tanu scan

tanu also shows a feminine form identified as us, which I interpret as the nom. singular ending, uH, and an implied declension like that of the model Denu.

funderburkjim commented 9 years ago

The final three cases have feminines marked as Is or IH.

My guess is that the interpretation is that the declension model is the feminine noun DI (thought, etc.), whose nominative singular form is DIH.

Here are the cases:

Case 1 adurmaNgala scan

Case 2 aruRI scan

This example is complicated by the presence of the aruRayas form for nom. pl. Based on the DI model, whose nom. pl. is DiyaH, I would expect nom. pl. aruRiyaH. Not sure what to make of this apparent discrepancy. Could there be TWO declensions in the feminine for aruRI, one like nadI except for the nom. pl., and the other like DI?

Case 3 tantrI scan
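The tentative readings proposed so far in this issue can be collected into a lookup table. The sketch below is an assumption-laden summary, not a settled rule: the names `FEMININE_MARKER_MODELS` and `feminine_stem` are hypothetical, the is/iH entries follow option a, and the stem is derived by assuming the headword ends in a single SLP1 vowel character that the marker's vowel replaces.

```python
# Marker as printed in MW (SLP1) -> model noun proposed in the comments above.
FEMININE_MARKER_MODELS = {
    "I": "nadI",
    "is": "mati",   # option a; still awaiting confirmation
    "iH": "mati",
    "Us": "vaDU",
    "us": "Denu",
    "Is": "DI",
    "IH": "DI",
}


def feminine_stem(headword, marker):
    """Return (stem, model) for a feminine marker.

    Assumption: the final vowel of the headword is one character and is
    swapped for the vowel named by the marker.
    """
    model = FEMININE_MARKER_MODELS[marker]
    vowel = marker[0]  # 'i', 'I', 'u' or 'U'
    return headword[:-1] + vowel, model


print(feminine_stem("kavi", "I"))          # stem kavI, model nadI
print(feminine_stem("tanu", "Us"))         # stem tanU, model vaDU
print(feminine_stem("adurmaNgala", "IH"))  # stem adurmaNgalI, model DI
```

Markers that add more than a vowel are not covered here, and the open question about aruRayas is untouched by this table.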

zaaf2 commented 9 years ago

Case 5 atistri

I think you are right. Perhaps this helps to clarify:

(https://archive.org/stream/APracticalGrammarOfSanskrit/practical_grammar_monier_williams#page/n97/mode/2up)

zaaf2 commented 9 years ago

Case 7 tanu

In the indication mf(us, ūs, vī)n. I think the f. -ūs refers to the form the word assumes in tanū 2 [L=82369](declined like वधू), since the grammarians mention only tanu or tanvī as alternative forms of the adjective. V. MacDonell §98 c, and Monier-Williams:

(https://archive.org/stream/APracticalGrammarOfSanskrit/practical_grammar_monier_williams#page/n95/mode/2up)

funderburkjim commented 9 years ago

@zaaf2 A comment regarding Github syntax

What we type in the comments of these Github issues is sometimes displayed in a slightly different way. When we want something to be displayed WITHOUT this 'rendering' of the Github markdown, then we can precede and follow that something by a line containing three backticks. For instance, your comment had (without rendering):

[L=82369] (declined like वधू),

In this case, the key features are (a) something in square brackets, (b) immediately followed by something in parentheses. This is rendered by Github markdown as a link, with the text in square brackets being shown (in blue); when a user clicks on this, Github interprets the text in parentheses as the target of the link.

Now, in your case, you weren't really intending this to be interpreted as a link.

Two good ways to learn the ins and outs of Github markdown are:

click on the 'Markdown supported' link, which appears when you are entering a comment
Learn from others, by clicking on the 'pencil' of finished comments, which gets you into edit mode and shows you the raw, unrendered text of the comment. In this case, you can click 'cancel' to leave the edit mode without making changes.
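The link pattern described above (square brackets immediately followed by parentheses) can also be checked for before posting. The following is a simplified sketch: the regular expression and the name `accidental_links` are illustrative assumptions, and a real Markdown parser handles nesting and escapes that this ignores.

```python
import re

# Inline link shape: [text](target), with nothing between "]" and "(".
INLINE_LINK = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")


def accidental_links(comment):
    """Return (text, target) pairs that Markdown would render as links."""
    return INLINE_LINK.findall(comment)


# The bracket and parenthesis touch, so this is rendered as a link.
print(accidental_links("tanū 2 [L=82369](declined like वधू), since"))
# A space between them breaks the pattern, so nothing is reported.
print(accidental_links("tanū 2 [L=82369] (declined like वधू), since"))
```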

funderburkjim commented 9 years ago

Here is the print for tanU , L= 82369

Based on this, which has forms different from both Denu and vaDU, my current guess is that the feminine of tanu should be treated as irregular (for instance, the two accusative forms nvam and nuvam do not follow the Denu or vaDU model.) One possibility would be that there are 3+ declension tables for the f. of tanu:

stem tanvI - normal f. noun ending in long-i (I), model nadI
stem tanu - normal f. noun ending in 'u', model Denu
stem tanU - normal f. noun ending in 'U', model vaDU
Some idiosyncratic forms, as indicated in L=82369, such as tanvam, tanuvam, and others shown.
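One way to hold several declension tables for a single headword is a list of stem/model records. This is a minimal sketch of that possibility; the `Paradigm` class and its fields are hypothetical, and the irregular entry only records the forms quoted above rather than a full table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Paradigm:
    stem: str
    model: Optional[str]  # None marks forms that follow no regular model
    note: str = ""


# Feminine of tanu, following the four possibilities listed above.
TANU_FEMININE = [
    Paradigm("tanvI", "nadI"),
    Paradigm("tanu", "Denu"),
    Paradigm("tanU", "vaDU"),
    Paradigm("tanU", None, "idiosyncratic forms such as tanvam, tanuvam"),
]


def regular_paradigms(paradigms):
    """Keep only the entries that a model-based engine could generate."""
    return [p for p in paradigms if p.model is not None]


for p in regular_paradigms(TANU_FEMININE):
    print(p.stem, "declined like", p.model)
```

Keeping the irregular entry in the same list lets a generator skip it while still recording that the extra forms exist.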

funderburkjim commented 9 years ago

Regarding compounds with strI - Based on the quote from MW-grammar, I would expect additional adjectives, besides 'atistri', ending in 'stri' (short i); but atistri is (by Advanced search of MW) the only instance! [The other two, aDistri and yaTAstri, appear as indeclinables.]

Based on the MW-grammar quoted above, I'll mark atistri as irregular.

There are 49 instances of compounds ending in 'strI' (long-I) in MW, all are listed just as 'f.', meaning nouns not adjectives. All of these no doubt follow the irregular declension of 'strI'.

gasyoun commented 9 years ago

49 instances of compounds ending in 'strI' - this is where the reverse dictionary will come in handy. @mrudani - do you understand the issues?
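A reverse dictionary amounts to sorting headwords by their reversed spelling, so that words with the same ending sit together. The sketch below assumes SLP1 headwords (one character per sound) and uses a small sample drawn from words mentioned in this issue; the function names are hypothetical, and the ordering is plain ASCII order, not Sanskrit alphabetical order.

```python
def reverse_sorted(headwords):
    """Sort headwords by their reversed spelling (reverse-dictionary order)."""
    return sorted(headwords, key=lambda w: w[::-1])


def ending_in(headwords, suffix):
    """Return the headwords that end in the given suffix, in input order."""
    return [w for w in headwords if w.endswith(suffix)]


sample = ["atistri", "aDistri", "yaTAstri", "strI", "nadI", "kavi", "mati"]

print(reverse_sorted(sample))
print(ending_in(sample, "stri"))
print(ending_in(sample, "strI"))
```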

funderburkjim commented 9 years ago

The work in this and previous issue has helped in improving the MWlexnorm repository. Thanks for the help and interest.

I may try to add a declension engine to the repository. If anyone wants to follow or help with that work, let me know and I'll add you to the 'research' team there.

zaaf2 commented 9 years ago

Case 1: kali & case 2: kavi

The structure of the articles seems to confirm that the indication in parentheses, “(is), m.” and “(is, ī), f.”, refers to the nominative case, meaning that the f. can be declined either as mati or as nadī (the m. of course is decl. like agni). E.g.:

@funderburkjim

@zaaf2 A comment regarding Github syntax

I get it now :wink: Thanks

gasyoun commented 9 years ago

I may try to add a declension engine to the repository - you may want to dive into @drdhaval2785's code, as he has developed his own and dug into Huet's lately, so he definitely knows a lot about it. I would suggest writing a letter to him.

funderburkjim commented 9 years ago

@gasyoun @drdhaval2785 I had not realized Dhaval had done work on noun declensions;
I thought his work thus far was with verb conjugations.

This weekend I spent some time looking at my Elisp code of 2002-4. I had forgotten how much had been done. This work was essentially a coding of algorithms presented by Antoine's Sanskrit Grammar for High Schools, and Kale's Higher Sanskrit Grammar. It covered both nouns and verbs. I was also using the version of Thomas' MW as an input source. The code consists of about 1300 Lisp functions, and several input tables. It also includes some form-generation (participles of some types, causal stems, infinitives).

My opinion at the moment is that I should

understand this Elisp code
add to the documentation so it is easier to know how to run the engine
Reproduce the inflection tables (nouns and verbs) that were done at that time
At this point, I'll probably upload to a repository on GitHub.

Then,

modify it to work with current much-improved MW
Perhaps, transform Elisp code to Python.
Remove deficiencies (notably, aorist forms I didn't understand in 2002 - maybe I can understand them now).

Somewhere along in this process it would be good to compare to other inflection tables. I wonder if Dhaval's work would lend itself to such a comparison?

I use Huet's Sanskrit Heritage web site often, and know that my work duplicates the functionality of much of his. But since his code is not available for examination, this duplication of effort is unavoidable.

gasyoun commented 9 years ago

Comparison will have to wait, but Dhaval's code is open, I guess. At least we can ask to sync it. Huet has (partly) opened his code lately, and that is where Dhaval has been playing around for the last few months.

drdhaval2785 commented 9 years ago

@funderburkjim The code at https://github.com/drdhaval2785/SanskritSubanta is open, and gives reasonably good output. But the code was primarily written to explain the step-by-step derivation of Paninian grammar. So, I will need to refactor it to give only noun form tables.

I have refactored the verb form code to do so. See https://github.com/drdhaval2785/SanskritVerb. All one has to do is change the tiGanta.html line `<input type="hidden" name="frontend" value="1" checked>`. For the step-by-step derivation, keep value="1". If only the declension table is needed, change it to value="0".

The same may be extended to noun form generation too, but it would take some time.

N.B. - Right now the PHP code is too slow to give the whole derivation on the fly. I am trying to figure out which function is leaking memory. When I find it, I will get back to @funderburkjim to suggest a speedy alternative.

gasyoun commented 9 years ago

@funderburkjim would surely need some noun tables from @drdhaval2785, that's what I understand, sooner or later, because he is closest to them right now and already knows the caveats. It's time I install Python so I can follow Dhaval's instructions. Jim, I wonder if we can have them uploaded on Cologne's test server, can we?

funderburkjim commented 9 years ago

@drdhaval2785 I'm glad to know about sanskritsubanta, and will definitely use it when I get to comparing stage.

I notice that you have sandhi also. Wonder if it would be of interest to compare to scharfsandhi.

Regarding finding performance bottlenecks in sanskritverb, it occurs to me that this kind of question is probably addressed by existing PHP tools. Search for PHP performance (testing, profiling, etc.). I've not actually used any, so I don't know what is involved. Such a tool should be able to tell you how many micro/milli-seconds are spent in each routine, thus helping to identify the bottlenecks.
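The idea of measuring time per routine can be illustrated with a small timing wrapper. Note the assumptions: this is Python, not PHP, it is not one of the existing PHP tools, and the names `timed`, `TOTALS`, `slow_step` and `fast_step` are made up for the example.

```python
import time
from collections import defaultdict
from functools import wraps

TOTALS = defaultdict(float)  # seconds spent per function name
CALLS = defaultdict(int)     # number of calls per function name


def timed(func):
    """Accumulate wall-clock time and call counts for the wrapped function."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            TOTALS[func.__name__] += time.perf_counter() - start
            CALLS[func.__name__] += 1
    return wrapper


@timed
def slow_step(n):
    return sum(i * i for i in range(n))


@timed
def fast_step(n):
    return n + 1


def report():
    """Return (name, seconds) pairs, most expensive routine first."""
    return sorted(TOTALS.items(), key=lambda kv: kv[1], reverse=True)


slow_step(100_000)
fast_step(1)
for name, seconds in report():
    print(f"{name}: {seconds * 1000:.3f} ms over {CALLS[name]} call(s)")
```

The times are inclusive: if one timed routine calls another, the inner time is counted in both totals.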

funderburkjim commented 9 years ago

@gasyoun re: I wonder if we can have them uploaded on Cologne's test server.

The test server that I set up long ago so that you and Dhaval could use does not now exist.

However, I have been experimenting with Digitalocean. It seems that I can set up a server there and then arrange to have access to others, such as Dhaval and/or you.

I think most of the data and programming environment of Cologne sanskrit-lexicon can be duplicated there.

What seems like a good idea to me is for there to be a robust sanskrit-lexicon development server, where many of the ideas we have can be explored and experimented with by multiple developers.

Since little use was made of the first test server, I have been unsure whether setting up such a test server on Digitalocean is a good idea.

If you and @drdhaval2785 think I should spend time setting this up, please chime in so I'll know the effort is worthwhile.

drdhaval2785 commented 9 years ago

I have started doing time efficiency analysis for verb forms. It significantly helps. Around 3500 lines of code out of 11000 is optimised.

And for me, I can surely say that I won't be able to actively participate in near future. Sorry for being not of much use.

funderburkjim commented 9 years ago

@drdhaval2785 It is still on my mental 'todo' list to consider 'refactoring' your sanskritverb project, which I hope you continue to be active in. It will be at least several months before I get to that. In the interim, as you work on it, I hope you'll consider adding programming documentation; in my earlier brief work with the code, I found the low level (but critical) specialized string utilities hard to understand.

drdhaval2785 commented 9 years ago

I will comment the code for your and others' information when I release the refactored code, maybe next week. And maybe a small working paper for understanding the code.

gasyoun commented 9 years ago

@drdhaval2785 best news you could bring up in a few months.

funderburkjim commented 9 years ago

From the Sanskritverb issues, I see you report a terrific speed-up! The ideas about explaining the code are encouraging.

I'd be glad to be a guinea-pig (aka, test subject) regarding the code explanations.

gasyoun commented 8 years ago

mw:jala1lu'uddi1n@akbarshah,78286:jalAlu 'ddIn akbar shAh:t:

jala

इन्द्राब्रह्मणस्पति [L=29171] m. du. -> इन्द्राब्रह्मणस्पति [p= 167,1] [L=29171], I, m. du.

indra

कलचुरिसंवत्सर [p= 1324,1] [L=45636.1] September 5, A. D. 348 - A. is mistagged as a link to Apte, it is nothing

क्षरसमाम्नायिक [p= 128,3] -> क्षरसमाम्नायिक [p= 128,2]

gasyoun commented 8 years ago

I would want to see a list of words that end on I: in most dictionaries and rarely on i: like in

qAkini:PE,PUI
qAkinI:AP,AP90,BEN,BUR,CAE,CCS,MD,MW,MW72,PUI,PW,PWG,SHS,SKD,STC,VCP,WIL,YAT

I would vote for qAkini:PE,PUI as print error. @drdhaval2785 any ideas?
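A list of such candidates could be produced from headword lines in the `headword:DICT1,DICT2` shape shown above. This is a rough sketch under assumptions: the function names and the `max_ratio` threshold are hypothetical, and a hit is only a candidate for review, not proof of a print error.

```python
def parse_line(line):
    """Split 'headword:D1,D2,...' into (headword, [dictionary codes])."""
    word, _, codes = line.strip().partition(":")
    return word, codes.split(",")


def short_i_suspects(lines, max_ratio=0.25):
    """Find headwords in short i that are rare next to their long-I twin.

    max_ratio is an arbitrary cut-off: the short-i form must occur in at
    most that fraction of the dictionaries listing the long-I form.
    """
    headwords = dict(parse_line(line) for line in lines)
    suspects = []
    for word, dicts in headwords.items():
        if not word.endswith("i"):
            continue
        long_form = word[:-1] + "I"
        long_dicts = headwords.get(long_form)
        if long_dicts and len(dicts) <= max_ratio * len(long_dicts):
            suspects.append((word, long_form, len(dicts), len(long_dicts)))
    return suspects


lines = [
    "qAkini:PE,PUI",
    "qAkinI:AP,AP90,BEN,BUR,CAE,CCS,MD,MW,MW72,PUI,PW,PWG,SHS,SKD,STC,VCP,WIL,YAT",
]
print(short_i_suspects(lines))
```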

sanskrit-lexicon / CORRECTIONS

MW feminine form interpretation questions #126