Open funderburkjim opened 9 years ago
Wonder how many feminine words are mentioned in MW overall. Some stats.
In three cases, MW shows, essentially, I and is as the endings of the feminine. In one other case he shows the similar I and iH.
The question is how to interpret these markings when generating a declension for the words in question.
I am thinking of declensions as they are presented in modern grammars (as distinguished from the presentation in Panini's treatise, which I don't understand). In these modern grammars, the varieties of (regular) declension are presented by means of models.
For instance, there is a model (such as nadI) for feminine nouns ending in I. In the four cases below, I feel certain that the I designation implies a declension table like that of nadI, but instead of the stem nadI, we use the analogous stem, such as kalI or kavI, etc. So there is no question regarding interpretation of the I designation.
The question more specifically is how to interpret the is (or iH) indications.
There are two likely possibilities for the interpretation:
a. Interpret iH as the ending of the nominative singular form, and similarly interpret is as an (to my mind, awkward) alternate way to indicate the same iH ending of the nominative singular form. With this interpretation, the model would be the feminine noun mati, and we would decline kavi like mati, etc.
b. Interpret is as implying a model like arcis or ASis; this would imply using stems kalis, kavis, etc.
I think it is likely that option a (like mati) is the right interpretation, but would like confirmation from one of our experts.
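To make the difference between the two readings concrete, here is a minimal Python sketch (SLP1 transliteration). The function names are made up for illustration, and only two cells of each table are generated, using the standard mati-type and arcis-type endings. It shows that both readings give the same nominative singular, so the two options only diverge in other cases such as the instrumental singular.

```python
# Sketch only: contrasts the two readings of MW's "is" marking for one stem.
# Transliteration is SLP1; no sandhi beyond these simple cases is handled.

def option_a_forms(stem):
    """Reading (a): feminine i-stem declined like mati (stem ends in short i)."""
    base = stem[:-1]  # the final i becomes y before a vowel-initial ending
    return {"nom.sg": stem + "H", "ins.sg": base + "yA"}


def option_b_forms(stem):
    """Reading (b): s-stem declined like arcis (hypothetical stem = stem + 's')."""
    # The final s of the stem appears as visarga in nom.sg. and as z before A.
    return {"nom.sg": stem + "H", "ins.sg": stem + "zA"}


for stem in ("kali", "kavi"):
    print(stem, option_a_forms(stem), option_b_forms(stem))
```

Because the nominative singular is identical under both readings, the dictionary marking by itself cannot settle the question; the choice has to come from grammar or attested forms.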
Here are the cases:
Case 1: kali scan start of kali , next page, with f. form
Case 2: kavi scan
Case 3: suraBi scan
Case 4: anApi scan
Case 5: atistri scan
This case is superficially like the first four cases. However, the first form I is surely the nom. singular form of the irregular feminine noun strI.
My best guess for how to interpret the is indication is that it implies an alternate declension based on the mati model.
There are 7 cases where a feminine form is marked with Us. My best guess is that in all of these cases the Us means that the nominative singular of the feminine form ends in UH, and that the declension should follow the form of vaDU (young woman), whose nominative singular is vaDUH.
Here are the 7 cases:
Case 1 asitajYu scan
Case 2 kamaRqalu scan
Case 3 kaSeru scan
Case 4 guggulu scan
Case 5 guNgu scan
Case 6 jatu scan
Case 7 tanu scan
tanu also shows a feminine form identified as us, which I interpret as the nom. singular ending, uH, and an implied declension like that of the model Denu.
The final three cases have feminines marked as Is or IH.
My guess is that the interpretation is that the declension model is the feminine noun DI (thought, etc.), whose nominative singular form is DIH.
Here are the cases:
Case 1 adurmaNgala scan
Case 2 aruRI scan
This example is complicated by the presence of the aruRayas form for nom. pl. Based on the DI model, whose nom.pl. is DiyaH, I would expect nom. pl. aruRiyaH . Not sure what to make of this apparent discrepancy. Could there be TWO declensions in the feminine for aruRI, one like nadI except for the nom. pl., and the other like DI ?
Case 3 tantrI scan
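The guesses made in this post can be collected into a small lookup table. The sketch below is only a summary of those guesses, not a confirmed rule set; the names `MARKER_TO_MODEL` and `feminine_stem` are invented for illustration, and the irregular cases (atistri, tanu, the aruRayas plural) are deliberately left out.

```python
# Guessed correspondences from this post (unconfirmed), in SLP1.
# marker -> (final vowel of the feminine stem, model noun)
MARKER_TO_MODEL = {
    "I": ("I", "nadI"),
    "is": ("i", "mati"),
    "iH": ("i", "mati"),
    "us": ("u", "Denu"),
    "Us": ("U", "vaDU"),
    "Is": ("I", "DI"),
    "IH": ("I", "DI"),
}

SLP1_VOWELS = set("aAiIuUfFxXeEoO")


def feminine_stem(headword, marker):
    """Return (feminine stem, model) for a vowel-final headword and an MW marker."""
    final_vowel, model = MARKER_TO_MODEL[marker]
    if headword[-1] not in SLP1_VOWELS:
        raise ValueError("sketch handles vowel-final headwords only: " + headword)
    # Replace the headword's final vowel with the vowel implied by the marker.
    return headword[:-1] + final_vowel, model


for headword, marker in [("kavi", "I"), ("kavi", "is"), ("tanu", "Us"),
                         ("tanu", "us"), ("adurmaNgala", "Is")]:
    print(headword, marker, feminine_stem(headword, marker))
```

Keeping the marker-to-model table as data makes it easy to change one assignment (for instance, if the Is cases turn out not to follow DI) without touching the rest of the logic.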
Case 5 atistri
I think you are right. Perhaps this helps to clarify:
Case 7 tanu
In the indication mf(us, ūs, vī)n. I think the f. -ūs refers to the form the word assumes in tanū 2 [L=82369](declined like वधू), since the grammarians mention only tanu or tanvī as alternative forms of the adjective. V. MacDonell §98 c, and Monier-Williams:
@zaaf2 A comment regarding Github syntax
What we type in the comments of these Github issues is sometimes displayed in a slightly different way. When we want something to be displayed WITHOUT this 'rendering' of the Github markdown, we can precede and follow that something by a line containing three backquotes. For instance, your comment had (without rendering):
[L=82369] (declined like वधू),
In this case, the key features are (a) something in square brackets, (b) immediately followed by something in parentheses. This is rendered by Github markdown as a link, with the text in square brackets being shown (in blue); when a user clicks on this, Github interprets the text in parentheses as the target of the link.
Now, in your case, you weren't really intending this to be interpreted as a link.
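For anyone who wants to check a draft comment for this pattern before posting, here is a rough Python sketch. The regular expression simply looks for square brackets immediately followed by parentheses, as described above; whether a particular markdown renderer actually turns each match into a link depends on that renderer's rules, so treat the matches as candidates only. The function names are made up for illustration.

```python
import re

# "[text](target)" with nothing between "]" and "(".
LINK_LIKE = re.compile(r"\[([^\]\n]+)\]\(([^)\n]+)\)")


def accidental_links(text):
    """Return (text, target) pairs that look like markdown links."""
    return [(m.group(1), m.group(2)) for m in LINK_LIKE.finditer(text)]


def escape_links(text):
    """Put a backslash before the opening bracket so it is shown literally."""
    return LINK_LIKE.sub(lambda m: "\\" + m.group(0), text)


comment = "the form the word assumes in tanU 2 [L=82369](declined like वधू), since"
print(accidental_links(comment))
print(escape_links(comment))
```

Escaping the bracket is an alternative to fencing the whole passage when only one short phrase is affected.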
Two good ways to learn the ins and outs of Github markdown are:
Here is the print for tanU , L= 82369
Based on this, which has forms different from both Denu and vaDU, my current guess is that the feminine of tanu should be treated as irregular (for instance, the two accusative forms nvam and nuvam do not follow the Denu or vaDU model.) One possibility would be that there are 3+ declension tables for the f. of tanu:
Regarding compounds with strI - Based on the quote from MW-grammar, I would expect additional adjectives, besides 'atistri', ending in 'stri' (short i); but atistri is (by Advanced search of MW) the only instance! [The other two, aDistri and yaTAstri, appear as indeclinables].
Based on the MW-grammar quoted above, I'll mark atistri as irregular.
There are 49 instances of compounds ending in 'strI' (long-I) in MW, all are listed just as 'f.', meaning nouns not adjectives. All of these no doubt follow the irregular declension of 'strI'.
49 instances of compounds ending in 'strI' - this is where the reverse dictionary will come in handy. @mrudani - do you understand the issues?
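As a rough idea of what a reverse dictionary does here: sorting headwords by their reversed spelling puts all words with the same ending next to each other. The Python sketch below assumes the headwords are available as a plain list of SLP1 strings; the sample list just reuses words mentioned in this thread and is not an extract from MW.

```python
def reverse_index(headwords):
    """Sort headwords by their reversed spelling, as a reverse dictionary does."""
    return sorted(headwords, key=lambda w: w[::-1])


def ending_in(headwords, suffix):
    """Return the headwords with the given ending, in reverse-dictionary order."""
    return [w for w in reverse_index(headwords) if w.endswith(suffix)]


# Illustrative sample only (words taken from this thread).
sample = ["strI", "atistri", "aDistri", "yaTAstri", "nadI", "kavI", "tantrI"]

print(reverse_index(sample))
print(ending_in(sample, "stri"))   # short-i compounds
print(ending_in(sample, "strI"))   # long-I forms
```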
The work in this and previous issue has helped in improving the MWlexnorm repository. Thanks for the help and interest.
I may try to add a declension engine to the repository. If anyone wants to follow or help with that work, let me know and I'll add you to the 'research' team there.
Case 1: kali & case 2: kavi
The structure of the articles seems to confirm that the indication between parenthesis, “(is), m.” and “(is, ī), f.”, refers to the nominative case, meaning that the f. can be declined either as mati or as nadī (the m. of course is decl. like agni). E.g.:
@funderburkjim
@zaaf2 A comment regarding Github syntax
I get it now :wink: Thanks
I may try to add a declension engine to the repository - you may want to dive into @drdhaval2785's code, as he has developed his own and dug into Huet's lately, so he definitely knows a lot about it. I would suggest writing a letter to him.
@gasyoun @drdhaval2785 I had not realized Dhaval had done work on noun declensions; I thought his work thus far was with verb conjugations.
This weekend I spent some time looking at my Elisp code of 2002-4. I had forgotten how much had been done. This work was essentially a coding of algorithms presented by Antoine's Sanskrit Grammar for High Schools , and Kale's Higher Sanskrit Grammar. It covered both nouns and verbs. I was also using the version of Thomas' MW as an input source. The code consists of about 1300 Lisp functions, and several input tables. It also includes some form-generation (participles of some types, causal stems, infinitives).
My opinion at the moment is that I should
Then,
Somewhere along in this process it would be good to compare to other inflection tables. I wonder if Dhaval's work would lend itself to such a comparison?
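One possible shape for such a comparison, sketched in Python: represent each inflection table as a mapping from (case, number) to a set of forms, and report the cells where two engines disagree. The table layout and the function name are assumptions for illustration, and the two small tables below are made-up inputs, not the output of any of the engines mentioned here.

```python
def diff_tables(table_a, table_b):
    """Return {cell: (forms_a, forms_b)} for every cell where the tables differ.

    Each table maps a (case, number) tuple to a set of SLP1 forms; a cell
    missing from one table is reported with None on that side.
    """
    cells = sorted(set(table_a) | set(table_b))
    return {
        cell: (table_a.get(cell), table_b.get(cell))
        for cell in cells
        if table_a.get(cell) != table_b.get(cell)
    }


# Made-up example: two partial tables for mati that differ in one cell.
engine_a = {("nom", "sg"): {"matiH"}, ("dat", "sg"): {"matyE", "mataye"}}
engine_b = {("nom", "sg"): {"matiH"}, ("dat", "sg"): {"matyE"}}

for cell, (a, b) in diff_tables(engine_a, engine_b).items():
    print(cell, a, b)
```

Using sets for each cell means that optional alternate forms are compared as a group, so an engine that omits an alternate shows up as a difference rather than being silently accepted.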
I use Huet's Sanskrit Heritage web site often, and know that my work duplicates the functionality of much of his. But since his code is not available for examination, this duplication of effort is unavoidable.
Comparison will have to wait, but Dhaval's code is open, I guess. At least we can ask to sync it. Huet has opened his code (partly) lately, that is where Dhaval is playing around for last months.
@funderburkjim The code at https://github.com/drdhaval2785/SanskritSubanta is open, and gives reasonably good output. But the code was primarily written to explain the step-by-step derivation in Paninian grammar. So, I will need to refactor it to give only noun form tables.
I have refactored the verb form code to do so. See https://github.com/drdhaval2785/SanskritVerb.
All one has to do is to change tiGanta.html line <input type="hidden" name="frontend" value="1" checked>
If one wants the step-by-step derivation, keep value="1".
If only the form table is needed, change it to value="0".
The same may be extended to noun form generation too, but it would take some time.
N.B. - Right now the PHP code is too slow to give the whole derivation on the fly. I am trying to find the function that is leaking memory. When I find it, I will get back to @funderburkjim for suggestions on a speedier alternative.
@funderburkjim will surely need some noun tables from @drdhaval2785, as I understand it, sooner or later, because he is closest to them right now and already knows the caveats. It's time I installed Python so I can follow Dhaval's instructions. Jim, I wonder if we can have them uploaded on Cologne's test server?
@drdhaval2785 I'm glad to know about sanskritsubanta, and will definitely use it when I get to comparing stage.
I notice that you have sandhi also. Wonder if it would be of interest to compare to scharfsandhi.
Regarding finding performance bottlenecks in sanskritverb, it occurs to me that this kind of question is probably addressed by existing PHP tools. Search for PHP performance (testing, profiling, etc.). I've not actually used any, so I don't know what is involved. Such a tool should be able to tell you how many micro/milli-seconds are spent in each routine, thus helping to identify the bottlenecks.
@gasyoun re: I wonder if we can have them uploaded on Cologne's test server
The test server that I set up long ago so that you and Dhaval could use does not now exist.
However, I have been experimenting with Digitalocean. It seems that I can set up a server there and then arrange to have access to others, such as Dhaval and/or you.
I think most of the data and programming environment of Cologne sanskrit-lexicon can be duplicated there.
What seems like a good idea to me is for there to be a robust sanskrit-lexicon development server, where many of the ideas we have can be explored and experimented with by multiple developers.
Since little use was made of the first test server, I have been unsure whether setting up such a test server on Digitalocean is a good idea.
If you and @drdhaval2785 think I should spend time setting this up, please chime in so I'll know the effort is worthwhile.
I have started doing time efficiency analysis for verb forms. It significantly helps. Around 3500 lines of code out of 11000 is optimised.
And for me, I can surely say that I won't be able to actively participate in near future. Sorry for being not of much use.
@drdhaval2785 It is still on my mental 'todo' list to consider 'refactoring' your sanskritverb project, which I hope you continue to be active in. It will be at least several months before I get to that. In the interim, as you work on it, I hope you'll consider adding programming documentation; in my earlier brief work with the code, I found the low level (but critical) specialized string utilities hard to understand.
I will comment the code for your and others' information when I release the refactored code, maybe next week. And maybe a small working paper for understanding the code.
@drdhaval2785 best news you could bring up in a few months.
From Sanskritverb issues, you report terrific speed up! Code explaining ideas encouraging.
I'd be glad to be a guinea-pig (aka, test subject) regarding the code explanations.
mw:jala1lu'uddi1n@akbarshah,78286:jalAlu 'ddIn akbar shAh:t:
इन्द्राब्रह्मणस्पति [L=29171] m. du. -> इन्द्राब्रह्मणस्पति [p= 167,1] [L=29171], I, m. du.
कलचुरिसंवत्सर [p= 1324,1] [L=45636.1] September 5, A. D. 348 - A. is mistagged as a link to Apte; it is not a reference to anything.
क्षरसमाम्नायिक [p= 128,3] -> क्षरसमाम्नायिक [p= 128,2]
I would like to see a list of words that end in I: in most dictionaries and only rarely in i:, as in
qAkini:PE,PUI
qAkinI:AP,AP90,BEN,BUR,CAE,CCS,MD,MW,MW72,PUI,PW,PWG,SHS,SKD,STC,VCP,WIL,YAT
I would vote for qAkini:PE,PUI
as print error. @drdhaval2785 any ideas?
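A list like this could be produced mechanically from lines in the `headword:DICT,DICT,...` format shown above. The Python sketch below groups spellings that differ only in the length of a final i/I and reports the spelling attested in fewer dictionaries as a candidate print error. The function names and the minority rule are assumptions for illustration; a real check would still need a look at the scans.

```python
from collections import defaultdict


def parse_line(line):
    """Split 'headword:D1,D2,...' into (headword, [dictionary codes])."""
    headword, dicts = line.strip().split(":", 1)
    return headword, [d for d in dicts.split(",") if d]


def final_length_variants(lines):
    """Return (minority spelling, majority spelling, minority dictionaries)."""
    groups = defaultdict(dict)
    for line in lines:
        headword, dicts = parse_line(line)
        # Treat final i and I as the same key so the two spellings are grouped.
        key = headword[:-1] + "i" if headword[-1] in "iI" else headword
        groups[key][headword] = dicts
    suspects = []
    for variants in groups.values():
        if len(variants) > 1:
            majority = max(variants, key=lambda h: len(variants[h]))
            for headword, dicts in variants.items():
                if headword != majority:
                    suspects.append((headword, majority, dicts))
    return suspects


lines = [
    "qAkini:PE,PUI",
    "qAkinI:AP,AP90,BEN,BUR,CAE,CCS,MD,MW,MW72,PUI,PW,PWG,SHS,SKD,STC,VCP,WIL,YAT",
]
print(final_length_variants(lines))
```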
Although the cases shown below probably are questions of interpretation, rather than errors requiring correction, I'm putting them in this Issue of the Corrections repository since they arise in the same work discussed in the previous issue.
To repeat the opening comment from the previous issue:
I've been reviewing some former work, which takes https://github.com/funderburkjim/MWlexnorm to the next step. Namely, to parse the declension information from MW records, in such a way as to make explicit the declinable stems. This forms a basis for generating declension tables based on the information from MW. I'll detail this step more in the MWlexnorm repository. But in this issue, I'm raising questions that arise in a few cases.