PWGpreverb: pfx spelling involving nR sandhi

funderburkjim commented 7 years ago

In examining cases where preverb1a prefixed verbs may be constructed incorrectly (from pfx + root), it was noticed that some prefixes themselves might be misspelled.

Recall that the prefixes are extracted from the digitization pwg.txt, and appear as the third field in the records of preverb1a.txt. For example:

7257:as:parini:parinyas:15034   parini is the prefix

By the 'nR' sandhi, this should be spelled (SLP1) as pariRi.

All the prefixes in preverb1a.txt were examined in this way:

python test_nR_pfx.py preverb1a.txt test_nR_pfx.txt
 with summary results:
8644 records from preverb1a.txt
29 cases written to test_nR_pfx.txt
parini 6 pariRi (24)
parinis 11 pariRis (0)
paryanu 4 paryaRu (0)
prani 7 praRi (24)
pravinis 1 praviRis (0)

In the above, the first line is read as: parini occurs as a prefix in 6 (of the 8644) cases; its sandhi form, pariRi, occurs in 24 of the cases.
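The check can be sketched as follows. This is not the actual test_nR_pfx.py (which is not shown in this thread), only a minimal illustration under stated assumptions: records are colon-separated with the prefix in the third field, spellings are SLP1, and the function names `nr_form` and `summarize` are hypothetical. The nR rule is the simplified textbook form: n becomes R after r, f, F or z when only vowels, gutturals, labials, y, v, h or anusvara intervene, and a vowel or n, m, y, v follows.

```python
from collections import Counter

VOWELS = set("aAiIuUfFxXeEoO")
TRIGGERS = set("rfFz")  # sounds that cause n -> R
# Sounds that may stand between the trigger and n without blocking the change.
TRANSPARENT = VOWELS | set("kKgGN") | set("pPbBm") | set("yvhM")
FOLLOWERS = VOWELS | set("nmyv")  # n must be followed by one of these


def nr_form(pfx):
    """Return pfx with the simplified nR sandhi applied (SLP1 spelling)."""
    chars = list(pfx)
    active = False  # True while a trigger is in force
    for i, c in enumerate(chars):
        nxt = chars[i + 1] if i + 1 < len(chars) else ""
        if c == "n" and active and nxt in FOLLOWERS:
            chars[i] = "R"
            active = False  # assumption: only the nearest n is changed
        elif c in TRIGGERS:
            active = True
        elif c not in TRANSPARENT:
            active = False  # any other consonant blocks the change
    return "".join(chars)


def summarize(records):
    """For each prefix that nR sandhi would change, report
    (prefix, its count, sandhi form, count of the sandhi form)."""
    counts = Counter()
    for rec in records:
        fields = rec.split(":")
        if len(fields) >= 3:
            counts[fields[2]] += 1
    report = []
    for pfx in sorted(counts):
        alt = nr_form(pfx)
        if alt != pfx:
            report.append((pfx, counts[pfx], alt, counts.get(alt, 0)))
    return report


for p in ["parini", "parinis", "paryanu", "prani", "pravinis"]:
    print(p, nr_form(p))
```

Applied to the five prefixes in the summary, `nr_form` gives the same sandhi forms as listed there (pariRi, pariRis, paryaRu, praRi, praviRis).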

Given the fact that the correct form (for two cases at least, the prefixes parini and prani) also appears in the dictionary, it is reasonable to suspect that the 29 cases are errors of some kind (print or typo, we don't know).

However, the conclusion that these are errors is perhaps premature, for some as yet unknown reason. For, in examining only the parini+as example, PWG shows:

Also, MW has the same example (copied from PWG?):

परि-न्य्-अस्त [p= 596] : mfn. ( √2. अस्) stretched out, extended Kathās.

gasyoun commented 7 years ago

reasonable to suspect that the 29 cases are errors of some kind

@drdhaval2785 ?

(copied from PWG?)

As usual. Remember Zgusta's article?

drdhaval2785 commented 7 years ago

There are at least two ways in which n->R change can be evaded.

क्षुभ्नादिषु च rule is an open ended exception to the rule of nR. Whatever is seen in शिष्टप्रयोगः can be justified by this. Otherwise क्षुभ्णाति , प्राप्णोति should happen.

In वाचस्पत्यम् this is seen in {@परिनन्दन@}¦ त्रि॰ परि + नन्द--णिच्--ल्यु क्षुभ्रा॰ न णत्वम्।

The words प्र / परि are not treated as upasargas, but as separate words (first member of a compound).

See {@प्रना(णा)यक@}¦ त्रि॰ प्रकृष्टो नायकोऽस्य प्रशब्दस्य नयतिं प्रतिउपसर्गत्वाभावात् न णत्वम्।

drdhaval2785 commented 7 years ago

@funderburkjim There is a super specific rule which one of my teachers guided me to: नेर्गदनदपतपदघुमास्यतिहन्तियातिवातिद्रातिप्सातिवपतिवहतिशाम्यतिचिनोतिदेग्धिषु च॥ ८।४।१७

Therefore 'prani'/'parini' is converted to 'praRi'/'pariRi' only before these verbs. For other verbs, upasargas are treated as a separate 'pada', so 'r' and 'n' are not in the same 'pada'. Being in the same 'pada' is a prerequisite for the 'n'->'R' conversion.

गद प्रणिगदति। परिणिगदति। नद प्रणिनदति। परिणिनदति। पत प्रणिपतति। परिणिपतति। पद प्रणिपद्यते। परिणिपद्यते। घु प्रणिददाति। परिणिददाति। प्रणिदधाति। परिणिदधाति। माङ् प्रणिमिमीते। परिणिमिमीते। मेङ् प्रणिमयते। परिणिमयते। मा इति मङ्मेङोर् ग्रहणम् इष्यते। स्यति प्रणिष्यति। परिणिष्यति। हन्ति प्रणिहन्ति। परिणिहन्ति। याति प्रणियाति। परिणियाति। वाति प्रणिवाति। परिणिवाति। द्राति प्रणिद्राति। परिणिद्राति। प्साति प्रणिप्साति। परिणिप्साति। वपति प्रणिवपति। परिणिवपति। वहति प्रणिवहति। परिणिवहति। शाम्यति प्रणिशाम्यति। परिणिशाम्यति। चिनोति प्रणिचिनोति। प्रिणिचिनोति। देग्धि प्रणिदेग्धि। परिणिदेग्धि। अड्व्यवाये ऽपि नेर्गदादिषु णत्वम् इष्यते। प्रण्यगदत्। परिण्यगदत्।

http://sanskritdocuments.org/learning_tools/ashtadhyayi/vyakhya/8/8.4.17.htm
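As a rough illustration of how 8.4.17 restricts the change, here is a minimal sketch. It is not code from preverb1a.py. The function name `ni_prefix_form` is hypothetical, and the SLP1 root spellings in `NER_ROOTS` are my assumption about how the roots named in the sutra would appear (for instance 'dA' and 'DA' standing in for the घु class, 'so' for स्यति); the PWG root spellings may differ.

```python
# Roots before which ni -> Ri applies after pra/pari, per 8.4.17.
# ASSUMPTION: these SLP1 spellings are illustrative, not taken from PWG or MW.
NER_ROOTS = {
    "gad", "nad", "pat", "pad", "dA", "DA", "mA", "me", "so", "han",
    "yA", "vA", "drA", "psA", "vap", "vah", "Sam", "ci", "dih",
}


def ni_prefix_form(pfx, root):
    """Return the expected spelling of a prefix ending in 'ni' after pra/pari.

    The n is retroflexed only when the root is one of those named in the
    sutra; before any other root the prefix is left unchanged.
    """
    if pfx.endswith("ni") and pfx[:-2] in ("pra", "pari") and root in NER_ROOTS:
        return pfx[:-2] + "Ri"
    return pfx


print(ni_prefix_form("prani", "pat"))   # root named in the sutra
print(ni_prefix_form("parini", "as"))   # the parini+as case from this thread
```

Under this reading, the parini+as record discussed at the top keeps 'parini', since 'as' is not among the listed roots.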

drdhaval2785 commented 7 years ago

http://sanskritdocuments.org/learning_tools/ashtadhyayi/vyakhya/8/8.4.14.htm उपसर्गादसमासेऽपि णोपदेशस्य॥ is also relevant here.

My proposal: There is too much grammar and not much to be gained by studying it. Let us keep the upasarga spellings 'parini'/'prani'/'pariRi'/'praRi' as given in PWG. There is no need to apply regexes to the upasarga part.

gasyoun commented 7 years ago

The words प्र / परि are not treated as upasargas, but as separate words (first member of a compound).

That's interesting. From IE linguistics pra, pari are preverbs. And not just separate words.

There is too much grammar and not much to be gained by studying it.

So good to know I'm not the only one who thinks so.

funderburkjim commented 7 years ago

preverb1a has been revised in various ways to be consistent with MW spelling conventions. As a side effect, it appears that the special rules for prani, parini -> praRi, pariRi shown above are consistent with the spellings shown in preverb1a.txt.

This concordance (between an obscure grammar rule and the spellings in MW) leads me to think that a good motivating principle for further examination of the preverb1a spellings is to see what adjustments need to be made to the preverb1a spellings to bring them into full compliance with MW spellings.

funderburkjim commented 7 years ago

Recall that preverb1b.txt aims to align the spellings of preverb1a with those of MW. Since the preverb1a root spellings come from PWG, the rules that are appropriate in preverb1b.py are those which specify systematic differences between PWG spellings and MW spellings. As currently written, this adjustment to PWG spellings also includes differences in the usage of anusvara and homorganic nasals.

As of now, of our 8644 cases, 6763 are matched with MW; these are in preverb1b_mw.txt. And there are 1881 cases that do not match; these are in preverb1b_notmw.txt. The matching has been improved by about 400 cases since the first edition of preverb1b.

Here are the main principles of adjustment used by preverb1b.py to generate matches:

adjustment of nasals
replace PWG's guna form 'ar' with 'f' in root. example 'kar' in PWG <-> 'kf' in MW
Similarly, PWG root 'kalp' corresponds to 'kxp' in MW
PWG roots that end in 'A' may be spelled with 'E', 'e', or 'o' in MW
A special list of root correspondences is also employed: ''' PWG:MW 'jramB':'jfmB', 'cUrRay':'cUrR', 'graB':'grah', 'caraRy':'caraRya', 'cihnay':'cihnaya', 'guRay':'guR', 'gaRay':'gaR', 'gopAy':'gopAya', 'DUnay':'DUnaya', 'DUpay':'DUpaya', 'nyUNKay':'nyUNKaya', 'pAlay':'pAl', 'pASay':'pASaya', 'manasy':'manasya', 'mantray':'mantr', 'mfgay':'mfg', 'mokzay':'mokz', 'arTay':'arT', 'kaTay':'kaT', 'KaRqay':'KaRq', 'yantray':'yantr', 'rUkzay':'rUkz', 'yucC':'yuC', 'rUpay':'rUp', 'lakzay':'lakz', 'liNgay':'liNg', 'varRay':'varR', 'vAjay':'vAjaya', 'vAsay':'vAs', 'viGnay':'viGnaya', 'vfzAy':'vfzAya', 'vraRay':'vraR', 'Sabday':'Sabd', 'SIlay':'SIl', 'Sravasy':'Sravasya', 'SlakzRay':'SlakzRaya', 'Slokay':'Slokaya', 'saMDay':'saMDaya', 'sapary':'saparya', 'saBAjay':'saBAj', 'sAntvay':'sAntv', 'sUcay':'sUc', 'sUtray':'sUtr', 'syad':'syand', 'aYcay':'aYc', 'daSasy':'daSasya', 'miSray':'miSr', 'mUtray':'mUtraya', 'mudray':'mudraya', 'asUy':'asUya', 'kIrtay':'kIrt', '''
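The adjustment principles above might be sketched like this. It is not the actual preverb1b.py. The function name `mw_root_candidates` is hypothetical, `SPECIAL` holds only three entries excerpted from the list above, and the nasal adjustment is omitted because its details are not given in this thread. The function returns candidate MW spellings to be checked against the MW headword list; it does not decide which one is right.

```python
# Excerpt of the PWG:MW special correspondences listed above.
SPECIAL = {"jramB": "jfmB", "graB": "grah", "syad": "syand"}


def mw_root_candidates(pwg_root):
    """Return possible MW spellings (SLP1) for a PWG root spelling."""
    if pwg_root in SPECIAL:
        return [SPECIAL[pwg_root]]
    cands = [pwg_root]  # the spelling may already agree with MW
    if pwg_root == "kalp":
        cands.append("kxp")
    if "ar" in pwg_root:
        # PWG guna 'ar' <-> MW 'f'. Naive: this overgenerates for roots
        # whose 'ar' is not a guna form, so candidates must be verified.
        cands.append(pwg_root.replace("ar", "f", 1))
    if pwg_root.endswith("A"):
        # PWG final 'A' may correspond to 'E', 'e' or 'o' in MW.
        cands.extend(pwg_root[:-1] + v for v in ("E", "e", "o"))
    return cands


print(mw_root_candidates("kar"))
print(mw_root_candidates("gA"))
```

Generating several candidates and testing each against MW keeps the rules loose without silently rewriting a root that MW spells the same way as PWG.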

funderburkjim commented 7 years ago

What remains to be done.

There are still resolvable differences in the spellings of MW prefixed roots and the implied spellings of preverb1a.

The place to look for candidates for resolution is in preverb1b_notmw.txt.

There may be a few additional cases of special correspondences that can be done by adding to the list shown above for preverb1b.

But many of these have to do with odd sandhi rules for combining prefixes ending in 'i', 'u', and 'nis' with roots spelled with a sibilant. Preverb1a already has added several such rules to the default rules of scharfSandhi, but there remain other special cases. The prefixed forms of root 'sic' are one good example. In these cases, once the situation is resolved, the likely solution would be to add a rule to the 'sandhi' method of class 'PreverbSandhi' in preverb1a.py.

The reason for trying to complete the matching with MW is twofold:

We learn special correspondences between PWG spelling and MW spelling. This can feed into an improved 'multi-dictionary' index (of which the current hwnorm1 may be thought of as a precursor)
We identify prefixed headwords that are unique to PWG (at least, those that aren't present in MW).

Random note: the AP dictionary also presents prefixed verbs as separate headwords, as MW does.

funderburkjim commented 7 years ago

There are a small number of cases which are not prefixed verbs. Here are candidates noted thus far, from preverb1.txt

17753:kunT:kunT:37381
103889:saMjYita:aBisaMjYita:224031
103889:saMjYita:AsaMjYita:224033
22587:guRay:anuguRita:48143
22587:guRay:praguRita:48147
73754:cit:savi:158291   This is a typo; should be pravi
28717:tap:paScAt:61883  ? not sure
15071:kar:is:31616   ? 'is' is vedic variant for nis. Should we include?
113665:stu:nizwavan:244638
114612:smi:ku:246981  ?
61307:mUl:samunmUlay:131841
61307:mUl:nirmUlay:131843
9940:i:aByastam:20607
22170:gA:aByastam:47197
67140:i:pla:144000
23656:graB:saha:50446
21814:gam:saha:46358
-- Is Avis a prefix? A gati? --
7256:as:Avis:14978
36484:DA:Avis:78709
55166:BU:Avis:118598

gasyoun commented 7 years ago

Avis a prefix

Never heard of it. But it can be used as a prefixoid.

gasyoun commented 7 years ago

preverb1a spellings is to see what adjustments need to be made to the preverb1a spellings to bring them into full compliance with MW spellings.

Indeed, but not too much of a burden?

adjustment of nasals

Documented where? At least a copypaste of the code would do, thanks.

'jramB':'jfmB'

Where a rule would fit, why a list is used?

funderburkjim commented 7 years ago

not too much of a burden

It is a burden as there is an initial set of 1900 or so cases, and I am not volunteering to do this, at least not now. However, it seems important to do if we plan to have these PWG implied spellings as headwords.

Where a rule would fit, why a list is used?

I prefer lists unless I readily see the appropriate scope for a rule. The problem with applying a rule willy-nilly is that there may be unintended collateral damage: you might solve one problem but introduce another. Also, it is not clear to me what the rule might be in this case (jramB:jfmB).

gasyoun commented 7 years ago

PWG implied spellings as headwords.

Can't we have them in sanhw1.txt right now? Want to verify every generated word?

funderburkjim commented 7 years ago

Well, it's a choice, isn't it? Verify now or verify later. It bothers me to put something in the Cologne displays prematurely. However, I realize that sometimes over-emphasis on perfection can inhibit progress. If this comment sounds wishy-washy, it's probably because it is.

gasyoun commented 7 years ago

over-emphasis on perfection can inhibit progress

Yes, I'm no longer a fan of perfection in matters of Sanskrit. 95% will do for me.

funderburkjim commented 7 years ago

I propose to use preverb1a.txt as the basis of additional headwords for PWG.

For skd, we used the code alt for the alternate headwords added to skd. (see #9).

We need to choose a code for these additional PWG headwords. Maybe the (longish) code preverb would be easy to remember.

The procedure used will be that used for skd, namely, strategy 3 per #9.

gasyoun commented 7 years ago

Maybe the (longish) code preverb would be easy to remember.

So be it, because an upasarga is not always equal to a preverb.

funderburkjim commented 7 years ago

Since some of the cases are not prefixed verbs in the sense of upasargas + verb, do you think some other code than 'preverb' would be preferable?

One suggestion might be 'embed' which would apply to a word (other than an alternate) occurring in the body of an entry. Such a generic code as 'embed' could also apply to the STC words which we plan to add as headwords.

gasyoun commented 7 years ago

do you think some other code than 'preverb' would be preferable?

We can call it a prefixoid.

Such a generic code as 'embed' could also apply to the STC words which we plan to add as headwords.

Seems to be too generic, if you ask me. But if you feel comfortable, let it be. Not that crucial.

sanskrit-lexicon / alternateheadwords

PWGpreverb: pfx spelling involving nR sandhi #15
