sanskrit-lexicon / MD

Research re Macdonell Sanskrit-English Dictionary

verbs01 #1

funderburkjim opened 4 years ago

funderburkjim commented 4 years ago

The verbs01 directory aims

The comments here will focus on the md_preverb1 report.
md_preverb1_deva is a Devanagari version of the report.

Currently, 992 of the 20748 entries of MD are identified as verbs (4.8%). 618 of these verbs have upasargas, and a total of 3430 upasargas are identified.

All but 6 of the verbs are found to correspond with MW verbs. All but 380 of the upasargas are found to correspond with MW prefixed verbs.

funderburkjim commented 4 years ago

The report is organized according to the MD entries identified as verbs; each such entry is considered a 'case':

;; Case 0001: L=159, k1=akz, k2=akz, code=V, #upasargas=1 (1/0), mw=akz (same)

This record provides
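A case header like the one for 'akz' is regular enough to split into its fields mechanically. Below is a minimal sketch, assuming every header follows the layout of the two samples in this thread. The function name `parse_case` and the reading of the `(1/0)` pair as matched/unmatched counts are assumptions, not part of the verbs01 code.

```python
import re
from typing import Optional

# Hypothetical parser for a case header of the md_preverb1 report. Field names
# come from the header itself; interpreting '(m/u)' as matched/unmatched
# upasarga counts is an assumption based on the 'tF' example.
CASE_RE = re.compile(
    r";; Case (?P<case>\d+): L=(?P<L>\d+), k1=(?P<k1>[^,\s]+), "
    r"k2=(?P<k2>[^,\s]+), code=(?P<code>[^,\s]+), "
    r"#upasargas=(?P<n>\d+) \((?P<matched>\d+)/(?P<unmatched>\d+)\), "
    r"mw=(?P<mw>\S+)(?: \((?P<note>[^)]*)\))?$"
)

def parse_case(line: str) -> Optional[dict]:
    """Return the header fields as a dict, or None if the line is not a case header."""
    m = CASE_RE.match(line.strip())
    if m is None:
        return None
    fields = m.groupdict()
    for key in ("case", "L", "n", "matched", "unmatched"):
        fields[key] = int(fields[key])  # e.g. '0317' -> 317
    return fields

hdr = parse_case(";; Case 0317: L=8757, k1=tF, k2=tF, code=V, #upasargas=14 (12/2), mw=tF (same)")
print(hdr["k1"], hdr["n"], hdr["matched"], hdr["unmatched"], hdr["note"])
# tF 14 12 2 same
```

A cheap consistency check on the report is then `hdr["n"] == hdr["matched"] + hdr["unmatched"]` for every case.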

funderburkjim commented 4 years ago

preverb

When there are upasargas for an MD entry, these are grouped below the case. Consider the verb 'tF' (to cross over):

;; Case 0317: L=8757, k1=tF, k2=tF, code=V, #upasargas=14 (12/2), mw=tF (same)
01        ati         tF                atitF                atitF yes ati+tF
02      vyati         tF              vyatitF              vyatitF yes vi+ati+tF
03        ava         tF                avatF                avatF yes ava+tF
04     samava         tF             samavatF             samavatF yes sam+ava+tF
05          A         tF                  AtF                  AtF yes A+tF
06         ud         tF                 uttF                 uttF yes ud+tF
07       prod         tF               prottF               prottF yes pra+ud+tF
08      samud         tF              samuttF              samuttF yes sam+ud+tF
09        nis         tF                nistF                nistF yes nis+tF
10        pra         tF                pratF                pratF yes pra+tF
11      vipra         tF              vipratF              vipratF no 
12         vi         tF                 vitF                 vitF yes vi+tF
13      pravi         tF              pravitF              pravitF no 
14        sam         tF                saMtF                saMtF yes sam+tF

Note that 'tF' in MD is also the MW spelling. There are 14 upasargas found; 12 have been matched to MW prefixed verbs and 2 ('vipra' and 'pravi') have not been matched. That is, MD has 'vipra' as an upasarga for 'tF', but MW does not have a prefixed verb for 'tF' with the prefix 'vipra'; i.e., vipratF is not a prefixed verb in MW.
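The upasarga rows are whitespace-separated, so they can be read back and tallied the same way one would search for ' yes' and ' no'. This is a minimal sketch, assuming each row has 6 fields (unmatched) or 7 fields (matched, with the MW analysis such as `pra+ud+tF`). The names `UpasargaRow` and `parse_row`, and the labels given to the two identical-looking prefixed-verb columns, are hypothetical.

```python
from typing import List, NamedTuple, Optional

class UpasargaRow(NamedTuple):
    num: int
    upasarga: str
    root: str
    prefixed: str            # assumed: prefixed verb as built from the MD data
    prefixed_mw: str         # assumed: form looked up in MW
    matched: bool            # the 'yes' / 'no' column
    mw_parse: Optional[str]  # e.g. 'pra+ud+tF'; absent on 'no' rows

def parse_row(line: str) -> UpasargaRow:
    parts = line.split()
    if len(parts) not in (6, 7) or parts[5] not in ("yes", "no"):
        raise ValueError(f"unexpected upasarga row: {line!r}")
    return UpasargaRow(
        num=int(parts[0]),
        upasarga=parts[1],
        root=parts[2],
        prefixed=parts[3],
        prefixed_mw=parts[4],
        matched=parts[5] == "yes",
        mw_parse=parts[6] if len(parts) == 7 else None,
    )

# Four of the fourteen 'tF' rows shown above.
SAMPLE = """\
01        ati         tF                atitF                atitF yes ati+tF
07       prod         tF               prottF               prottF yes pra+ud+tF
11      vipra         tF              vipratF              vipratF no
13      pravi         tF              pravitF              pravitF no
"""

rows: List[UpasargaRow] = [parse_row(l) for l in SAMPLE.splitlines() if l.strip()]
unmatched = [r.upasarga for r in rows if not r.matched]
print(sum(r.matched for r in rows), unmatched)
# 2 ['vipra', 'pravi']
```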

The listing for upasargas shows:

Currently, 3050 of the upasargas are identified with MW prefixed verb entries (search ' yes') and 380 are not identified with MW prefixed verb entries (search ' no').

funderburkjim commented 4 years ago

identification of verbs

Verb entries in MD are recognized by having an upper-case transliteration following the Devanagari headword. In terms of the Cologne digitization, these are found with the pattern u'¦[ABCDGHIJKPTUÑĀĪŚŪ̃ḌḤḶḸṂṄṆṚṜṢṬSNMRVLYEO‡-]+[, ]'. The ‡ character represents a sandhi-joining symbol, used in just a few verbs, such as slp1 'svad' (SU‡AD).
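The pattern can be applied directly with Python's `re` module; in Python 3 the `u''` prefix is unnecessary. This is a minimal sketch: the helper `is_verb_entry` is hypothetical, and the entry bodies used below are made-up abbreviations (with `...`), not actual MD text.

```python
import re

# The verb-detection pattern quoted above: an upper-case transliteration
# immediately after the '¦' separator, terminated by a comma or a space.
VERB_HEAD = re.compile("¦[ABCDGHIJKPTUÑĀĪŚŪ̃ḌḤḶḸṂṄṆṚṜṢṬSNMRVLYEO‡-]+[, ]")

def is_verb_entry(body: str) -> bool:
    """True if the entry body contains an upper-case transliteration after '¦'."""
    return VERB_HEAD.search(body) is not None

# Hypothetical, abbreviated entry bodies.
for body in ("¦GAM, ...", "¦SU‡AD, ...", "¦agni, ..."):
    print(body, is_verb_entry(body))
```

The first two bodies match; the lower-case 'agni' body does not, since its first character after '¦' is outside the character class.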

It is possible that there are some verb entries in MD that have been missed by the above pattern matching. However, since 4.8% of the entries are identified as verbs by this pattern, there are probably not many, if any, MD verbs that have been missed.

But still it would be good to do an exclusion analysis for MD to more directly address the completeness of the verb identification.

funderburkjim commented 4 years ago

upasarga identification - the problem

There is no explicit marking of upasargas within the verb entries of MD. Rather, upasargas appear only as bold text, in MD's version of IAST transliteration. But other Sanskrit text also appears in bold type (such as different verb forms, participles, etc.). In this scan snippet (from MD verb 'As'), we see several instances of bold text, some being upasargas (or compound upasargas) and some being related non-upasarga Sanskrit words.

[image: scan snippet from the MD entry for verb 'As']

funderburkjim commented 4 years ago

upasarga identification - a solution

The approach taken to identify upasargas within verb entries makes use of a list of upasargas in md_upasargas.txt. This list started as the list used for the CCS dictionary, and several additional compound upasargas were added.

Then, for a given verb entry of MD, all the words in bold type of the entry were examined, and those words appearing in the list of compound upasargas were considered to be the upasargas for that verb entry of MD.

Further, this computed list of upasargas for each entry was manually compared with the underlying text of the MD entry to confirm the list.
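The automatic step (collect the bold words of an entry and keep those that appear in the upasarga list) can be sketched as follows. This is an illustration under stated assumptions, not the verbs01 code: it assumes bold text is delimited as `{@...@}` in the digitization and that the bold words have already been converted from IAST to SLP1. The function name `candidate_upasargas` and the sample entry text are hypothetical. The output is only a candidate list; the manual check described above is still needed.

```python
import re
from typing import List, Set

# Assumption: bold text is delimited as {@...@} and already in SLP1.
BOLD_RE = re.compile(r"\{@(.*?)@\}", re.DOTALL)

def candidate_upasargas(entry_text: str, upasargas: Set[str]) -> List[str]:
    """Return bold words found in the upasarga list, in order of first appearance."""
    found: List[str] = []
    for bold in BOLD_RE.findall(entry_text):
        for word in bold.replace(",", " ").split():
            key = word.replace("-", "")  # 'pari-upa' -> 'pariupa'
            if key in upasargas and key not in found:
                found.append(key)
    return found

# Hypothetical entry text; not an actual MD entry.
entry = "... {@aDi@} ... {@pari-upa@} ... {@AsIna@} ... {@aDi@} ..."
print(candidate_upasargas(entry, {"aDi", "upa", "pariupa"}))
# ['aDi', 'pariupa']
```

The bold participle-like word 'AsIna' is dropped because it is not in the list, which is exactly the filtering the upasarga list is meant to provide.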

The resulting list of upasargas for each verb entry appears in the md_preverb0 file. This file is the basis of the upasargas in the md_preverb1 report.

Format of md_upasargas file.

Each line of this file represents an upasarga or compound upasarga, and there are two forms:

For example, notice the 'pari-upa' example in the image above for verb 'As'. In md_upasargas we see: pariupa paryupa, which maps the un-sandhied 'pari-upa' to the sandhied form 'paryupa'.
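Reading md_upasargas into a lookup table is straightforward. This is a minimal sketch, assuming the two line forms are a single token (no sandhi change) and a pair of tokens (unsandhied, then sandhied) as in 'pariupa paryupa'; the one-token form is an assumption, as is the function name `load_upasargas`.

```python
from typing import Dict, Iterable

def load_upasargas(lines: Iterable[str]) -> Dict[str, str]:
    """Map each unsandhied upasarga to its sandhied form.

    Assumption: a line holds either one token (no sandhi change) or two
    tokens, unsandhied then sandhied, as in 'pariupa paryupa'.
    """
    table: Dict[str, str] = {}
    for raw in lines:
        parts = raw.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) == 1:
            table[parts[0]] = parts[0]
        elif len(parts) == 2:
            table[parts[0]] = parts[1]
        else:
            raise ValueError(f"unexpected md_upasargas line: {raw!r}")
    return table

table = load_upasargas(["ati", "pariupa paryupa", ""])
print(table)
# {'ati': 'ati', 'pariupa': 'paryupa'}
```

Keying on the unsandhied form lets a bold word such as 'pari-upa' be looked up after removing the hyphen, while the value supplies the sandhied spelling 'paryupa'.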

gasyoun commented 4 years ago

But these newborn combinations are not added as alternate headwords, are they?