Open funderburkjim opened 4 years ago
The report is organized according to the MD entries identified as verbs; each such entry is considered a 'case':
;; Case 0001: L=159, k1=akz, k2=akz, code=V, #upasargas=1 (1/0), mw=akz (same)
This record provides
mw=?
) where no correspondence is currently identified.
When there are upasargas for an MD entry, these are grouped below the case. Consider the verb 'tF' (to cross over):
;; Case 0317: L=8757, k1=tF, k2=tF, code=V, #upasargas=14 (12/2), mw=tF (same)
01 ati tF atitF atitF yes ati+tF
02 vyati tF vyatitF vyatitF yes vi+ati+tF
03 ava tF avatF avatF yes ava+tF
04 samava tF samavatF samavatF yes sam+ava+tF
05 A tF AtF AtF yes A+tF
06 ud tF uttF uttF yes ud+tF
07 prod tF prottF prottF yes pra+ud+tF
08 samud tF samuttF samuttF yes sam+ud+tF
09 nis tF nistF nistF yes nis+tF
10 pra tF pratF pratF yes pra+tF
11 vipra tF vipratF vipratF no
12 vi tF vitF vitF yes vi+tF
13 pravi tF pravitF pravitF no
14 sam tF saMtF saMtF yes sam+tF
Note that 'tF' in MD is also the MW spelling. There are 14 upasargas found; 12 have been matched to MW prefixed verbs and 2 (`vipra` and `pravi`) have not been matched. That is, MD has 'vipra' as an upasarga for 'tF', but MW does not have a prefixed verb for 'tF' with prefix 'vipra'; i.e., vipratF is not a prefixed verb in MW.
The listing for upasargas shows:
Currently, 3050 of the upasargas are identified with MW prefixed verb entries (search ' yes') and 380 are not identified with MW prefixed verb entries (search ' no').
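The per-case counts in a header such as `(12/2)` and the report-wide ' yes' / ' no' totals can be recomputed from the listing lines. The following is only a sketch, not the code that produced the report: the helper names `header_counts` and `tally` are hypothetical, and it assumes each upasarga line is whitespace-separated with the yes/no flag in the sixth field, as in the 'tF' listing above.

```python
import re

# Matches the "#upasargas=14 (12/2)" part of a case header line.
HEADER_COUNTS = re.compile(r'#upasargas=(\d+) \((\d+)/(\d+)\)')

def header_counts(header_line):
    """Return (total, matched, unmatched) as stated in a case header."""
    m = HEADER_COUNTS.search(header_line)
    return tuple(int(g) for g in m.groups())

def tally(rows):
    """Count 'yes' and 'no' flags (assumed to be the sixth field) in upasarga lines."""
    flags = [row.split()[5] for row in rows]
    return flags.count('yes'), flags.count('no')

header = ";; Case 0317: L=8757, k1=tF, k2=tF, code=V, #upasargas=14 (12/2), mw=tF (same)"
rows = [
    "10 pra tF pratF pratF yes pra+tF",
    "11 vipra tF vipratF vipratF no",
    "12 vi tF vitF vitF yes vi+tF",
]
print(header_counts(header))  # (14, 12, 2)
print(tally(rows))            # (2, 1) for these three sample lines
```

Applying `tally` to all upasarga lines of the report should give the totals quoted above, if the assumed line layout holds throughout.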
Verb entries in MD are recognized by having an upper-case transliteration following the Devanagari headword. In terms of the Cologne digitization, this is found by the pattern: u'¦[ABCDGHIJKPTUÑĀĪŚŪ̃ḌḤḶḸṂṄṆṚṜṢṬSNMRVLYEO‡-]+[, ]'. The ‡ character represents a sandhi-joining symbol, used for just a few verbs, such as slp1 'svad' (SU‡AD).
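As an illustration of how this pattern behaves, here is a small Python 3 sketch. The function name `looks_like_verb` and the two sample entry strings are made up for the example and are not taken from the MD digitization; only the pattern itself comes from the description above.

```python
import re

# The verb-detection pattern quoted above: the broken bar, then one or more
# upper-case transliteration characters (or the sandhi mark or a hyphen),
# then a comma or a space.
VERB_HEAD = re.compile('¦[ABCDGHIJKPTUÑĀĪŚŪ̃ḌḤḶḸṂṄṆṚṜṢṬSNMRVLYEO‡-]+[, ]')

def looks_like_verb(entry_text):
    """True if the entry text contains an upper-case form right after the bar."""
    return VERB_HEAD.search(entry_text) is not None

# Hypothetical entry fragments, for illustration only.
print(looks_like_verb('<s>svad</s>¦SU‡AD, taste'))    # True
print(looks_like_verb('<s>akza</s>¦aksha, m. die'))   # False
```

Since the character class lists precomposed letters, an input in a different Unicode normalization form might not match; whether that matters for the MD digitization is not something this sketch checks.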
It is possible that there are some verb entries in MD that have been missed by the above pattern matching. However, since 4.8% of the entries are identified as verbs by this pattern, there are probably not many, if any, MD verbs that have been missed.
But still it would be good to do an exclusion analysis for MD to more directly address the completeness of the verb identification.
There is no clear identification of upasargas within verb entries of MD. Rather, upasargas appear only as bold text, in MD's version of IAST transliteration. But other Sanskrit text also appears in bold type (such as different verb forms, participles, etc.). In this scan snippet (from MD verb 'As'), we see several bold text instances, some being upasargas (or compound upasargas) and some being related non-upasarga Sanskrit words.
The approach taken to identify upasargas within verb entries makes use of a list of upasargas in md_upasargas.txt. This list started as the list for the CCS dictionary, and several additional compound upasargas were added.
Then, for a given verb entry of MD, all the words in bold type of the entry were examined, and those words appearing in the list of compound upasargas were considered to be the upasargas for that verb entry of MD.
Further, this computed list of upasargas for each entry was manually compared with the underlying text of the MD entry to confirm the list.
The resulting list of upasargas for each verb entry appears in the md_preverb0 file. This file is the basis of the upasargas of the md_preverb1 report.
Each line of this file represents an upasarga or compound upasarga, and there are two forms:
{@-pari‡upa@}
For example, notice the 'pari-upa' example in the image above for verb 'As'.
In md_upasargas we see `pariupa paryupa`, which maps the un-sandhied 'pari-upa' to the sandhied form 'paryupa'.
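The lookup described above (take the bold words of an entry, keep those found in the upasarga list, and map them to their sandhied form) might be sketched as follows. This is not the actual md_preverb0 code. The function names are hypothetical, and the sketch assumes that bold text is marked as `{@...@}`, that every line of md_upasargas has two whitespace-separated columns like `pariupa paryupa`, and that the bold word and the list key use the same spelling once '-' and '‡' are removed.

```python
import re

# Bold spans, assumed to be marked as {@...@} in the digitization.
BOLD = re.compile(r'\{@(.*?)@\}')

def load_upasarga_map(lines):
    """Parse lines like 'pariupa paryupa' into {unsandhied: sandhied}."""
    mapping = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 2:
            mapping[parts[0]] = parts[1]
    return mapping

def entry_upasargas(entry_text, mapping):
    """Return (unsandhied, sandhied) pairs for bold words found in the list."""
    found = []
    for bold in BOLD.findall(entry_text):
        # Drop the hyphens and the sandhi-joining mark.
        key = bold.replace('-', '').replace('‡', '')
        if key in mapping:
            found.append((key, mapping[key]))
    return found

# Hypothetical list lines and entry text, for illustration only.
mapping = load_upasarga_map(['pariupa paryupa', 'upa upa'])
text = '{@-pari‡upa@} sit around; {@āsīna@} sitting'
print(entry_upasargas(text, mapping))  # [('pariupa', 'paryupa')]
```

Bold words that are not in the list (here the participle) are simply skipped, which is why the computed lists still needed the manual comparison against the entry text.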
But these newborn combinations are not added as alternate headwords, are they?
The verbs01 directory aims
The comments here will focus on the md_preverb1 report.
md_preverb1_deva is a Devanagari version of the report.
Currently, 992 of the 20748 entries of MD are identified as verbs (4.8%). 618 of these verbs have upasargas, and a total of 3430 upasargas are identified.
All but 6 of the verbs are found to correspond with MW verbs. All but 380 of the upasargas are found to correspond with MW prefixed verbs.