verbs01

funderburkjim commented 4 years ago

The verbs01 directory aims

to identify the entries in the Macdonell Sanskrit-English Dictionary (MD) which are verbs;
to provide a correspondence between the headwords of these entries and verb entries of the Monier-Williams dictionary (MW);
to identify the verb entries which further have upasargas, and to provide a correspondence between these upasargas and the prefixed verb entries of MW.

The comments here will focus on the md_preverb1 report.
md_preverb1_deva is a Devanagari version of the report.

Currently, 992 of the 20748 entries of MD are identified as verbs (4.8%). 618 of these verbs have upasargas, and a total of 3430 upasargas are identified.

All but 6 of the verbs are found to correspond with MW verbs. All but 380 of the upasargas are found to correspond with MW prefixed verbs.

funderburkjim commented 4 years ago

The report is organized according to the MD entries identified as verbs; each such entry is considered a 'case':

;; Case 0001: L=159, k1=akz, k2=akz, code=V, #upasargas=1 (1/0), mw=akz (same)

This record provides

L = the Cologne ID
k1 = the primary headword,
k2 = the full headword (usually same as k1)
a code, here always V
the number of upasargas identified within the md entry
- when non-zero, a parenthetical count is given of those matched to mw and those unmatched to mw.
the MW headword believed to correspond to this entry
- There are 6 cases (mw=?) where no correspondence is currently identified.
a 'flag' comparing k1 to mw:
- (same) means the md headword spelling is the same as the spelling of the MW entry believed to correspond to the md entry (920 cases)
- (diff) means the k1 and mw spellings differ (66 cases)
- note 920 + 66 + 6 = 992 (total number of cases of verbs)
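The case header can be read mechanically. The following is a minimal sketch, assuming every header follows the layout of the sample line above; the regular expression and the names `CASE_RE` and `parse_case_header` are illustrative and are not taken from the repository's programs.

```python
import re

# Pattern inferred from the sample header in this thread; the actual
# report may contain variations that this sketch does not handle.
CASE_RE = re.compile(
    r"^;; Case (?P<case>\d+): L=(?P<L>[^,]+), k1=(?P<k1>[^,]+), "
    r"k2=(?P<k2>[^,]+), code=(?P<code>\w+), "
    r"#upasargas=(?P<n>\d+)(?: \((?P<matched>\d+)/(?P<unmatched>\d+)\))?, "
    r"mw=(?P<mw>\S+) ?(?:\((?P<flag>same|diff)\))?\s*$"
)

def parse_case_header(line):
    """Return the header fields as a dict, or None if the line is not a header."""
    m = CASE_RE.match(line)
    if m is None:
        return None
    fields = m.groupdict()
    # The parenthetical counts are absent when #upasargas is 0.
    for key in ("n", "matched", "unmatched"):
        fields[key] = int(fields[key]) if fields[key] is not None else 0
    return fields
```

Counting the `flag` values over all headers would be one way to re-check the 920 + 66 + 6 = 992 breakdown.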

funderburkjim commented 4 years ago

preverb

When there are upasargas for a MD entry, these are grouped below the case. Consider the verb 'tF' (to cross over):

;; Case 0317: L=8757, k1=tF, k2=tF, code=V, #upasargas=14 (12/2), mw=tF (same)
01        ati         tF                atitF                atitF yes ati+tF
02      vyati         tF              vyatitF              vyatitF yes vi+ati+tF
03        ava         tF                avatF                avatF yes ava+tF
04     samava         tF             samavatF             samavatF yes sam+ava+tF
05          A         tF                  AtF                  AtF yes A+tF
06         ud         tF                 uttF                 uttF yes ud+tF
07       prod         tF               prottF               prottF yes pra+ud+tF
08      samud         tF              samuttF              samuttF yes sam+ud+tF
09        nis         tF                nistF                nistF yes nis+tF
10        pra         tF                pratF                pratF yes pra+tF
11      vipra         tF              vipratF              vipratF no 
12         vi         tF                 vitF                 vitF yes vi+tF
13      pravi         tF              pravitF              pravitF no 
14        sam         tF                saMtF                saMtF yes sam+tF

Note that 'tF' in MD is also the mw spelling. There are 14 upasargas found; 12 have been matched to MW prefixed verbs and 2 ('vipra' and 'pravi') have not been matched. That is, MD has 'vipra' as an upasarga for 'tF', but MW does not have a prefixed verb for 'tF' with prefix 'vipra'; i.e., 'vipratF' is not a prefixed verb in MW.

The listing for upasargas shows:

xx = a two-digit sequence number for the upasargas for the verb
the upasarga
the verb
a likely spelling of the prefixed verb obtained by joining the upasarga with k1
a likely spelling of the prefixed verb obtained by joining the upasarga with the mw root spelling
yes/no indicating whether the prefixed verb is found as an entry in MW dictionary
When the prefixed verb is in MW, then a parsing is given of the mw prefixed verb spelling.

Currently, 3050 of the upasargas are identified with MW prefixed verb entries (search ' yes') and 380 are not identified with MW prefixed verb entries (search ' no').
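The yes/no counts can also be tallied by splitting each upasarga row on whitespace. This is a sketch that assumes the column order listed above and that no field contains spaces; `parse_upasarga_row` and `tally` are hypothetical helper names.

```python
def parse_upasarga_row(line):
    """Split one upasarga row into named fields; return None for other lines."""
    parts = line.split()
    if len(parts) < 6 or not parts[0].isdigit() or parts[5] not in ("yes", "no"):
        return None
    return {
        "seq": int(parts[0]),
        "upasarga": parts[1],
        "verb": parts[2],
        "md_form": parts[3],   # upasarga joined with k1
        "mw_form": parts[4],   # upasarga joined with the mw root spelling
        "in_mw": parts[5] == "yes",
        # The parsing column is present only when the prefixed verb is in MW.
        "parse": parts[6] if len(parts) > 6 else "",
    }

def tally(lines):
    """Count rows matched to MW (yes) and not matched (no)."""
    yes = no = 0
    for line in lines:
        row = parse_upasarga_row(line)
        if row is None:
            continue
        if row["in_mw"]:
            yes += 1
        else:
            no += 1
    return yes, no
```

Applied to the whole report, `tally` should reproduce the 3050 and 380 totals if the assumed layout holds.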

funderburkjim commented 4 years ago

identification of verbs

Verb entries in MD are recognized by having an upper-case transliteration following the Devanagari headword. In terms of the Cologne digitization, this is found by the pattern: u'¦[ABCDGHIJKPTUÑĀĪŚŪ̃ḌḤḶḸṂṄṆṚṜṢṬSNMRVLYEO‡-]+[, ]' The ‡ character represents a sandhi-joining symbol, which occurs in just a few verbs, such as slp1 'svad' (SU‡AD).
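As a sketch of how this pattern might be applied in Python 3, the same character class can be compiled directly. The function name `looks_like_verb` and the sample strings in the usage note are hypothetical and are not taken from the MD digitization.

```python
import re

# Character class copied from the pattern quoted above (there written as a
# Python 2 u'' literal). The trailing '-' inside the class is a literal hyphen.
VERB_RE = re.compile("¦[ABCDGHIJKPTUÑĀĪŚŪ̃ḌḤḶḸṂṄṆṚṜṢṬSNMRVLYEO‡-]+[, ]")

def looks_like_verb(entry_text):
    """True when an upper-case transliteration follows the broken bar."""
    return VERB_RE.search(entry_text) is not None
```

For example, `looks_like_verb("x¦SU‡AD, taste")` is true, while a made-up line such as `"x¦a-ksha, m. axle"` is rejected because a lower-case letter follows the bar.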

It is possible that there are some verb entries in MD that have been missed by the above pattern matching. However, since 4.8% of the entries are identified as verbs by this pattern, there are probably not many, if any, MD verbs that have been missed.

But still it would be good to do an exclusion analysis for MD to more directly address the completeness of the verb identification.

funderburkjim commented 4 years ago

upasarga identification - the problem

There is no clear identification of upasargas within verb entries of MD. Rather, upasargas only appear as bold text, in MD's version of IAST transliteration. But other Sanskrit text also appears in bold type (such as different verb forms, participles, etc.). In this scan snippet (from MD verb 'As'), we see several bold text instances, some being upasargas (or compound upasargas) and some being related non-upasarga Sanskrit words.

funderburkjim commented 4 years ago

upasarga identification - a solution

The approach taken to identify upasargas within verb entries makes use of a list of upasargas in md_upasargas.txt. This list started as the list for the CCS dictionary, and several additional compound upasargas were added.

Then, for a given verb entry of MD, all the words in bold type of the entry were examined, and those words appearing in this list of upasargas (simple or compound) were considered to be the upasargas for that verb entry of MD.
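The filtering step can be sketched as follows. This assumes bold text is marked as `{@...@}` in the digitization (the form quoted later in this thread) and that removing the leading hyphen and the ‡ joiner is enough to normalize a bold word. The actual preverb0 program also converts MD's transliteration to SLP1, which this sketch omits; all names here are illustrative.

```python
import re

# Assumed bold markup: {@ ... @}
BOLD_RE = re.compile(r"\{@(.*?)@\}")

def bold_words(entry_text):
    """Return the normalized bold words of an entry, in order of appearance."""
    words = []
    for span in BOLD_RE.findall(entry_text):
        # Drop the leading hyphen and the sandhi-joining symbol.
        words.append(span.strip().lstrip("-").replace("‡", ""))
    return words

def find_upasargas(entry_text, known):
    """Keep only the bold words that appear in the known upasarga collection."""
    return [w for w in bold_words(entry_text) if w in known]
```

Bold words that are not in the list, such as participles or other verb forms, are simply dropped, which is why the manual comparison described next is still needed.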

Further, this computed list of upasargas for each entry was manually compared with the underlying text of the MD entry to confirm the list.

The resulting list of upasargas for each verb entry appears in the md_preverb0 file. This file is the basis of the upasargas of the md_preverb1 report.

Format of md_upasargas file.

Each line of this file represents an upasarga or compound upasarga, and there are two forms:

The first form is appropriate for matching with the SLP1 transliteration of the bold text of an entry.
- this form is constructed by logic in the preverb0 program from the digitization form {@-pari‡upa@}
The second form is the SLP1 form of the upasarga, after sandhi is applied.

For example, notice the 'pari-upa' example in the image above for verb 'As'. In md_upasargas we see: pariupa paryupa, which maps the un-sandhied 'pari-upa' to the sandhied form 'paryupa'.
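A minimal sketch of reading this two-column format into a lookup table is given below. It assumes the two forms on each line are separated by whitespace; `load_upasargas` is a hypothetical name, and the 'ati ati' line in the usage note is an invented illustration of a simple upasarga whose two forms coincide.

```python
def load_upasargas(lines):
    """Map the un-sandhied matching form to the sandhied SLP1 form."""
    mapping = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            # Skip blank or malformed lines rather than guessing.
            continue
        unsandhied, sandhied = parts
        mapping[unsandhied] = sandhied
    return mapping
```

With the lines `["pariupa paryupa", "ati ati"]`, looking up `"pariupa"` gives `"paryupa"`. For a real file, the lines of md_upasargas.txt would be passed in after opening it with the appropriate encoding.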

gasyoun commented 4 years ago

But these newborn combinations are not added as alternate headwords, are they?

sanskrit-lexicon / MD

verbs01 #1
