verbs01 - Githubissues

funderburkjim commented 4 years ago

The verbs01 directory aims

to identify the entries in the Grassmann Sanskrit-German dictionary (GRA) which are verbs,
to provide a correspondence between the headwords of these entries and verb entries of the Monier-Williams dictionary (MW), and
to identify the verb entries which further have upasargas, and to provide a correspondence between these and the prefixed verb entries of MW.

The comments here will focus on the gra_preverb1 report. gra_preverb1_deva is a Devanagari version of the report.

Currently, 907 of the 10777 entries of Grassmann are identified as verbs. 461 of these verbs have upasargas, and a total of 2026 upasargas (simple or compound) are identified.

funderburkjim commented 4 years ago

The gra_preverb1 report is organized according to the GRA entries identified as verbs; each such entry is considered a 'case'. There are 907 cases corresponding to the 907 GRA root entries.

;; Case 0006: L=132, k1=aNg, k2=aNg, code=V, #upasargas=0, mw=aNg (same)
;; Case 0007: L=144, k1=ac, k2=ac, code=V, #upasargas=2 (1/1), mw=ac (same)
01        apa         ac                 apAc                 apAc yes apa+ac
02        sam         ac                samac                samac no

This record provides

L = the Cologne ID
k1 = the primary headword,
k2 = the full headword (usually same as k1)
a code, here always V
the number of upasargas identified within the entry
the MW headword believed to correspond to this entry
- There are 6 cases (mw=?) where no correspondence is currently identified.
a 'flag' comparing k1 to mw:
- (same) means the headword spelling is the same as the spelling of the MW entry believed to correspond to the entry (705 cases)
- (diff) means the k1 and mw spellings differ (367 cases)
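
For anyone post-processing the report, a case header of this shape can be read with a regular expression. This is only a sketch: `HEADER_RE` and `parse_case_header` are hypothetical names, not part of the verbs01 scripts, and the parenthesized pair after the upasarga count (e.g. 1/1) is kept as an uninterpreted string because its meaning is not spelled out in this description.

```python
import re

# Hypothetical pattern for a ';; Case' header line of gra_preverb1.txt.
# The '(1/1)' note and the '(same)'/'(diff)' flag are treated as optional.
HEADER_RE = re.compile(
    r";; Case (?P<case>\d+): L=(?P<L>[^,]+), k1=(?P<k1>[^,]+), k2=(?P<k2>[^,]+), "
    r"code=(?P<code>\w+), #upasargas=(?P<n>\d+)(?: \((?P<note>[^)]*)\))?, "
    r"mw=(?P<mw>\S+)(?: \((?P<flag>\w+)\))?"
)

def parse_case_header(line):
    """Return the header fields as a dict, or None if the line is not a header."""
    m = HEADER_RE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    rec["n"] = int(rec["n"])  # number of upasargas as an integer
    return rec

header = parse_case_header(
    ";; Case 0007: L=144, k1=ac, k2=ac, code=V, #upasargas=2 (1/1), mw=ac (same)"
)
print(header["L"], header["k1"], header["n"], header["mw"], header["flag"])
```

Lines that are not headers (such as the upasarga lines) return None, so the function can also be used to split the report into cases.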

preverb subentries

When upasargas are recognized within a verb entry, the report has additional lines for that case, one extra line for each upasarga.

;; Case 0007: L=144, k1=ac, k2=ac, code=V, #upasargas=2 (1/1), mw=ac (same)
01        apa         ac                 apAc                 apAc yes apa+ac
02        sam         ac                samac                samac no

In each of these lines, there is

a sequence number
the upasarga
a copy of the parent verb (k1)
a computed preverb, making use of the parent verb
a computed preverb, making use of the corresponding MW verb spelling (here the same)
indication of whether the computed MW preverb spelling is found as a verb headword in MW
- yes : the preverb is an entry headword in MW (e.g. apAc)
- in this case, the parse of the MW preverb (e.g., apa+ac)
- no : the preverb is not an entry headword in MW (e.g. samac)
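
The fields just listed are whitespace-separated, so an upasarga line can be split without relying on column positions. A minimal sketch, assuming exactly six fields on a "no" line and seven on a "yes" line; `parse_preverb_line` and its key names are made up for illustration.

```python
def parse_preverb_line(line):
    """Split one upasarga line into named fields.

    The 'parse' field (e.g. 'apa+ac') is present only on 'yes' lines.
    """
    fields = line.split()
    if len(fields) not in (6, 7):
        raise ValueError(f"unexpected field count: {line!r}")
    keys = ["seq", "upasarga", "k1", "gra_preverb", "mw_preverb", "found"]
    rec = dict(zip(keys, fields))
    rec["parse"] = fields[6] if len(fields) == 7 else None
    return rec

yes_line = "01        apa         ac                 apAc                 apAc yes apa+ac"
no_line = "02        sam         ac                samac                samac no"
for text in (yes_line, no_line):
    rec = parse_preverb_line(text)
    print(rec["upasarga"], rec["mw_preverb"], rec["found"], rec["parse"])
```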

An example where GRA root spelling corresponds to a different MW root spelling:

;; Case 0020: L=954, k1=ar, k2=ar_f, code=V, #upasargas=14 (13/1), mw=f (diff)
01        anu         ar                anvar                 anvf yes anu+f

Note:

Root headword 'ar' in GRA is said to correspond to headword 'f' in MW
- 221 of the root spellings in GRA are said to correspond to an MW root spelling which is different.
- 662 of the root spellings in GRA are the same as the MW spelling
- 24 of the roots in GRA have no MW correspondent
- 221 + 662 + 24 = 907 (total number of GRA roots identified)
The hypothetical preverb spelling is anvar when using the GRA root spelling 'ar'
The hypothetical preverb spelling is anvf when using the MW root spelling 'f'
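
How the computed spellings are derived is not described here. The sketch below is one guess at a minimal vowel-sandhi join in SLP1 that reproduces the forms quoted above (apAc, samac, anvar, anvf). The function name and the rule set are assumptions; combinations it does not cover fall back to plain concatenation, which will be wrong for many real cases.

```python
SLP1_VOWELS = set("aAiIuUfFxXeEoO")

def join_upasarga(upa, root):
    """Join an upasarga and a root (both SLP1) with a few vowel-sandhi rules."""
    last, first = upa[-1], root[0]
    if last not in SLP1_VOWELS or first not in SLP1_VOWELS:
        return upa + root  # no vowel meets a vowel: concatenate as-is
    stem, rest = upa[:-1], root[1:]
    if last in "aA" and first in "aA":
        return stem + "A" + rest  # a + a -> A
    if last in "iI":
        # i + i -> I; i + other vowel -> y + vowel
        return stem + ("I" + rest if first in "iI" else "y" + root)
    if last in "uU":
        # u + u -> U; u + other vowel -> v + vowel
        return stem + ("U" + rest if first in "uU" else "v" + root)
    return upa + root  # other vowel pairs are not handled in this sketch

for upa, root in [("apa", "ac"), ("sam", "ac"), ("anu", "ar"), ("anu", "f")]:
    print(upa, root, join_upasarga(upa, root))
```

Running the same function on both the GRA root and the MW root gives the two computed columns of an upasarga line.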

funderburkjim commented 4 years ago

How root entries are identified

For each of the 10K entries of GRA, the first line of the digitized text is examined for several EXCLUSION patterns. If any pattern matches, the entry is asserted to be a non-verb. The first few patterns are obvious; later patterns are non-obvious and were developed by manual examination.

"¦ [mfna][.][, ]",  # 2792 remain
"¦ adv[.][, ]",  # 2783 remain
"¦, [mfna][.][, ]",  # 2296 remain
"-.*?¦",  # 2013 remain   (hyphenated words)
"[aá][,.})@]+¦",  # 1482 remain
"tás[,.@})]+¦",  # 1469 remain
"vát[,.@})]+¦",  # 1433 remain
"ṣas[,.@})]+¦",  # 1431 remain
"śás[,.@})]+¦",  # 1420 remain
"¦ *[0-9]+[)] [mfna][.][, ]",  # 1412 remain
"Ablativ",  # 1407 remain
"ā́t[,.@})]+¦",  # 1398 remain
"tas[,.@})]+¦",  # 1388 remain
"á.[,.@})]+¦",  # 1307 remain  penultimate accented 'a'.
"āt[,.@})]+¦",  # 1298 remain  
"@[})]+¦[, ]*{@[^@]+@} *[mfna][.]",  # 1212 remain more substantives
"[áa]thā.*¦",  # 1199 remain
"é.*¦",  # 1189 remain
"[áíú].*¦",  # 1051 remain accented vowel in headword
"fem[.]",  # 1036 remain
"ā́.*¦.*Instr.",  # 1029 remain
"a[^@]+ā́[,.@})]+¦",  # 1001 remain
"īm[,.@})]+¦", # 995 
"vat[,.@})]+¦",

This leaves 995 non-excluded records as verb candidates.
A few additional records (gra_verb_exclude.txt) are excluded. This list was developed by manual examination.
The result of all exclusions leaves 907 entries believed to be verbs.
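
The filtering step can be sketched as follows. Only three of the patterns are repeated, the sample first lines are made up rather than copied from GRA, and the use of `re.search` (pattern may occur anywhere in the line) is an assumption about how the patterns are applied; `is_verb_candidate` is a hypothetical name.

```python
import re

# Three of the exclusion patterns listed above; the full list is longer.
EXCLUSION_PATTERNS = [
    "¦ [mfna][.][, ]",
    "¦ adv[.][, ]",
    "-.*?¦",
]
COMPILED = [re.compile(p) for p in EXCLUSION_PATTERNS]

def is_verb_candidate(first_line):
    """True if no exclusion pattern is found anywhere in the first line."""
    return not any(rx.search(first_line) for rx in COMPILED)

# Made-up first lines, for illustration only (not copied from GRA).
samples = [
    "{@xxx,@}¦ m., ...",
    "{@yyy@}¦ adv., ...",
    "{@xx-yy@}¦ ...",
    "{@zzz@}¦ ...",
]
candidates = [s for s in samples if is_verb_candidate(s)]
print(candidates)
```

Entries that survive every pattern would then still be checked against the manual list in gra_verb_exclude.txt.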

funderburkjim commented 4 years ago

upasarga pattern in print

Examination of scanned entries (for example, root=aYj) suggests that upasargas appear as bold text.

Of course, there is also some bold text which does not represent an upasarga.

funderburkjim commented 4 years ago

upasarga identification in digitization

Thomas Malten's original digitization had markup for:

bold text {@xxx@}
indented divisions. In the current digitization, these are coded in GRA as <div n="P"> or <div n="P1">.

Here is an extract from the digitization for aYj:

<L>208<pc>0023<k1>aYj<k2>aYj
-  Grundbedeutung „sch   --> 
+ Mit {@abhí,@} {%schm   --> abhí,
+ {@ā́ @}1) die Bahn [   --> ā́ 
+ {@ní,@} {%hinuntersc   --> ní,
+ {@prá,@} jemandem [D   --> prá,
+ {@práti,@} {%schmück   --> práti,
+ {@ví,@} med. 1) {%si   --> ví,
+ {@sám@} 1) womit [I.   --> sám
- -nákti 7) vām 153,2    -->

The file 'preverb0_dbg.txt' contains similar extract information for all the verbs.
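
Collecting the bold fragments amounts to finding every `{@...@}` span. A minimal sketch, assuming the markup is never nested; `BOLD_RE` and `bold_fragments` are hypothetical names, and the sample string is stitched together from two of the truncated lines in the extract.

```python
import re

# Non-greedy match of the text between '{@' and '@}'.
BOLD_RE = re.compile(r"\{@(.*?)@\}")

def bold_fragments(text):
    """Return the bold fragments of a line, punctuation and spaces included."""
    return BOLD_RE.findall(text)

sample = "Mit {@abhí,@} {%schm ... {@sám@} 1) womit [I."
fragments = bold_fragments(sample)
print(fragments)
```

The fragments are returned exactly as marked up, so trailing commas and spaces are preserved, as in the right-hand column of the extract.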

From this, all the various bold text fragments for all roots were collected. There were about 1000 different fragments.
These fragments were examined, and the fragments believed to be NON-upasargas were commented out.
For the remaining (about 400) fragments, a corresponding SLP1 spelling of GRA's IAST text was prepared.

The result 'gra_upasarga_map.txt' provides a mapping from bold upasarga text to SLP1 upasarga spellings. This mapping can then be used to generate a list of upasargas for each verb. This is in file gra_preverb0.txt. The line for 'aYj' is:

;; Case 0010: L=208, k1=aYj, #upasargas=7, upasargas=aBi,A,ni,pra,prati,vi,sam
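
The mapping step for aYj can be sketched as below. The on-disk format of gra_upasarga_map.txt is not shown here, so the dictionary is a hypothetical in-memory stand-in; its keys are the bold fragments from the aYj extract and its values are the SLP1 spellings from the gra_preverb0 line. `preverb0_line` is likewise an invented name.

```python
# Hypothetical in-memory form of the mapping (the real file format is not shown).
UPASARGA_MAP = {
    "abhí,": "aBi",
    "ā́ ": "A",
    "ní,": "ni",
    "prá,": "pra",
    "práti,": "prati",
    "ví,": "vi",
    "sám": "sam",
}

def preverb0_line(case, L, k1, fragments, mapping):
    """Build a gra_preverb0-style line; fragments not in the mapping are skipped."""
    upas = [mapping[f] for f in fragments if f in mapping]
    return (f";; Case {case:04d}: L={L}, k1={k1}, "
            f"#upasargas={len(upas)}, upasargas={','.join(upas)}")

fragments = list(UPASARGA_MAP)  # stand-in for the fragments collected for aYj
line = preverb0_line(10, 208, "aYj", fragments, UPASARGA_MAP)
print(line)
```

Skipping unmapped fragments mirrors the idea that non-upasarga bold text was commented out of the map.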

funderburkjim commented 4 years ago

preverb1 construction

Finally, the two pieces:

list of verbs with mapping to mw (gra_verb_filter_map.txt)
list of upasargas for each verb (gra_preverb0.txt)

can be combined to produce gra_preverb1.txt.
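
A rough sketch of that combination for a single verb follows. Everything named here is an assumption: `preverb1_rows`, the `join` callable (any function that spells upasarga plus root), and the tiny MW headword set. Column padding is simplified to single spaces.

```python
def preverb1_rows(k1, mw, upasargas, mw_headwords, join):
    """Produce simplified gra_preverb1-style upasarga lines for one verb."""
    rows = []
    for n, upa in enumerate(upasargas, start=1):
        gra_form = join(upa, k1)   # spelling built on the GRA root
        mw_form = join(upa, mw)    # spelling built on the MW root
        found = mw_form in mw_headwords
        row = [f"{n:02d}", upa, k1, gra_form, mw_form, "yes" if found else "no"]
        if found:
            row.append(f"{upa}+{mw}")  # parse shown only for 'yes' lines
        rows.append(" ".join(row))
    return rows

# Toy inputs: spellings looked up from a table instead of computed.
forms = {("apa", "ac"): "apAc", ("sam", "ac"): "samac"}
rows = preverb1_rows("ac", "ac", ["apa", "sam"], {"apAc"},
                     lambda upa, root: forms[(upa, root)])
for row in rows:
    print(row)
```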

gasyoun commented 4 years ago

(same) means the cae headword spelling is the same as the spelling of the MW entry believed to correspond to the cae entry (705 cases)

Hope cae is a copy-paste byproduct.

An example where GRA root spelling corresponds to a different MW root spelling:

GRA in his preface explains what conventions with dhatus he uses. Remember, we even made an English draft translation for the German Preface?

examined for several EXCLUSION patterns

Adorable intellectual work behind the patterns.

scanned entries suggests that upasargas appear as bold text

In original books it's a different font as well, not only bolded.

This mapping can then be used to generate a list of upasargas for each verb. ;; Case 0010: L=208, k1=aYj, #upasargas=7, upasargas=aBi,A,ni,pra,prati,vi,sam

Please advise me where to find a pure list of upasargas and dhatus combined. It seems gra_preverb1.txt will be the one. Thanks again, as I use GRA a lot.

funderburkjim commented 4 years ago

Hope cae is a copy-paste byproduct.

Yes, now corrected.

Andhrabharati commented 1 year ago

examined for several EXCLUSION patterns

Adorable intellectual work behind the patterns.

I've adopted a different approach altogether (and marked the entries with a √ symbol), and the summary is as below--

AB extra (L-entry): 614, 1055, 2103, 2436, 3501, 4000, 4285, 4288, ++6376.1, 6528, 6675.1, 7590, 7952, ++8788.1, 8803.1, 8954, 8955, 9602, 10336 [19] [Some of these are present in gra_verb_exclude.txt by Jim.]

Jim extra (L-entry): 1841, 2273, 2429, 3379, 4893, 5081, 5423, 7341, 7735.1, 7947, 8937, 9415, 10238 [13]

---------------------------

Next, I reviewed Jim's "extra" entries as found above, and noted the following--

Non-verbals:

1841: is in no way a verbal entry.
2273: could this be a stem as in MW - "4. e"? however, like many other "Deutestamm" entries, this also doesn't appear to be a verbal entry.
2429: the corresp. entry in MW shows it as an indeclinable!
4893: is in no way a verbal entry.

Verbals:

3379: has cf. to a root, and thus a possible verbal entry.
5081: has cf. to a verbal entry, thus a possible verbal entry.
5423: is an inf. form of a root, thus a possible verbal entry.
7947: is a verbal noun of vah.
8937: has cf. to a root, and thus a possible verbal entry.
9415: has cf. to a root, and thus a possible verbal entry.

Artificial ones:

7341: this doesn't appear to be a verbal entry; however could be an artificial entry, making the listed words!!
7735.1: this doesn't appear to be a verbal entry; however could be an artificial entry, making the listed words!!
10238: this doesn't appear to be a verbal entry; however could be an artificial entry, making the listed words!!

In AB's "extra" entries, a particular one needs spl. attention, having the **m.** expansion as `m.` [there are more such places having such expansions in the whole text!]

```
<L>1055<pc>0114<k1>art<k2>art
{@(√art),@} ¦ m. anu „werben um“ tā́m ánvārtiṣye sákhibhir návagvais AV. {14,1,56}. Davon anvartitṛ́.
```

[Incidentally, PWG also has this **art** entry as a dhAtu (Whitney has listed this as an artificial root), and is one of the entries that do not find a place in MW!!]

sanskrit-lexicon / GRA

verbs01 #11
