sanskrit-lexicon / KRM

Research on Cologne digitization of Kṛdantarūpamālā
GNU General Public License v3.0

KRM-MW #1

Open funderburkjim opened 4 years ago

funderburkjim commented 4 years ago

This issue documents the part of verbs01 relating to the correspondence between Cologne digitization KRM of Kṛdantarūpamālā and mw verb headwords.

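In outline, the steps are embodied in the following outputs: krm_verb_filter, krm_verb_filter_map, krm_verb_filter_codes, krm_verb_filter_variants, krm_sense, krm_sense_map.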

Examples and brief introductions for each of these outputs are in the additional comments below.

funderburkjim commented 4 years ago

krm_verb_filter

This is essentially just the headwords, along with cologne ID, of krm. Samples:

;; Case 0001: L=1, k1=aka, k2=aka, code=1
;; Case 0002: L=2, k1=aki, k2=aki, code=1

The code is not interesting; it is always '1'. Likewise, k2 is not interesting; by the initial construction of krm.txt (the Cologne digitization), k2 is the same as k1. k1 was originally constructed from the DAtupAWa sutra. For instance, k1 is aga and agi from the sutras of these two entries: image
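As an illustration, record lines in the format of the samples above could be read into dictionaries roughly as follows. This is a minimal sketch, not the code actually used in verbs01; the names `RECORD_RE` and `parse_record` are hypothetical.

```python
import re

# Matches lines such as ';; Case 0001: L=1, k1=aka, k2=aka, code=1'.
RECORD_RE = re.compile(r"^;; Case (\d+): (.*)$")

def parse_record(line):
    """Return the fields of one record line as a dict, or None if no match."""
    m = RECORD_RE.match(line.strip())
    if m is None:
        return None
    record = {"case": m.group(1)}
    # Assumption: fields are separated by ', ' and have the form key=value.
    for field in m.group(2).split(", "):
        key, _, value = field.partition("=")
        record[key] = value
    return record

samples = [
    ";; Case 0001: L=1, k1=aka, k2=aka, code=1",
    ";; Case 0002: L=2, k1=aki, k2=aki, code=1",
]
records = [parse_record(s) for s in samples]
```

All values are kept as strings; the check `r["k1"] == r["k2"]` holds for both sample records, in line with the remark that k2 repeats k1.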

funderburkjim commented 4 years ago

krm_verb_filter_map

This is the hard part: mapping k1 from the krm_verb_filter to an mw root. The mw roots are present in file mwverbs1, which is derived from the Cologne digitization (mw.txt) of the Monier-Williams dictionary.

An attempt is made to do this matching in a systematic way, and to indicate with a code the particular rules involved in each correspondence. For example:

;; Case 0004: L=4, k1=aga, k2=aga, code=3a-NC, mw=ag
;; Case 0005: L=5, k1=agi, k2=agi, code=3nasal-NC, mw=aNg

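We see that headword 'aga' of krm is said to correspond to headword 'ag' of MW; the code '3a-NC' indicates that the final 'a' of the krm headword was dropped ('3a') and that no other change was made ('NC').

The headword 'agi' of krm is said to correspond to headword 'aNg' of mw, according to rule 3nasal-NC, which indicates that a nasal was inserted ('3nasal') and that no other change was made ('NC').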

funderburkjim commented 4 years ago

krm_verb_filter_codes

All of the 'A' parts of the 'A-B' codes used in mapping krm headwords to mw headwords are summarized in the following table.

| code | description | count | # other variations |
|---|---|---|---|
| 3 | no drop | 161 | 12 |
| 3A | drop final A | 14 | 2 |
| 3I | drop final I | 54 | 1 |
| 3N | drop final N | 49 | 1 |
| 3U | drop final U | 22 | 3 |
| 3Y | drop final Y | 42 | 4 |
| 3a | drop final a | 1024 | 86 |
| 3e | drop final e | 14 | 0 |
| 3f | drop final f | 159 | 8 |
| 3ir | drop final ir | 25 | 1 |
| 3nasal | insert nasal | 241 | 7 |
| 3o | drop final o | 4 | 2 |
| 3u | drop final u | 109 | 26 |
| 3x | drop final x | 17 | 2 |
| ? | unmatched | 15 | 0 |
| S | special | 111 | 111 |

unmatched

There are 15 krm headwords which thus far have NOT been matched to MW verbs.

special

These are 111 matches which do NOT follow an easily summarized rule.

other variations

The A-B codes for most krm to mw mappings have B = NC; i.e., only the 'A' transformation of the krm root was involved. The 'other variations' column shows a count of the cases when B != NC, i.e., there were other changes to the krm spellings.
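The two count columns can be thought of as a tally over the 'A-B' codes. The sketch below shows one way to compute them; it is an illustration under stated assumptions rather than the procedure actually used. The function names are hypothetical, the input list contains only the codes quoted in this issue (so the numbers are not the real totals), and codes without a hyphen are skipped because this issue does not show how the '?' and 'S' cases are written.

```python
from collections import Counter

def tally(codes):
    """Count 'A' parts, and separately the cases whose 'B' part is not 'NC'."""
    counts = Counter()
    other_variations = Counter()
    for code in codes:
        if "-" not in code:
            # Assumption: the spelling of '?' and 'S' codes is unknown, so skip.
            continue
        a_part, b_part = code.split("-", 1)
        counts[a_part] += 1
        if b_part != "NC":
            other_variations[a_part] += 1
    return counts, other_variations

# Codes quoted in this issue for L=1, L=4, L=5 and L=680.
codes = ["3a-NC", "3a-NC", "3nasal-NC", "3N-00"]
counts, other_variations = tally(codes)
```

Here `counts["3a"]` is 2 with no other variations, while the single '3N' code also counts as an 'other variation' because its 'B' part is '00'.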

funderburkjim commented 4 years ago

krm_verb_filter_variants

This table shows counts of the 'B' parts of the 'A-B' codes describing the krm to mw mapping.

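| code | description | count |
|---|---|---|
| S | Special | 126 |
| NC | no extra change | 1780 |
| 00 | ^R -> n | 34 |
| 01 | ^z -> s | 44 |
| 02 | ^zw -> st | 14 |
| 03 | ^zW -> sT | 2 |
| 04 | ^zR -> sn | 7 |
| 05 | nc -> Yc | 15 |
| 06 | nj -> Yj | 4 |
| 07 | nS -> MS | 3 |
| 08 | ns -> Ms | 2 |
| 09 | nB -> mB | 8 |
| 10 | nP -> mP | 6 |
| 11 | np -> mp | 3 |
| 12 | cC -> C | 10 |
| 13 | sj -> jj | 3 |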

special

The 126 count includes the 111 special matches and the 15 unmatched headwords of the codes table (111 + 15 = 126).

NC

This 'No Change' category is by far the most common. In these cases the mapping required only changes related to the indicatory marker (the 'A' part of the 'A-B' code).

regex changes

The other cases are simple regex changes used to transform the krm headword. For instance the '00' code indicates that an initial 'R' (retroflex nasal) was changed to 'n' (dental nasal). After this change, one of the 'A' changes was made, and the result was the matching mw verb.

For instance:

;; Case 0691: L=680, k1=RIN, k2=RIN, code=3N-00, mw=nI

The '00' of the '3N-00' code means that the krm spelling 'RIN' was first changed to 'nIN'. Then the '3N' of the code means (first table above) that the final 'N' was dropped, yielding 'nI'. This 'nI' is the mw root corresponding to 'RIN'.
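The two-step reading of a code (first the 'B' regex change, then the 'A' change) can be sketched as below. This is only an illustration: the rule tables are copied from the two tables in this issue, but `B_RULES`, `DROP_FINAL` and `apply_code` are hypothetical names, and the '3nasal', 'S' and '?' cases are deliberately not handled because their exact rules are not spelled out here.

```python
import re

# 'B' codes as (regex, replacement) pairs, from the variants table.
B_RULES = {
    "00": (r"^R", "n"), "01": (r"^z", "s"), "02": (r"^zw", "st"),
    "03": (r"^zW", "sT"), "04": (r"^zR", "sn"), "05": (r"nc", "Yc"),
    "06": (r"nj", "Yj"), "07": (r"nS", "MS"), "08": (r"ns", "Ms"),
    "09": (r"nB", "mB"), "10": (r"nP", "mP"), "11": (r"np", "mp"),
    "12": (r"cC", "C"), "13": (r"sj", "jj"),
}

# 'A' codes that drop a final string ('3' drops nothing), from the codes table.
DROP_FINAL = {
    "3": "", "3A": "A", "3I": "I", "3N": "N", "3U": "U", "3Y": "Y",
    "3a": "a", "3e": "e", "3f": "f", "3ir": "ir", "3o": "o", "3u": "u",
    "3x": "x",
}

def apply_code(k1, code):
    """Apply the 'B' change and then the 'A' change of an 'A-B' code to k1."""
    a_part, b_part = code.split("-", 1)
    if a_part not in DROP_FINAL:
        raise ValueError(f"'A' code not handled in this sketch: {a_part}")
    word = k1
    if b_part != "NC":
        if b_part not in B_RULES:
            raise ValueError(f"'B' code not handled in this sketch: {b_part}")
        pattern, replacement = B_RULES[b_part]
        word = re.sub(pattern, replacement, word)
    suffix = DROP_FINAL[a_part]
    if suffix:
        if not word.endswith(suffix):
            raise ValueError(f"{word!r} does not end in {suffix!r}")
        word = word[: -len(suffix)]
    return word

mw_root = apply_code("RIN", "3N-00")  # 'RIN' -> 'nIN' -> 'nI'
```

With the same function, `apply_code("aga", "3a-NC")` gives 'ag', matching the record for L=4.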

funderburkjim commented 4 years ago

krm_sense

We derive the 'fullroot' and the 'sense' from the DAtupAWa sutra for each entry of krm. This is quite straightforward for most of the sutras, as in the 'aga' and 'agi' examples (see image above):

;; Case= 0004, L=4, k1=aga, fullroot=aga, sense="kuwilAyAM gatO"
;; Case= 0005, L=5, k1=agi, fullroot=agi, sense="gatO"

In these usual cases, the sutra is split into words at spaces; the first word is the 'fullroot', and the remaining words (usually just one) comprise the sense.
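For the usual case just described, the split could look like the following sketch. The function name `split_sutra` is hypothetical, and the input strings are assumed reconstructions of the sutra text from the fullroot and sense fields shown above, not text copied from krm.txt.

```python
def split_sutra(sutra):
    """Split a sutra at whitespace: first word is fullroot, the rest is sense."""
    words = sutra.split()
    fullroot = words[0]
    sense = " ".join(words[1:])
    return fullroot, sense

# Assumed sutra text for the L=4 entry.
fullroot, sense = split_sutra("aga kuwilAyAM gatO")
```

This covers only the usual case; sutras with a premarker or with alternate roots need separate handling.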

premarker

In 52 cases, there is a 'premarker' before the root. For example,

;; Case= 0252, L=246, k1=kfY, fullroot=qu kfY, sense="karaRe"

Here 'qu' is a premarker, and the fullroot is 'qu kfY'. The sense is the last word 'karaRe'.

Note that our matching rules currently make no use of the premarker.

multiple roots

In 3 cases, there are alternate spellings given in the 'fullroot'. Example:

;; Case= 0151, L=146, k1=kawi, fullroot=kawi/kawI, sense="gatO"

Our matching procedure only uses 'k1' (i.e., the first of the alternates).
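The relation between 'fullroot' and 'k1' in the three kinds of record shown above (plain, with premarker, with alternates) can be summarized in a small sketch. The function name is hypothetical, and the rule is inferred only from the three examples in this comment; it is not claimed to be how k1 was actually constructed.

```python
def k1_from_fullroot(fullroot):
    """Return the matching headword implied by a fullroot value."""
    # Premarker case: 'qu kfY' -> the root proper is the last word.
    root = fullroot.split()[-1]
    # Alternates case: 'kawi/kawI' -> only the first alternate is used.
    return root.split("/")[0]

examples = {f: k1_from_fullroot(f) for f in ["aga", "qu kfY", "kawi/kawI"]}
```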

funderburkjim commented 4 years ago

krm_sense_map

This file adds the 'mw' and 'code' fields from krm_verb_filter_map to the fullroot and sense fields of krm_sense. For example:

krm_verb_filter_map record for L=1
;; Case 0001: L=1, k1=aka, k2=aka, code=3a-NC, mw=ak

krm_sense record for L=1
;; Case= 0001, L=1, k1=aka, fullroot=aka, sense="kuwilAyAM gatO"

result:  krm_sense_map record for L=1
;; Case= 0001, L=1, k1=aka, fullroot=aka, mw=ak, code=3a-NC, sense="kuwilAyAM gatO"

This krm_sense_map data will later be used to 'merge' krm and mdp data.
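The combination step for a single entry can be sketched as follows, using the two L=1 records above as input. This is a minimal illustration with hypothetical names (`parse_line`, `merge`); it assumes that fields are separated by ', ' and that no field value itself contains ', '.

```python
import re

# Accepts both ';; Case 0001: L=1, ...' and ';; Case= 0001, L=1, ...'.
LINE_RE = re.compile(r"^;; Case=? (\d+)[:,] (.*)$")

def parse_line(line):
    """Return (case number, dict of key=value fields) for one record line."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized record: {line!r}")
    fields = {}
    for part in m.group(2).split(", "):
        key, _, value = part.partition("=")
        fields[key] = value.strip('"')
    return m.group(1), fields

def merge(map_line, sense_line):
    """Add the mw and code fields of a map record to a sense record."""
    _, m = parse_line(map_line)
    case, s = parse_line(sense_line)
    if m["L"] != s["L"]:
        raise ValueError("records refer to different entries")
    return (
        f';; Case= {case}, L={s["L"]}, k1={s["k1"]}, fullroot={s["fullroot"]}, '
        f'mw={m["mw"]}, code={m["code"]}, sense="{s["sense"]}"'
    )

merged = merge(
    ";; Case 0001: L=1, k1=aka, k2=aka, code=3a-NC, mw=ak",
    ';; Case= 0001, L=1, k1=aka, fullroot=aka, sense="kuwilAyAM gatO"',
)
```

For these inputs, `merged` is the krm_sense_map record for L=1 shown above.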

This finishes the overview of the part of verbs01 pertaining to krm.

See #2 for a similar introduction to the 'mdp' part of verbs01.