Open funderburkjim opened 4 years ago
This is essentially just the headwords of krm, along with their Cologne IDs. Samples:
;; Case 0001: L=1, k1=aka, k2=aka, code=1
;; Case 0002: L=2, k1=aki, k2=aki, code=1
The code is not interesting; it is always '1'. Also k2 is not interesting; by initial construction of krm.txt (the Cologne digitization), k2 is the same as k1. k1 was originally constructed from the DAtupAWa sutra. For instance, k1 is aga and agi from the sutras of these two entries:
This is the hard part: mapping k1 from the krm_verb_filter to an mw root. The mw roots are present in file mwverbs1, which is derived from the Cologne digitization (mw.txt) of the Monier-Williams dictionary.
An attempt is made to do this matching in a systematic way; and to indicate the particular rules involved in each correspondence with a code. For example:
;; Case 0004: L=4, k1=aga, k2=aga, code=3a-NC, mw=ag
;; Case 0005: L=5, k1=agi, k2=agi, code=3nasal-NC, mw=aNg
We see that headword 'aga' of krm is said to correspond to headword 'ag' of MW; the code '3a-NC' indicates that the final 'a' of the krm headword was dropped ('3a') and that no other change was needed ('NC').
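The headword 'agi' of krm is said to correspond to headword 'aNg' of mw, according to rule 3nasal-NC, which indicates that a nasal was inserted ('3nasal'; here the final 'i' is also dropped, giving 'aNg') and that no other change was needed ('NC').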
All of the 'A' parts of the 'A-B' codes used in mapping krm headwords to mw headwords are summarized in the following table.
code | description | count | # other variations |
---|---|---|---|
3 | no drop | 161 | 12 |
3A | drop final A | 14 | 2 |
3I | drop final I | 54 | 1 |
3N | drop final N | 49 | 1 |
3U | drop final U | 22 | 3 |
3Y | drop final Y | 42 | 4 |
3a | drop final a | 1024 | 86 |
3e | drop final e | 14 | 0 |
3f | drop final f | 159 | 8 |
3ir | drop final ir | 25 | 1 |
3nasal | insert nasal | 241 | 7 |
3o | drop final o | 4 | 2 |
3u | drop final u | 109 | 26 |
3x | drop final x | 17 | 2 |
? | unmatched | 15 | 0 |
S | special | 111 | 111 |
Code '?': There are 15 krm headwords which thus far have NOT been matched to MW verbs.
Code 'S': These are 111 matches which do NOT follow an easily summarized rule.
The A-B codes for most krm to mw mappings have B = NC; i.e., only the 'A' transformation of the krm root was involved. The 'other variations' column shows a count of the cases when B != NC, i.e., there were other changes to the krm spellings.
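As a rough illustration of how an 'A-B' code might be taken apart and its 'A' part applied, here is a minimal sketch in Python. The names `DROP_SUFFIX`, `split_code` and `apply_a` are hypothetical and not taken from verbs01. Only the "drop final ..." rows of the first table are modelled; the '3nasal', 'S' and '?' codes are deliberately left out because the text above does not spell out their exact procedure.

```python
# Sketch only: the "drop final X" rules from the table of 'A' codes.
# All names here are hypothetical, not from the verbs01 programs.
DROP_SUFFIX = {
    "3": "",  # no drop
    "3A": "A", "3I": "I", "3N": "N", "3U": "U", "3Y": "Y",
    "3a": "a", "3e": "e", "3f": "f", "3ir": "ir",
    "3o": "o", "3u": "u", "3x": "x",
}

def split_code(code):
    """Split an 'A-B' code such as '3a-NC' into ('3a', 'NC')."""
    a_part, _, b_part = code.partition("-")
    return a_part, b_part

def apply_a(a_code, word):
    """Apply a 'drop final ...' rule; return None if it is not modelled or does not fit."""
    if a_code not in DROP_SUFFIX:
        return None  # '3nasal', 'S' and '?' are not modelled in this sketch
    suffix = DROP_SUFFIX[a_code]
    if not word.endswith(suffix):
        return None
    return word[: len(word) - len(suffix)]

a_code, b_code = split_code("3a-NC")
print(a_code, b_code, apply_a(a_code, "aga"))  # 3a NC ag
```

Returning `None` for unmodelled codes keeps the sketch honest about what it does not cover, rather than guessing at the nasal-insertion rule.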
This table shows counts of the 'B' parts of the 'A-B' codes describing the krm to mw mapping.
code | description | count |
---|---|---|
S | Special | 126 |
NC | no extra change | 1780 |
00 | ^R -> n | 34 |
01 | ^z -> s | 44 |
02 | ^zw -> st | 14 |
03 | ^zW -> sT | 2 |
04 | ^zR -> sn | 7 |
05 | nc -> Yc | 15 |
06 | nj -> Yj | 4 |
07 | nS -> MS | 3 |
08 | ns -> Ms | 2 |
09 | nB -> mB | 8 |
10 | nP -> mP | 6 |
11 | np -> mp | 3 |
12 | cC -> C | 10 |
13 | sj -> jj | 3 |
The 126 count includes
The 'NC' ('No Change') category is by far the most common. In these cases the mapping required only changes related to the indicatory marker (the 'A' part of the 'A-B' code).
The other cases are simple regex changes used to transform the krm headword. For instance the '00' code indicates that an initial 'R' (retroflex nasal) was changed to 'n' (dental nasal). After this change, one of the 'A' changes was made, and the result was the matching mw verb.
For instance:
;; Case 0691: L=680, k1=RIN, k2=RIN, code=3N-00, mw=nI
The '00' of the '3N-00' code means that the krm spelling 'RIN' was first changed to 'nIN'. Then the '3N' of the code means (first table above) that the final 'N' was dropped, yielding 'nI'. This 'nI' is the mw root corresponding to 'RIN'.
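The two-step derivation of 'nI' from 'RIN' can be sketched in Python as follows. The regex pairs are transcribed from the table of 'B' codes; the names `B_RULES`, `apply_b`, `drop_final` and `map_root` are hypothetical, and the 'A' step handles only the "drop final ..." codes (an assumption made to keep the example short).

```python
import re

# Regex substitutions transcribed from the table of 'B' codes.
B_RULES = {
    "00": (r"^R", "n"),   "01": (r"^z", "s"),
    "02": (r"^zw", "st"), "03": (r"^zW", "sT"),
    "04": (r"^zR", "sn"), "05": (r"nc", "Yc"),
    "06": (r"nj", "Yj"),  "07": (r"nS", "MS"),
    "08": (r"ns", "Ms"),  "09": (r"nB", "mB"),
    "10": (r"nP", "mP"),  "11": (r"np", "mp"),
    "12": (r"cC", "C"),   "13": (r"sj", "jj"),
}

def apply_b(b_code, word):
    """Apply the 'B' regex change; 'NC' leaves the word unchanged."""
    if b_code == "NC":
        return word
    pattern, repl = B_RULES[b_code]  # KeyError for 'S' (special): no regex exists
    return re.sub(pattern, repl, word)

def drop_final(a_code, word):
    """Handle only 'drop final X' codes such as '3N'; other 'A' codes return None."""
    suffix = a_code[1:]
    if not a_code.startswith("3") or a_code == "3nasal" or not word.endswith(suffix):
        return None
    return word[: len(word) - len(suffix)]

def map_root(k1, code):
    """Apply the 'B' change first, then the 'A' change, as described above."""
    a_code, b_code = code.split("-")
    return drop_final(a_code, apply_b(b_code, k1))

print(apply_b("00", "RIN"))      # nIN
print(map_root("RIN", "3N-00"))  # nI
```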
We derive the 'fullroot' and the 'sense' from the DAtupAWa sutra for each entry of krm. This is quite straightforward for most of the sutras, as in the 'aga' and 'agi' examples:
;; Case= 0004, L=4, k1=aga, fullroot=aga, sense="kuwilAyAM gatO"
;; Case= 0005, L=5, k1=agi, fullroot=agi, sense="gatO"
In these usual cases, the sutra is split into words at spaces; the first word is the 'fullroot', and the remaining words (usually just 1) comprise the sense.
In 52 cases, there is a 'premarker' before the root. For example,
;; Case= 0252, L=246, k1=kfY, fullroot=qu kfY, sense="karaRe"
Here 'qu' is a premarker, and the fullroot is 'qu kfY'. The sense is the last word 'karaRe'.
Note that our matching rules currently make no use of the premarker.
In 3 cases, there are alternate spellings given in the 'fullroot'. Example:
;; Case= 0151, L=146, k1=kawi, fullroot=kawi/kawI, sense="gatO"
Our matching procedure only uses 'k1' (i.e., the first of the alternates).
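A minimal Python sketch of the splitting described above, under two assumptions that are not confirmed by the text: that the sutra is available as a space-separated string such as 'aga kuwilAyAM gatO', and that premarkers can be recognized from a known set. The set `PREMARKERS` and the functions `split_sutra` and `first_alternate` are hypothetical; only 'qu' is attested in the examples above.

```python
# Hypothetical set of premarkers; only 'qu' appears in the examples above.
PREMARKERS = frozenset({"qu"})

def split_sutra(sutra, premarkers=PREMARKERS):
    """Return (fullroot, sense) from a space-separated sutra string."""
    words = sutra.split()
    # A leading premarker is kept as part of the fullroot.
    n_root = 2 if len(words) > 2 and words[0] in premarkers else 1
    fullroot = " ".join(words[:n_root])
    sense = " ".join(words[n_root:])
    return fullroot, sense

def first_alternate(fullroot):
    """Drop any premarker and keep the first of any '/'-separated alternates."""
    return fullroot.split()[-1].split("/")[0]

print(split_sutra("aga kuwilAyAM gatO"))  # ('aga', 'kuwilAyAM gatO')
print(split_sutra("qu kfY karaRe"))       # ('qu kfY', 'karaRe')
print(first_alternate("kawi/kawI"))       # kawi
```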
This file adds the 'mw' and 'code' fields from krm_verb_filter_map to the fullroot and sense fields of krm_sense. For example:
krm_verb_filter_map record for L=1
;; Case 0001: L=1, k1=aka, k2=aka, code=3a-NC, mw=ak
krm_sense record for L=1
;; Case= 0001, L=1, k1=aka, fullroot=aka, sense="kuwilAyAM gatO"
result: krm_sense_map record for L=1
;; Case= 0001, L=1, k1=aka, fullroot=aka, mw=ak, code=3a-NC, sense="kuwilAyAM gatO"
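The merge shown for L=1 can be reproduced with a short Python sketch. The functions `parse_fields` and `merge_records` are hypothetical names, and the parsing assumes the record layouts are exactly as in the examples above (comma-separated `key=value` pairs, with the sense in double quotes).

```python
import re

def parse_fields(line):
    """Parse 'key=value' pairs from a ';; Case ...' record into a dict."""
    # A value is either double-quoted (it may contain spaces) or runs to the next comma.
    pairs = re.findall(r'(\w+)=("[^"]*"|[^,]*)', line)
    return {key: value.strip().strip('"') for key, value in pairs}

def merge_records(map_line, sense_line):
    """Combine a krm_verb_filter_map record with the krm_sense record for the same L."""
    m = parse_fields(map_line)
    s = parse_fields(sense_line)
    if m["L"] != s["L"]:
        raise ValueError("records refer to different L values")
    return (f';; Case= {s["Case"]}, L={s["L"]}, k1={s["k1"]}, '
            f'fullroot={s["fullroot"]}, mw={m["mw"]}, code={m["code"]}, '
            f'sense="{s["sense"]}"')

map_line = ';; Case 0001: L=1, k1=aka, k2=aka, code=3a-NC, mw=ak'
sense_line = ';; Case= 0001, L=1, k1=aka, fullroot=aka, sense="kuwilAyAM gatO"'
merged = merge_records(map_line, sense_line)
print(merged)
```

Matching on `L` rather than on `k1` is a design choice of this sketch, since `L` is the field shared unambiguously by both records.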
This krm_sense_map data will later be used to 'merge' krm and mdp data.
This finishes the overview of the part of verbs01 pertaining to krm.
See #2 for similar introduction to the 'mdp' part of verbs01.
This issue documents the part of verbs01 relating to the correspondence between the Cologne digitization (KRM) of the Kṛdantarūpamālā and mw verb headwords.
In outline, the steps are embodied in the following outputs:
Examples and brief introductions for each of these outputs are in the additional comments below.