KRM-MW - Githubissues

funderburkjim commented 4 years ago

This issue documents the part of verbs01 relating to the correspondence between Cologne digitization KRM of Kṛdantarūpamālā and mw verb headwords.

In outline, the steps are embodied in the following outputs:

- krm_verb_filter: simple listing of the headwords of krm.
- krm_verb_filter_map: mapping of krm headwords to mw headwords. Includes a two-part code A-B indicating how the krm spelling was made to correspond to the mw spelling. There are also two statistical summaries counting the number of instances for each part of this code:
  - krm_verb_filter_codes, relating to the 'A' part of the code
  - krm_verb_filter_variants, relating to the 'B' part of the code
- krm_sense: each entry of krm provides a DAtupAWa sutra. Such a sutra consists of a root part and a sense part. krm_sense provides a simple parsing of these krm sutras.
- krm_sense_map: adds to krm_sense the mw mapping of krm_verb_filter_map.

Examples and brief introductions for each of these outputs are in the additional comments below.

funderburkjim commented 4 years ago

krm_verb_filter

This is essentially just the headwords of krm, along with their Cologne IDs. Samples:

;; Case 0001: L=1, k1=aka, k2=aka, code=1
;; Case 0002: L=2, k1=aki, k2=aki, code=1

The code is not interesting; it is always '1'. Nor is k2 interesting: by the initial construction of krm.txt (the Cologne digitization), k2 is the same as k1. k1 was originally constructed from the DAtupAWa sutra. For instance, k1 is aga and agi from the sutras of these two entries:

funderburkjim commented 4 years ago

krm_verb_filter_map

This is the hard part: mapping k1 from the krm_verb_filter to an mw root. The mw roots are present in file mwverbs1, which is derived from the Cologne digitization (mw.txt) of the Monier-Williams dictionary.

An attempt is made to do this matching in a systematic way, and to record with a code the particular rules involved in each correspondence. For example:

;; Case 0004: L=4, k1=aga, k2=aga, code=3a-NC, mw=ag
;; Case 0005: L=5, k1=agi, k2=agi, code=3nasal-NC, mw=aNg

We see that headword 'aga' of krm is said to correspond to headword 'ag' of MW; the code '3a-NC' indicates that

'3a' means that the final 'a' of 'aga' is dropped (leaving 'ag')
NC means that no other change to this krm headword was made.

The headword 'agi' of krm is said to correspond to headword 'aNg' of mw, according to rule 3nasal-NC which indicates

the final 'i' of 'agi' is used to indicate that
- the 'i' of 'agi' is dropped, leaving 'ag' and
- a nasal is to be inserted before the final consonant cluster (here 'g') of 'ag'; the nasal inserted is homorganic to 'g' (i.e., it is the guttural nasal 'N', since 'g' is guttural). The final result is 'aNg'.
NC indicates no other change in 'agi' was involved in the mapping.
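
The two 'A' rules just described can be sketched in a few lines of Python. This is only an illustration, not the code used in verbs01: the function names, the SLP1 consonant-class table, and the fallback to anusvara 'M' for consonants outside the five stop classes are all assumptions made for this example.

```python
import re

# Assumed SLP1 stop classes and their homorganic nasals (illustrative table).
NASAL_FOR = {}
for stops, nasal in [("kKgG", "N"), ("cCjJ", "Y"), ("wWqQ", "R"),
                     ("tTdD", "n"), ("pPbB", "m")]:
    for stop in stops:
        NASAL_FOR[stop] = nasal

# Trailing run of non-vowel characters (SLP1 vowels: a A i I u U f F x X e E o O).
FINAL_CLUSTER = re.compile(r"[^aAiIuUfFxXeEoO]+$")


def drop_final(root, marker):
    """Rule such as '3a': drop the final indicatory marker from a krm headword."""
    if not root.endswith(marker):
        raise ValueError(f"{root!r} does not end with {marker!r}")
    return root[: -len(marker)]


def insert_nasal(root):
    """Rule '3nasal': drop final 'i', then insert a homorganic nasal
    before the final consonant cluster."""
    stem = drop_final(root, "i")
    match = FINAL_CLUSTER.search(stem)
    if match is None:
        return stem  # no final consonant cluster to nasalize
    first = stem[match.start()]
    nasal = NASAL_FOR.get(first, "M")  # assumption: anusvara for other consonants
    return stem[: match.start()] + nasal + stem[match.start():]


print(drop_final("aga", "a"))  # ag
print(insert_nasal("agi"))     # aNg
```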

funderburkjim commented 4 years ago

krm_verb_filter_codes

All of the 'A' parts of the 'A-B' codes used in mapping krm headwords to mw headwords are summarized in the following table.

code	description	count	# other variations
3	no drop	161	12
3A	drop final A	14	2
3I	drop final I	54	1
3N	drop final N	49	1
3U	drop final U	22	3
3Y	drop final Y	42	4
3a	drop final a	1024	86
3e	drop final e	14	0
3f	drop final f	159	8
3ir	drop final ir	25	1
3nasal	insert nasal	241	7
3o	drop final o	4	2
3u	drop final u	109	26
3x	drop final x	17	2
?	unmatched	15	0
S	special	111	111

unmatched

There are 15 krm headwords which thus far have NOT been matched to MW verbs.

special

These are 111 matches which do NOT follow an easily summarized rule.

other variations

The A-B codes for most krm to mw mappings have B = NC; i.e., only the 'A' transformation of the krm root was involved. The 'other variations' column shows a count of the cases when B != NC, i.e., there were other changes to the krm spellings.
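
As a rough illustration of how the 'count' and 'other variations' columns relate to the A-B codes, the following Python sketch tallies a list of codes. The function names are hypothetical, and the sketch covers only codes of the form A-B shown in the examples above; how the 'S' and '?' cases are written in the actual file is not shown here, so they are left out.

```python
from collections import Counter


def split_code(code):
    """Split an 'A-B' code such as '3a-NC' into its two parts."""
    a_part, _, b_part = code.partition("-")
    return a_part, b_part


def tally(codes):
    """Return (count per A part, count per A part where B != 'NC')."""
    totals = Counter()
    other_variations = Counter()
    for code in codes:
        a_part, b_part = split_code(code)
        totals[a_part] += 1
        if b_part != "NC":
            other_variations[a_part] += 1
    return totals, other_variations


# The three codes quoted in this issue, used as a tiny sample.
totals, others = tally(["3a-NC", "3nasal-NC", "3N-00"])
print(totals["3a"], others["3a"])  # 1 0
print(totals["3N"], others["3N"])  # 1 1
```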

funderburkjim commented 4 years ago

krm_verb_filter_variants

This table shows counts of the 'B' parts of the 'A-B' codes describing the krm to mw mapping.

code	description	count
S	Special	126
NC	no extra change	1780
00	^R -> n	34
01	^z -> s	44
02	^zw -> st	14
03	^zW -> sT	2
04	^zR -> sn	7
05	nc -> Yc	15
06	nj -> Yj	4
07	nS -> MS	3
08	ns -> Ms	2
09	nB -> mB	8
10	nP -> mP	6
11	np -> mp	3
12	cC -> C	10
13	sj -> jj	3

special

The count of 126 includes

- the 111 'special' cases, where a krm headword was mapped to an mw root without an easily summarized rule
- the 15 cases where a krm headword has NOT yet been mapped to an mw root.

NC

This 'No Change' category is by far the most common. In these cases the mapping required only changes related to the indicatory marker (the 'A' part of the 'A-B' code).

regex changes

The other cases are simple regex changes used to transform the krm headword. For instance the '00' code indicates that an initial 'R' (retroflex nasal) was changed to 'n' (dental nasal). After this change, one of the 'A' changes was made, and the result was the matching mw verb.

For instance:

;; Case 0691: L=680, k1=RIN, k2=RIN, code=3N-00, mw=nI

The '00' of the '3N-00' code means that the krm spelling 'RIN' was first changed to 'nIN'. Then the '3N' of the code means (see the first table above) that the final 'N' was dropped, yielding 'nI'. This 'nI' is the mw root corresponding to 'RIN'.
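
The order of operations (the 'B' regex change first, then the 'A' drop) can be sketched as follows. The patterns are copied from the two tables above, but the function and dictionary names are made up for this example. The use of plain `re.sub` (replacing every occurrence of an unanchored pattern) is an assumption, and the '3nasal', 'S' and '?' codes are deliberately not handled.

```python
import re

# 'B' part: regex changes, as listed in the krm_verb_filter_variants table.
B_RULES = {
    "NC": None,
    "00": (r"^R", "n"), "01": (r"^z", "s"), "02": (r"^zw", "st"),
    "03": (r"^zW", "sT"), "04": (r"^zR", "sn"), "05": (r"nc", "Yc"),
    "06": (r"nj", "Yj"), "07": (r"nS", "MS"), "08": (r"ns", "Ms"),
    "09": (r"nB", "mB"), "10": (r"nP", "mP"), "11": (r"np", "mp"),
    "12": (r"cC", "C"), "13": (r"sj", "jj"),
}

# 'A' part: final marker to drop ('3' drops nothing).
A_DROPS = {
    "3": "", "3A": "A", "3I": "I", "3N": "N", "3U": "U", "3Y": "Y",
    "3a": "a", "3e": "e", "3f": "f", "3ir": "ir", "3o": "o", "3u": "u",
    "3x": "x",
}


def apply_code(k1, code):
    """Apply the B change, then the A drop, to a krm headword."""
    a_part, _, b_part = code.partition("-")
    if a_part not in A_DROPS or b_part not in B_RULES:
        raise ValueError(f"code {code!r} is not covered by this sketch")
    rule = B_RULES[b_part]
    if rule is not None:
        pattern, replacement = rule
        k1 = re.sub(pattern, replacement, k1)
    suffix = A_DROPS[a_part]
    if suffix:
        if not k1.endswith(suffix):
            raise ValueError(f"{k1!r} does not end with {suffix!r}")
        k1 = k1[: -len(suffix)]
    return k1


print(apply_code("RIN", "3N-00"))  # nI
print(apply_code("aga", "3a-NC"))  # ag
```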

funderburkjim commented 4 years ago

krm_sense

We derive the 'fullroot' and the 'sense' from the DAtupAWa sutra for each entry of krm. This is quite straightforward for most of the sutras, as in the 'aga' and 'agi' examples (see image above):

;; Case= 0004, L=4, k1=aga, fullroot=aga, sense="kuwilAyAM gatO"
;; Case= 0005, L=5, k1=agi, fullroot=agi, sense="gatO"

In these usual cases, the sutra is split into words at spaces; the first word is the 'fullroot', and the remaining words (usually just one) comprise the sense.

premarker

In 52 cases, there is a 'premarker' before the root. For example,

;; Case= 0252, L=246, k1=kfY, fullroot=qu kfY, sense="karaRe"

Here 'qu' is a premarker, and the fullroot is 'qu kfY'. The sense is the last word 'karaRe'.

Note that our matching rules currently make no use of the premarker.

multiple roots

In 3 cases, there are alternate spellings given in the 'fullroot'. Example:

;; Case= 0151, L=146, k1=kawi, fullroot=kawi/kawI, sense="gatO"

Our matching procedure only uses 'k1' (i.e., the first of the alternates).
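
The three situations (plain root, premarker, alternate spellings) can be handled by one small routine if k1 is used to locate the root word. The Python sketch below is only one possible approach and is not taken from verbs01. The function name is hypothetical, and the input sutra strings are reassembled from the fullroot and sense fields shown above, on the assumption that the sutra text is simply those parts separated by spaces.

```python
def split_sutra(sutra, k1):
    """Split a sutra into (fullroot, sense), using k1 to find the root word.

    Words before the root word are treated as a premarker and kept in
    fullroot; a root word may list alternates separated by '/'.
    """
    words = sutra.split()
    for index, word in enumerate(words):
        if k1 in word.split("/"):
            fullroot = " ".join(words[: index + 1])
            sense = " ".join(words[index + 1:])
            return fullroot, sense
    raise ValueError(f"k1 {k1!r} not found in sutra {sutra!r}")


# Assumed sutra texts, rebuilt from the records quoted in this comment.
print(split_sutra("aga kuwilAyAM gatO", "aga"))  # ('aga', 'kuwilAyAM gatO')
print(split_sutra("qu kfY karaRe", "kfY"))       # ('qu kfY', 'karaRe')
print(split_sutra("kawi/kawI gatO", "kawi"))     # ('kawi/kawI', 'gatO')
```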

funderburkjim commented 4 years ago

krm_sense_map

This file adds the 'mw' and 'code' fields from krm_verb_filter_map to the fullroot and sense fields of krm_sense. For example:

krm_verb_filter_map record for L=1
;; Case 0001: L=1, k1=aka, k2=aka, code=3a-NC, mw=ak

krm_sense record for L=1
;; Case= 0001, L=1, k1=aka, fullroot=aka, sense="kuwilAyAM gatO"

result:  krm_sense_map record for L=1
;; Case= 0001, L=1, k1=aka, fullroot=aka, mw=ak, code=3a-NC, sense="kuwilAyAM gatO"
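
The merge of the two records for L=1 can be sketched in Python as follows. The field-parsing regex and the function names are assumptions made for illustration; only the record layouts are taken from the examples above.

```python
import re

# key=value pairs; a value is either double-quoted or runs up to the next comma.
FIELD = re.compile(r'(\w+)=\s*(?:"([^"]*)"|([^,]*))')


def parse_record(line):
    """Parse the key=value fields of a ';; Case ...' record into a dict."""
    fields = {}
    for key, quoted, bare in FIELD.findall(line):
        fields[key] = quoted if quoted else bare.strip()
    return fields


def merge_records(map_line, sense_line):
    """Add 'mw' and 'code' from a krm_verb_filter_map record to a krm_sense record."""
    mapped = parse_record(map_line)
    sense = parse_record(sense_line)
    if mapped["L"] != sense["L"]:
        raise ValueError("records refer to different entries")
    return (
        ";; Case= {Case}, L={L}, k1={k1}, fullroot={fullroot}, "
        "mw={mw}, code={code}, sense=\"{sense}\""
    ).format(mw=mapped["mw"], code=mapped["code"], **sense)


map_line = ";; Case 0001: L=1, k1=aka, k2=aka, code=3a-NC, mw=ak"
sense_line = ';; Case= 0001, L=1, k1=aka, fullroot=aka, sense="kuwilAyAM gatO"'
print(merge_records(map_line, sense_line))
```

Matching the two records on L (the Cologne ID) rather than on k1 is a choice made for this sketch.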

This krm_sense_map data will later be used to 'merge' krm and mdp data.

This finishes the overview of the part of verbs01 pertaining to krm.

See #2 for similar introduction to the 'mdp' part of verbs01.

sanskrit-lexicon / KRM

KRM-MW #1
