sanskrit-lexicon / csl-orig

Data for all dictionaries of Cologne. Now all corrections are made in this git-based workflow.

KRM: several changes #205

Open funderburkjim opened 4 years ago

funderburkjim commented 4 years ago

These changes were made while comparing KRM headwords to the verbs of MW.

One is a print change -- poor quality print. Noted in krm_printchange file of csl-corrections.

; Case 001: L=146, k1=kawi, #changes=1
; 'gatO' -> ' gatO' (space, for consistency)
old: (146) {@<s>“kawi[kawI]gatO”</s>@}¦ (I<s>-BvAdiH</s>-320. <s>saka</s>. <s>sew</s>. <s>para</s>.)
new: (146) {@<s>“kawi[kawI] gatO”</s>@}¦ (I<s>-BvAdiH</s>-320. <s>saka</s>. <s>sew</s>. <s>para</s>.)
; -----------------------------------------------
; Case 002: L=619, k1=ja, #changes=2
; ja -> jF  
old: <L>619<pc>0585<k1>ja<k2>ja
new: <L>619<pc>0585<k1>jF<k2>jF
old: (619) {@<s>“ja vayohAnO”</s>@}¦ (X<s>-curAdiH</s>-1815. <s>aka</s>. <s>sew</s>. <s>uBa</s>. <s>ADfzIyaH .</s>)
new: (619) {@<s>“jF vayohAnO”</s>@}¦ (X<s>-curAdiH</s>-1815. <s>aka</s>. <s>sew</s>. <s>uBa</s>. <s>ADfzIyaH .</s>)
; -----------------------------------------------
; Case 003: L=781, k1=tfvf, #changes=2
; tfvf -> tevf
old: <L>781<pc>0700<k1>tfvf<k2>tfvf
new: <L>781<pc>0700<k1>tevf<k2>tevf
old: (781) {@<s>“tfvf devane”</s>@}¦ (I<s>-BvAdiH</s>-499. <s>aka</s>. <s>sew</s>. <s>Atma</s>.)
new: (781) {@<s>“tevf devane”</s>@}¦ (I<s>-BvAdiH</s>-499. <s>aka</s>. <s>sew</s>. <s>Atma</s>.)
; -----------------------------------------------
; Case 004: L=890, k1=dva, #changes=2
; dva -> dvf.   print change.
old: <L>890<pc>0786<k1>dva<k2>dva
new: <L>890<pc>0786<k1>dvf<k2>dvf
old: (890) {@<s>“dva saMvaraRe”</s>@}¦ (I<s>-BvAdiH</s>-934. <s>saka</s>. <s>ani</s>. <s>para</s>.)
new: (890) {@<s>“dvf saMvaraRe”</s>@}¦ (I<s>-BvAdiH</s>-934. <s>saka</s>. <s>ani</s>. <s>para</s>.)
; -----------------------------------------------
; Case 005: L=1753, k1=SO, #changes=2
; SO -> SE
old: <L>1753<pc>1316<k1>SO<k2>SO
new: <L>1753<pc>1316<k1>SE<k2>SE
old: (1754) {@<s>“SO pAke”</s>@}¦ (I<s>-BvAdiH</s>-918. <s>saka</s>. <s>ani</s>. <s>para</s>.)
new: (1754) {@<s>“SE pAke”</s>@}¦ (I<s>-BvAdiH</s>-918. <s>saka</s>. <s>ani</s>. <s>para</s>.)
; -----------------------------------------------
; Case 006: L=1802, k1=Svi, #changes=1
; ao -> o
old: (1803) {@<s>“[wu ao] Svi gativfdDyoH”</s>@}¦
new: (1803) {@<s>“[wu o] Svi gativfdDyoH”</s>@}¦
; -----------------------------------------------
gasyoun commented 4 years ago

One is a print change -- poor quality print. Noted in krm_printchange file of csl-corrections.

I wonder how many dozens of poor-quality print errors go unnoticed. Even a human reading the dictionary cannot catch them; only comparison of similar words can.

funderburkjim commented 4 years ago

only similar word comparison can.

Agreed. Many subtle errors are uncovered when comparing contextually similar words from different sources.

funderburkjim commented 4 years ago

5 more changes

Only the 5th is classed as a print change.

; Case 001: L=271, k1=ANaH, #changes=2, #extra_withs=1
; ANaH krandassAtatye -> ANaH krandaH sAtatye
; (separate premarker from root, and root from sense)
old: <L>271<pc>0277<k1>ANaH<k2>ANaH
new: <L>271<pc>0277<k1>kranda<k2>kranda
old: (271) {@<s>“ANaH krandassAtatye”</s>@}¦ (X<s>-curAdiH</s>-1728. <s>aka</s>. <s>sew</s>. <s>uBa</s>.)
new: (271) {@<s>“ANaH krandaH sAtatye”</s>@}¦ (X<s>-curAdiH</s>-1728. <s>aka</s>. <s>sew</s>. <s>uBa</s>.)
; -----------------------------------------------
; Case 002: L=274, k1=qukrIY, #changes=2
; qukrIY -> qu krIY    
; separate premarker from root
old: <L>274<pc>0283<k1>qukrIY<k2>qukrIY
new: <L>274<pc>0283<k1>krIY<k2>krIY
old: (274) {@<s>“qukrIY dravyavinimaye”</s>@}¦ (IX<s>-kryAdiH</s>-1473. <s>saka</s>. <s>ani</s>. <s>uBa</s>.)
new: (274) {@<s>“qu krIY dravyavinimaye”</s>@}¦ (IX<s>-kryAdiH</s>-1473. <s>saka</s>. <s>ani</s>. <s>uBa</s>.)
; -----------------------------------------------
; Case 003: L=416, k1=guvIM, #changes=2
; guvIM -> gurvI   (r sign mistaken as anusvara)
old: <L>416<pc>0405<k1>guvIM<k2>guvIM
new: <L>416<pc>0405<k1>gurvI<k2>gurvI
old: (416) {@<s>“guvIM udyamane”</s>@}¦ (I<s>-BvAdiH</s>-574. <s>aka</s>. <s>sew</s>. <s>para</s>.)
new: (416) {@<s>“gurvI udyamane”</s>@}¦ (I<s>-BvAdiH</s>-574. <s>aka</s>. <s>sew</s>. <s>para</s>.)
; -----------------------------------------------
; Case 004: L=1325, k1=mrewwa, #changes=2
; mrewwa -> mrewf  (wa after w should be vowel 'f')
old: <L>1325<pc>1061<k1>mrewwa<k2>mrewwa
new: <L>1325<pc>1061<k1>mrewf<k2>mrewf
old: (1325) {@<s>“mrewwa unmAde”</s>@}¦ (I<s>-BvAdiH-aka</s>. <s>sew</s>. <s>para</s>.)
new: (1325) {@<s>“mrewf unmAde”</s>@}¦ (I<s>-BvAdiH-aka</s>. <s>sew</s>. <s>para</s>.)
; -----------------------------------------------
; Case 005: L=1594, k1=viCa, #changes=2
; viCa -> vicCa    
; Print change
old: <L>1594<pc>1226<k1>viCa<k2>viCa
new: <L>1594<pc>1226<k1>vicCa<k2>vicCa
old: (1595) {@<s>“viCa gatO”</s>@}¦ (VI<s>-tudAdiH</s>-1423. <s>saka</s>. <s>sew</s>. <s>para</s>.)
new: (1595) {@<s>“vicCa gatO”</s>@}¦ (VI<s>-tudAdiH</s>-1423. <s>saka</s>. <s>sew</s>. <s>para</s>.)
; -----------------------------------------------
funderburkjim commented 4 years ago

Reasons for print change above:

gasyoun commented 4 years ago

would break alphabetical order

Do we have a script that checks it? We do, right?

funderburkjim commented 4 years ago

cheat sheet on Sanskrit sorting:

Notes on Sanskrit sorting.
Assume words to be sorted are in SLP1 transliteration.
This logic is aimed at Python3 code.

slp_from = "aAiIuUfFxXeEoOMHkKgGNcCjJYwWqQRtTdDnpPbBmyrlvSzsh"
slp_to =   "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvw"
slp_from_to = str.maketrans(slp_from,slp_to)

We can tell if string x is less than string y:
 x.translate(slp_from_to) < y.translate(slp_from_to)

If 'a' is a list of Sanskrit words, we can get a sorted copy by:
 sorted(a, key = lambda x: x.translate(slp_from_to))
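
The notes above can be put together as a small runnable sketch. The translation table is copied from the cheat sheet; the helper name `slp_key` and the sample word list are assumptions added for illustration, with the words taken from the headword pairs reported further down in this thread.

```python
# Sketch of Sanskrit (SLP1) sorting using the translation table from the notes.
# 'slp_key' and the sample list are illustrative, not part of any csl-orig script.
slp_from = "aAiIuUfFxXeEoOMHkKgGNcCjJYwWqQRtTdDnpPbBmyrlvSzsh"
slp_to = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvw"
slp_from_to = str.maketrans(slp_from, slp_to)


def slp_key(word):
    """Map an SLP1 word to a string whose ASCII order matches Sanskrit order."""
    return word.translate(slp_from_to)


words = ["asu", "uBa", "aMsa", "ubja"]

# Plain ASCII order puts 'B' before 'b', which is wrong for SLP1.
print(sorted(words))               # ['aMsa', 'asu', 'uBa', 'ubja']

# With the translated key, 'b' sorts before 'B', as in the Sanskrit alphabet.
print(sorted(words, key=slp_key))  # ['aMsa', 'asu', 'ubja', 'uBa']
```

In this simplified scheme the anusvara 'M' sorts right after the vowels and before every consonant, which is why 'aMsa' comes before 'asu'.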

krm alphabetical order check

The alphabetical ordering check for krm was done in function check_alpha.

The headwords in krm were found, by this check, to be mostly in alphabetical order, with the following exceptions:

order error: 54, [0052], asu >aMsa
order error: 105, [0102], uBa >ubja
order error: 130, [0132], fzI >f
order error: 258.1, [0262], kFY >kF
order error: 330, [0342], Kaca >KakKa
order error: 659, [0612], nawa >Rada
order error: 859, [0762], dfmPa >dfpa
order error: 1031, [0875], puzpa >puMsa
order error: 1614, [1242], vizka >vizx
order error: 1685, [1288], Samu >Sama
order error: 1698, [1292], Sasu >SaMsu
order error: 1789, [1328], Slokf >SoRf
order error: 1814, [1337], zadx >zada
order error: 1899.1, [1372], saSca >samI
order error: 1975, [1400], sraki >syala
order error: 1977, [1401], sranBu >sraMsu
order error: 1985, [1403], svarta >svada

The first exception, "order error: 54, [0052], asu >aMsa", means that entry 'aMsa' is found at L=54, on page 52 of the scanned images, and that the preceding entry is 'asu'. Since 'aMsa' precedes 'asu' in the Sanskrit lexicographical ordering, this is flagged as an ordering error.
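
A check of this kind might look like the sketch below. The code of check_alpha is not shown in this thread, so this is only an assumed reconstruction: the function name `find_order_errors`, the `(L, page, headword)` tuple layout, and the L values given to the preceding entries ('53' and '104') are hypothetical. Only the two reported pairs and their L and page values come from the list above.

```python
# Hypothetical sketch of an alphabetical-order check; not the actual check_alpha.
slp_from = "aAiIuUfFxXeEoOMHkKgGNcCjJYwWqQRtTdDnpPbBmyrlvSzsh"
slp_to = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvw"
slp_from_to = str.maketrans(slp_from, slp_to)


def slp_key(word):
    """Translate an SLP1 word so that ASCII comparison follows Sanskrit order."""
    return word.translate(slp_from_to)


def find_order_errors(entries):
    """Report each entry whose headword sorts before the preceding headword.

    'entries' is a list of (L, page, headword) tuples in file order.
    """
    errors = []
    for (_, _, prev), (L, page, cur) in zip(entries, entries[1:]):
        if slp_key(prev) > slp_key(cur):
            errors.append(f"order error: {L}, [{page}], {prev} >{cur}")
    return errors


# The L values of the preceding entries ('53', '104') are placeholders.
entries = [
    ("53", "0052", "asu"),
    ("54", "0052", "aMsa"),
    ("104", "0102", "uBa"),
    ("105", "0102", "ubja"),
]

for line in find_order_errors(entries):
    print(line)
```

Each reported line names the later entry of the pair, matching the format of the list above: the L and page belong to the out-of-order headword, and the headword before ">" is its predecessor.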

gasyoun commented 4 years ago

mostly in alphabetical order, with the following exceptions:

Thanks for explaining. So were these 17 cases fixed or left as is?

sanskrit lexicographical ordering

You are aware that there is no single ordering, but at least two, with several smaller variations? For example, some put words with anusvara BEFORE and others AFTER where they belong. @drdhaval2785 developed one approach in 2014, but I do not see it documented in your simplified approach above.

funderburkjim commented 4 years ago

KRM was not modified. I am aware of the variations in ordering regarding anusvara. My simplified approach is the only one I use.