sanskrit-lexicon / CORRECTIONS

Correction history for Cologne Sanskrit Lexicon

KRM devanagari #351

Open gasyoun opened 7 years ago

gasyoun commented 7 years ago

Looks fishy to me कबृ:

[Footnote: 2. कबरः कर्बुरः, इतीमौ औणादिकप्रत्ययान्तौ ।] 


gasyoun commented 6 years ago

@funderburkjim where do those broken Devanagari characters come from?

funderburkjim commented 6 years ago

That fishy character is the way a naked visarga gets rendered in Devanagari.

This, in turn, is caused by a digitization error:

That `H-` is the culprit.

    4410 old <F>2. <s>kabaraH karburaH, itImO ORAdikapratyayAntO .</s></F><s> H-</s>
    4410 new <F>2. <s>kabaraH karburaH, itImO ORAdikapratyayAntO .</s></F><s>
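
As a small illustration of why a lone `H` looks broken: SLP1 `H` corresponds to the Devanagari visarga (U+0903), which Unicode classifies as a combining mark. With no base letter in front of it, renderers typically draw it on a placeholder. The sketch below only uses Python's standard `unicodedata` module; the helper name `is_combining_mark` is made up for this example.

```python
import unicodedata

# Devanagari sign visarga, the character that SLP1 'H' is transcoded to.
VISARGA = "\u0903"


def is_combining_mark(ch: str) -> bool:
    """Return True if the character's Unicode general category is a mark (Mn, Mc, Me)."""
    return unicodedata.category(ch).startswith("M")


# The visarga is a mark, so it expects a preceding base letter.
print(unicodedata.name(VISARGA), unicodedata.category(VISARGA))
print(is_combining_mark(VISARGA))   # mark: needs a base character
print(is_combining_mark("\u0915"))  # DEVANAGARI LETTER KA: an ordinary letter
```

In the old line 4410, the `H` in `<s> H-</s>` follows a space rather than a letter, so the visarga has nothing to attach to.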

There are 4 other probable similar errors:

    843:<><s>iti niziDyate . ataH ‘mitAM hasvaH’ (</s>6<s>-</s>4<s>-</s>32<s>) iti hrasvo na .</s></F><s> H-mikA, amimizakaH-zikA;</s>
   4454:<><s>paRAyyarUpAH panitAkftIn yayurBAminya evAkzamayA svakAmukAn ..’ DA</s>.<s> kA</s>.<s> </s>1<s>-</s>57.</F><s> H-kAmukI-kAmukA, kAmaH, cikAmayizuH, cikamizuH, caNkAmaH-caNkamaH;</s>
  12867:<><s>boDyam .</s></F><s> H-cicIzakaH-zikA, cecIyaka</s>
  12869:<><s>sarvatra jYeyam .</s></F><s> H-yikA; </s>[Page0502+ 25]
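
A search like the one that produced this list could be sketched as follows. This is only a guess at the approach, not the actual command used: the regular expression assumes the error always has the shape "end of footnote, then `<s>`, optional space, `H-`", and the function name `find_naked_visarga` is hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Assumed shape of the error: a footnote closes (</F>), a new <s> span opens,
# and the span starts with a bare visarga 'H' followed by a hyphen.
NAKED_VISARGA = re.compile(r"</F><s>\s*H-")


def find_naked_visarga(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (1-based line number, line text) for every line matching the pattern."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if NAKED_VISARGA.search(line)
    ]


# Two sample lines taken from this thread: the old and the new form of line 4410.
sample = [
    "<F>2. <s>kabaraH karburaH, itImO ORAdikapratyayAntO .</s></F><s> H-</s>",
    "<F>2. <s>kabaraH karburaH, itImO ORAdikapratyayAntO .</s></F><s>",
]
for number, text in find_naked_visarga(sample):
    print(f"{number}:{text}")
```

To scan the real file, one would pass an open file handle for krm.txt (read as UTF-8) instead of `sample`. Each hit still needs to be checked against the scan, since at 843 the visarga seems to belong to a word that the footnote splits in two.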

In examining the one at 843, under the headword 'ama', I find the coding quite confusing. The text has a footnote 1A. In the digitization, the entire footnote appears between two letters of the body text, the second of which is a visarga.

I've not looked at the digitization of this so-called dictionary for a LONG time. And this current look makes me think it has lots of special needs before it is useful. An expert needs to improve this, not me.

gasyoun commented 6 years ago

> An expert needs to improve this, not me.

@SergeA what's your take on it?

SergeA commented 6 years ago

> @SergeA what's your take on it?

My take on what? I am not an expert in dictionary markup, and I do not even know where I can look at this code.

funderburkjim commented 6 years ago

> where can I look at this code.

If you wanted to look at the digitization for KRM, one place to start would be the package in the krm downloads.

The krm_orig_utf8.txt file is the original coding from Thomas, with HK (Harvard-Kyoto) transliteration for Devanagari.

The current version is krm.txt, with SLP1 coding.

For this dictionary, since we have done very little work with KRM, essentially the only difference between the two is the transcoding choice.

I'm not suggesting you examine this, but if you have an interest in doing so, comparing one of these versions to the scanned images might be the place to start.

Because this is not really a dictionary, but rather a reference work with tabular data, to make it useful probably requires a different kind of markup, which would need to be developed to overlay the minimalist markup that is currently present. Before such extended markup is developed, the underlying structure of the author's intent needs to be understood. That understanding is what I was thinking an interested expert would supply.

gasyoun commented 6 years ago

> reference work with tabular data, to make it useful probably requires a different kind of markup

Exactly. As it stands, I would say it is a soup and a mess.

> Before such extended markup is developed, the underlying structure of the author's intent needs to be understood. That understanding is what I was thinking an interested expert would supply.

@drdhaval2785 ever worked with it?