sanskrit-lexicon / CORRECTIONS

Correction history for Cologne Sanskrit Lexicon

\xa4 in ap90hw0.txt #308

Closed drdhaval2785 closed 7 years ago

drdhaval2785 commented 7 years ago

While running the code, the program stopped on some non-ASCII entries.

    Line 10908: 0402-c:kiwwaM, --ki¤wwakaM:68656,68658
    Line 11045: 0407-c:ku¤wwa:69494,69496
    Line 11046: 0407-c:ku¤wwakaH:69497,69497
    Line 11047: 0407-c:ku¤wwanaM:69498,69499
    Line 11048: 0407-c:ku¤wwa(¤wwi)nI:69500,69501
    Line 11049: 0407-c:ku¤wwAka:69502,69505
    Line 11050: 0407-c:ku¤wwita:69506,69507
    Line 12041: 0441-c:Ka¤wwASaH --SI:75146,75146
    Line 12042: 0441-c:Ka¤wwiH:75147,75147
    Line 12043: 0441-c:Ka¤wwikaH:75148,75150
    Line 17500: 0652-a:Ga¤wwakuwIpraBAtanyAyaH:110029,110043
    Line 17630: 0659-b:pa¤wwakaH:111266,111268
    Line 17631: 0659-b:pa¤wwanaM --nI:111269,111269
    Line 17632: 0659-b:pa¤wwikA:111270,111276
    Line 21944: 0807-c:Ba¤wwAra:136135,136140
    Line 21947: 0807-c:Ba¤wwinI:136150,136154
    Line 31000: 1154-a:sPi¤ww:194340,194341
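A listing like the one above can be reproduced with a short scan for non-ASCII characters. The following is only a minimal sketch, not the program that was actually run: the function name `find_non_ascii` is hypothetical, and the sample lines are modeled on the `page-col:headword:line1,line2` layout visible in the listing.

```python
def find_non_ascii(lines):
    """Yield (line_number, line, offending_chars) for lines containing non-ASCII characters."""
    for lineno, line in enumerate(lines, start=1):
        # Collect each distinct character outside the ASCII range.
        bad = sorted({ch for ch in line if ord(ch) > 127})
        if bad:
            yield lineno, line.rstrip("\n"), bad


# Sample lines modeled on the headword entries quoted in this thread.
sample = [
    "0407-b:kuwuMbaM, kuwuMbakaM:69458,69473\n",
    "0407-c:ku\u00a4wwa:69494,69496\n",
    "0407-c:ku\u00a4wwa(\u00a4wwi)nI:69500,69501\n",
]

hits = list(find_non_ascii(sample))
for lineno, line, bad in hits:
    print(f"Line {lineno}: {line} {[hex(ord(c)) for c in bad]}")
```

For a real file, the same function could be fed an open file handle, assuming the file is UTF-8 encoded (e.g. `open(path, encoding="utf-8")`, where `path` is whatever headword file is being checked). Each line is reported once, even when it contains the character more than once.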
funderburkjim commented 7 years ago

  1. Agree that these look like spurious characters within the original digitization. Based on examination of the scan for kuwwa, I see nothing that would warrant a special character between 'u' and 'w'. I'll generate corrections for these in ap90.txt.

image

Here is the digitization for kuwwa, from ap90_orig_utf8.txt:

<P>.{#ku¤TTa#}¦ {%a.%} (At the end of comp.) Di-
<>viding, cutting; grinding. {#{@--¤TTaH@}#} (in
<>Math.) A multiplier.

Note the presence of that special character in the 2nd line also.

  2. It may (or may not) be of interest or use to you to see the ap90hw1_note.txt file within the pywork directory. Here is a sample of comments generated in the neighborhood of kuwwa:

    0406-c: 'kujJawiH, kujJawikA, kujJawI' => 'kujJawiH' :69334,69335
    0406-c: 'kuMjaH, --jaM' => 'kuMjaH' :69348,69358
    0407-a: 'kuwika --ta' => 'kuwika' :69384,69384
    0407-a: 'kuwaH, --waM' => 'kuwaH' :69385,69393
    0407-b: 'kuwIraH, --raM, kuwIrakaH' => 'kuwIraH' :69448,69450
    0407-b: 'kuwuMbaM, kuwuMbakaM' => 'kuwuMbaM' :69458,69473
    0407-c: 'kuwuMbikaH, kuwuMbin' => 'kuwuMbikaH' :69474,69489
    0407-c: 'ku¤wwa' => 'kuwwa' :69494,69496
    0407-c: 'ku¤wwakaH' => 'kuwwakaH' :69497,69497
    0407-c: 'ku¤wwanaM' => 'kuwwanaM' :69498,69499

funderburkjim commented 7 years ago

In all, there are 70 lines of the ap90.txt digitization in which the ¤ character appears.

I looked at the scan for a couple of the others, and in those cases also that character appears to be a typo.

Thus, I think it is safe to consider all occurrences of that character to be typos, and will remove all of them from the digitization.
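The blanket removal described above amounts to deleting every occurrence of U+00A4 and keeping a count of the affected lines. This is a hedged sketch only: `strip_currency_sign` is a hypothetical helper, not the project's actual correction mechanism, and the sample reuses the kuwwa entry quoted earlier in the thread.

```python
CURRENCY_SIGN = "\u00a4"  # the ¤ character (byte \xa4 in Latin-1)


def strip_currency_sign(lines):
    """Return (cleaned_lines, changed_count) with every ¤ removed."""
    cleaned = []
    changed = 0
    for line in lines:
        new_line = line.replace(CURRENCY_SIGN, "")
        if new_line != line:
            changed += 1  # count lines, not individual characters
        cleaned.append(new_line)
    return cleaned, changed


# The kuwwa entry as quoted from ap90_orig_utf8.txt in this thread.
sample = [
    "<P>.{#ku\u00a4TTa#}\u00a6 {%a.%} (At the end of comp.) Di-",
    "<>viding, cutting; grinding. {#{@--\u00a4TTaH@}#} (in",
    "<>Math.) A multiplier.",
]

cleaned, changed = strip_currency_sign(sample)
print(changed)
for line in cleaned:
    print(line)
```

Only the ¤ character is touched; other non-ASCII markup such as the ¦ that follows the headword is left in place.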

gasyoun commented 7 years ago

"kuwwa, I see nothing that would warrant a special character between 'u' and 'w'." So do I.

"Thus, I think it is safe to consider all occurrences of that character to be typos, and will remove all of them from the digitization." Seems fine; no metadata is involved.

funderburkjim commented 7 years ago

Corrections installed.