Corrections in digitisation: Andhrabharati

Andhrabharati commented 3 years ago

Mismatched '[' and ']' cases

Opening '[' cases: 9 nos.

Line 42424: <>vfttiH .. [kAlaH . iti hemacandraH .. The text after '[' belongs to the next entry (next line in print).

Line 54248: <>rAjanirGaRwaH .. [hemacandraH .. The text after '[' belongs to the next entry (next line in print).

Line 63129: <>pAM 4 . 1 . 41 .) pippalI . SroRideSaH . [iti The '[' is to be deleted here.

Line 92578: <>nAqIvraRaM vraRaM duzwamupadaMSaM vicarccikAm . [RAn . The text after '[' belongs to the next line.

Line 106031: kzIRAzwakarmmA¦, [n) puM, (kzIRAni azwakarmmARi Here the '[n)' is to be corrected as '[n]'.

Line 306359: bahvASI¦, [n) tri, (bahu aSnAtIti . bahu + Here the '[n)' is to be corrected as '[n]'.

Line 507383: SrI¦, Ya ga pAke . iti kavikalpadrumaH .. [kryA0- Here the '[kryA0-' is to be corrected as '(kryA0-'.

Line 507415: <>haraRam . “SrIste . [sAstAm ..”) The '[' is to be deleted here.

Line 507429: <>mAlaSroH . [tasyAH sampUrRajAtiH . asyAH The '[' is to be deleted here.

-----------------

Closing ']' cases: 5 nos.

Line 105215: kzarI¦, (n] (kzaraH kzaraRaM varzaRaM astvasmin kAle Here the '(n]' is to be corrected as '[n]'.

Line 117098: <>13 . 17 . 79 .skd2-298-b+ 52] Here the '.skd2-298-b+ 52]' is to be corrected as '. [Page2-298-b+ 52]'.

Line 306359: bahvASI¦, [n) tri, (bahu aSnAtIti . bahu + Here the '(n]' is to be corrected as '[n]'.

Line 508964: <>DaraH .. (yaTA, mAGe . 7 . 62 .] The ']' is to be deleted here.

Line 578433: hUravaH¦, puM, (hU iti ravo'sya .] SfgAlaH . iti Here the ']' is to be corrected as ')'.
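A rough way to surface cases like these is a per-line check on square brackets. This is only an illustrative sketch, not the procedure used to build the lists above: the function `find_bracket_mismatches` is hypothetical, and it assumes that '[' and ']' are meant to open and close on the same line, which holds for markers such as [n] and [Page...] but may not hold everywhere in the file. Mixed pairs such as '[n)' and '(n]' then show up as an unmatched '[' or ']' respectively.

```python
def find_bracket_mismatches(lines):
    """Return (line_number, problem) pairs for lines whose square brackets
    do not pair up. Line numbers are 1-based.

    Assumption: '[' and ']' should open and close on the same line.
    """
    problems = []
    for line_no, text in enumerate(lines, start=1):
        depth = 0  # number of '[' still waiting for a ']' on this line
        for ch in text:
            if ch == "[":
                depth += 1
            elif ch == "]":
                if depth == 0:
                    problems.append((line_no, "unmatched ]"))
                else:
                    depth -= 1
        if depth > 0:
            problems.append((line_no, "unmatched ["))
    return problems


sample = [
    "kzIRAzwakarmmA¦, [n) puM, (kzIRAni azwakarmmARi",
    "kzarI¦, (n] (kzaraH kzaraRaM varzaRaM",
    "<>13 . 17 . 79 . [Page2-298-b+ 52]",
]
print(find_bracket_mismatches(sample))
# [(1, 'unmatched ['), (2, 'unmatched ]')]
```

Round brackets are ignored on purpose, since parenthesised text in SKD often runs across several lines.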

Andhrabharati commented 3 years ago

After converting the SKD file to Devanagari and glancing through it, I found some errors in the word endings (given as alternative forms of HW entries in the book).

The list is as follows:
[ca] [च]: should be [c] [च्]. Typing error.
[da] [द]: should be [d] [द्]. Typing error.
[ja] [ज]: should be [j] [ज्]. Typing error.
[kza] [क्ष]: should be [kz] [क्ष्]. Typing error.
[na] [न]: should be [n] [न्]. Typing error.
[nca] [न्च]: should be [nc] [न्च्]. Typing error.
[sa] [स]: should be [s] [स्]. Typing error.
[za] [ष]: should be [z] [ष्]. Typing error.
[zwu] [ष्टु]: should be [zwf] [ष्टृ]. Typing error.
[E] [ऐ]: should be [rE] [रै]. Print error; or it could be left as [ऐ]कारान्तः for this particular monosyllabic word, just as [ऋ]कारान्तः etc.
[Yca] [ञ्च]: should be [Yc] [ञ्च्]. Typing error.

Some of these occur only once or twice, while others run to a few hundred.

As in any other dictionary, SKD also has multiple (grouped) words as HWs. Necessary action may be taken to "do" them!!
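Since these are fixed substitutions, they lend themselves to a batch correction. A minimal sketch, under two assumptions not stated in the comment: the endings appear in the SLP1 text as bracketed tokens such as [na], and a bracket holding letters only is always such an ending token. The [E] case is left out, since it may be kept as it is. The names `ENDING_FIXES` and `fix_endings` are hypothetical.

```python
import re

# Wrong ending -> corrected ending, in SLP1, taken from the list above.
# [E] is omitted on purpose: it may be left as it is.
ENDING_FIXES = {
    "ca": "c", "da": "d", "ja": "j", "kza": "kz", "na": "n",
    "nca": "nc", "sa": "s", "za": "z", "zwu": "zwf", "Yca": "Yc",
}

# A bracket holding letters only, e.g. [na]; [Page2-298-b+ 52] does not match.
ENDING_TOKEN = re.compile(r"\[([A-Za-z]+)\]")


def fix_endings(text):
    """Replace each bracketed ending listed in ENDING_FIXES; leave others unchanged."""
    return ENDING_TOKEN.sub(
        lambda m: "[" + ENDING_FIXES.get(m.group(1), m.group(1)) + "]", text
    )


print(fix_endings("[na] puM, [zwu] [E] [Page2-298-b+ 52]"))
# [n] puM, [zwf] [E] [Page2-298-b+ 52]
```

Matching whole bracketed tokens (rather than bare substrings) keeps [na] from being touched inside [nca].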

gasyoun commented 3 years ago

some errors in the word endings

@funderburkjim sounds like a batch correction for our master.

Andhrabharati commented 1 year ago

I have resumed looking into SKD, with the latest file (now using Jim's transcoding).

Corrected the file for the points raised in the 2 posts above.

Found that 4 metalines needed correction in either the k1 or the k2 field.

There are 354 entries where k1 and k2 do not match:
345 cases of (variants) [with braces]
5 cases having avagraha
2 cases with a final ';' in k2, to be removed [L-17250 & L-18724]
1 case with a final ':' in k2, to be removed [L-3648]
1 case with a final 'M' in k1, to be removed [L-41771] [though the print has M, as an error!]
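A tally like this can be produced by parsing each metaline and sorting the k1/k2 disagreements into buckets. This is a sketch only: it assumes metalines of the form <L>...<pc>...<k1>...<k2>... in SLP1 (where the avagraha is written as an apostrophe); the bucket names, the function `classify_k1_k2` and the sample metalines are made up for illustration and are not taken from any existing CDSL script.

```python
import re
from collections import Counter

# <L>..<pc>..<k1>..<k2>..; any further fields after k2 are ignored.
METALINE = re.compile(r"<L>([^<]+)<pc>([^<]+)<k1>([^<]+)<k2>([^<]+)")


def classify_k1_k2(metaline):
    """Return a rough label for how k1 and k2 differ in one metaline."""
    m = METALINE.match(metaline)
    if m is None:
        return "unparsed"
    k1, k2 = m.group(3).strip(), m.group(4).strip()
    if k1 == k2:
        return "match"
    if "(" in k2:
        return "variant"        # variant form given in brackets in k2
    if "'" in k2:
        return "avagraha"       # SLP1 writes the avagraha as '
    if k2.endswith((";", ":")):
        return "trailing punctuation in k2"
    return "other"              # e.g. a stray final 'M' in k1


metalines = [
    "<L>1<pc>1-001-a<k1>kzIba<k2>kzIba(va)",
    "<L>2<pc>1-001-a<k1>soham<k2>so'ham",
    "<L>3<pc>1-001-a<k1>aja<k2>aja;",
    "<L>4<pc>1-001-a<k1>ajaM<k2>aja",
    "<L>5<pc>1-001-a<k1>aja<k2>aja",
]
print(Counter(classify_k1_k2(line) for line in metalines))
```

The "other" bucket is the one that needs a manual look, as with the final 'M' case above.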

Andhrabharati commented 1 year ago

Within the [Page-breaks], there are 11 cases of the form -1[abc]+ and 21 cases of the form -[abc]1+.

So, I changed all 11 'minority' cases to match the 21 'majority' cases.
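A change of this kind is a single regular-expression substitution. The sketch below assumes that page-break markers look like [Page2-298-b+ 52] (the form quoted earlier in this thread) and that a 'minority' marker differs only by having the 1 before the column letter, e.g. [Page1-049-1b+ 30] becoming [Page1-049-b1+ 30]; the marker values and the function name `normalize_page_breaks` are made up. Python's `re.subn` also returns the number of replacements, which can be checked against the expected count.

```python
import re

# Group 1: the marker text up to the '-' that precedes the stray '1'.
# Group 2: the column letter (a, b or c) that follows it.
MINORITY_FORM = re.compile(r"(\[Page[^\]]*?-)1([abc])\+")


def normalize_page_breaks(text):
    """Rewrite -1[abc]+ as -[abc]1+ inside [Page...] markers.

    Returns the new text and the number of markers changed."""
    return MINORITY_FORM.subn(r"\g<1>\g<2>1+", text)


new_text, changed = normalize_page_breaks(
    "x [Page1-049-1b+ 30] y [Page1-049-c1+ 31] z"
)
print(changed, new_text)
# 1 x [Page1-049-b1+ 30] y [Page1-049-c1+ 31] z
```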

Andhrabharati commented 1 year ago

There are ~50 split L-entries in CDSL that the SKD compilers meant as variant 'group's, marked with a flower (curly) bracket:

2369, 2370; 3002, 3003; 3111, 3112; 3132, 3133; 4923, 4924; 9828, 9829; 12295, 12296; 20225, 20226; 23263, 23264; 24085, 24086; 24842, 24843; 24994, 24995; 25964, 25965, 25966; 25978, 25979; 27328, 27329; 27868, 27869; 30138, 30139, 30140; 30143, 30144; 30384, 30385; 30444, 30445; 30979, 30980; 31340, 31341; 31525, 31526; 32293, 32294; 32402, 32403; 32688, 32689; 32992, 32993; 33015, 33016; 33372, 33373; 33726, 33727; 33735, 33736; 34853, 34854; 34874, 34875; 35036, 35037; 35051, 35052, 35053; 35124, 35125; 35242, 35243; 33397, 33398; 36198, 36199; 36449, 36450; 36502, 36503; 37431, 37432; 39865, 39866; 39989, 39990; 40008, 40009; 40607, 40608; 40977, 40978; 41130, 41131; 41708, 41709; 41933, 41934; 42086, 42087; 42175, 42176; 42177, 42178

These could be merged into single entries, with the variants kept as comma-separated items in the k2 field, as done in some 'recent' works.

[And quite a few more entries could be added to this list, due to some systematic differences (like with/without a terminating comma etc.)!!]
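Merging one such group could look roughly like the following. This is a hypothetical sketch, not the CDSL data model: it assumes the entries are held in a dict keyed by L-number, each with k1, k2 and body fields (the sample entries are made up). The first L-number of a group is kept, its k2 becomes the comma-separated list of all the group's k2 values, and the bodies are joined in order.

```python
def merge_group(entries, group):
    """Merge the entries whose L-numbers are listed in `group` into the first one.

    `entries` maps L-number -> {"k1": ..., "k2": ..., "body": ...}.
    Returns a new dict; the input dict is not modified.
    """
    first, *rest = group
    merged = dict(entries[first])  # keep the first entry's k1 and other fields
    merged["k2"] = ",".join(entries[n]["k2"] for n in group)
    merged["body"] = "\n".join(entries[n]["body"] for n in group)
    result = {n: e for n, e in entries.items() if n not in rest}
    result[first] = merged
    return result


entries = {
    1: {"k1": "A", "k2": "A", "body": "text of A"},
    2: {"k1": "B", "k2": "B", "body": "text of B"},
    3: {"k1": "C", "k2": "C", "body": "text of C"},
}
merged = merge_group(entries, [1, 2])
print(merged[1]["k2"], sorted(merged))
# A,B [1, 3]
```

Returning a new dict rather than editing in place makes it easy to compare the data before and after each merge.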

Andhrabharati commented 12 months ago

There is one bad scan from Thomas [3-021], whose bottom portion is cut off and 'mysteriously' overlaid by the top portion of another page [3-023].

And here is a better page from elsewhere--

Andhrabharati commented 12 months ago

Navigated through all the scan pages of SKD and noted the following:

Tables: 18
1-049 to 050 (continued across 2 pages)
1-076
1-114 to 119 (continued across 6 pages)
2-048 to 052 (continued across 5 pages)
2-268 [2 nos.]
2-269 to 270 (continued across 2 pages)
2-270
2-271
2-377
2-467 to 486 (continued across 20 pages)
2-831 to 832 (continued across 2 pages)
2-930 to 932 (continued across 3 pages)
3-321 to 333 (continued across 13 pages)
3-333 to 364 (continued across 32 pages)
4-200
4-200 to 201 (continued across 2 pages)
5-093 to 094 (continued across 2 pages)

Tables better rendered as pictures: 5
3-532
3-617 [2 nos.]
3-618 [2 nos.]
5-306 [2 nos.]

Pictures: 99
2-212 [5 nos.]
2-251 to 2-258 [67 nos.]
2-281 [4 nos.]
2-282
2-413
2-447
2-491 [3 nos.]
2-493
3-022
3-041
3-379
3-618 [2 nos.]
4-157 [2 nos.]
4-158 [5 nos.]
4-214
5-262
5-264
5-304
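The headline counts can be re-derived from these page lists mechanically. The sketch below assumes the notation used above: a page reference such as 2-212, optionally followed by 'to 050' or 'to 2-258' for a range, and optionally by '[N nos.]' when a page or range holds more than one item; a reference without '[N nos.]' counts as one item. The function name `count_items` is hypothetical.

```python
import re

# A page reference, an optional "to ..." range end, and an optional "[N nos.]".
ITEM = re.compile(r"\d-\d{3}(?:\s+to\s+(?:\d-)?\d{3})?(?:\s*\[(\d+) nos?\.\])?")


def count_items(page_list):
    """Sum the items in a page list; a reference without [N nos.] counts as 1."""
    return sum(int(m.group(1) or 1) for m in ITEM.finditer(page_list))


pictures = (
    "2-212 [5 nos.] 2-251 to 2-258 [67 nos.] 2-281 [4 nos.] 2-282 2-413 "
    "2-447 2-491 [3 nos.] 2-493 3-022 3-041 3-379 3-618 [2 nos.] "
    "4-157 [2 nos.] 4-158 [5 nos.] 4-214 5-262 5-264 5-304"
)
print(count_items(pictures))  # 99
```

Text such as "(continued across 2 pages)" contains no page reference, so a table spanning several pages still counts once.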

Andhrabharati commented 12 months ago

In comparison with the print, the CDSL file has

many tables marked as columns (with Cx notation), though with errors; and some tables rendered as just running text, thus losing their structure and readability.
just 5 <Picture> markers, at 3-618, 4-214, 5-262 and 5-304 (2 nos.); thus losing the rest of the pictorial data!

Andhrabharati commented 12 months ago

Parsed the data and found some more corrections--

Additional grouped entries: 5619, 5620; 14664, 14665; 20356, 20357; 20602, 20603; 22237, 22238; 23958, 23959; 24328, 24329; 24608, 24609; 24841 (with 24842, 24843); 27937, 27939; 34886, 34887; 37301, 37302; 40352, 40353

Revised HW(s):
<L>715<pc>1-031-c<k1>अत्यन्तःसुकुमारः<k2>अत्यन्तःसुकुमारः
<L>9552<pc>2-235-a<k1>क्षीब(व)<k2>क्षीब(व)
<L>21065<pc>3-107-a<k1>पाण्डुरः<k2>पाण्डुरः
<L>23264<pc>3-302-c<k1>प्रस्तीमः<k2>प्रस्तीमः
<L>33474<pc>4-460-c<k1>विष्णुशृङ्खलः<k2>विष्णुशृङ्खलः
<L>39154<pc>5-353-c<k1>सिन्धुवारः<k2>सिन्धुवारः
<L>41578<pc>5-525-c<k1>हवङ्गः<k2>हवङ्गः

Deleted HW(s):
<L>36132<pc>5-126-a<k1>शूकडी<k2>शूकडी ;; to merge the data with the previous entry

This parsing also gives a better grasp of the structure of the work and yields many more "easy" corrections!

Andhrabharati commented 12 months ago

By mistake, I posted 2 comments in the 'wrong' issue-- https://github.com/sanskrit-lexicon/SKD/issues/14#issuecomment-1793766137

https://github.com/sanskrit-lexicon/SKD/issues/14#issuecomment-1793767374

Andhrabharati commented 12 months ago

Noticed, by chance, two entry words that were merged into the previous entry in the CDSL text.

This prompted me to look for all such entries, as the next step in my SKD work.

Found 39 such HWs, with a plain pattern search. Probably a few more might be lying 'hidden' still.

funderburkjim commented 12 months ago

@Andhrabharati I like your comprehensive notes above. They will be helpful when revising the cdsl version. Hope you will continue these notes as your skd examination continues.

@drdhaval2785 hope you will pay attention to AB's notes (You use SKD quite a bit, right?)

funderburkjim commented 12 months ago

3-021

@Andhrabharati : Have replaced pg3_021.pdf at Cologne, per img file above.

Also replaced at https://github.com/sanskrit-lexicon-scans/SKD

Andhrabharati commented 12 months ago

Probably a few more might be lying 'hidden' still.

Now another 8 have been added to the list, this time including 3 dhatus, one of which is a 'real' hidden entity.

While the others all begin a new line in the print, the 'hidden' dhatu is merged into the previous entry as running text!

@gasyoun , probably this is an interesting point for you!!

sanskrit-lexicon / SKD, issue #13