sanskrit-lexicon / PWK

Sanskrit-Wörterbuch in kürzerer Fassung (Sanskrit Dictionary, Shorter Version), 7 volumes, Petersburg 1879-1889

PWKVN3 vs Schmidt #77

Open funderburkjim opened 2 years ago

funderburkjim commented 2 years ago

This issue reports the results of comparing the digitization of the 10 pages of corrections/additions in volume 3 of PWK with the Schmidt digitization.

We start with

Conceptually, the idea of comparison is simple. For each of the 858 pwk correction records, use the headword (and homonym number if present) to find the corresponding record(s) from sch.txt. Then, show the two so that a detailed comparison can readily be made by eye.
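The matching step can be sketched roughly as follows. This is only an illustration, not the code in vn-sch/step3: the `Record` fields are hypothetical, and it assumes the headwords of both sources have already been extracted into one common transliteration. Records carrying a homonym number are matched on (headword, homonym); records without one are matched on the headword alone.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    """A dictionary entry; these fields are hypothetical, not the real sch.txt layout."""
    ident: int                # running number, as in '(pvn 1)' or '(sch 7120)'
    headword: str             # headword in one common transliteration (e.g. SLP1)
    homonym: Optional[int]    # homonym number, or None if the entry has none
    text: str


def build_index(records):
    """Index records both by (headword, homonym) and by headword alone."""
    by_key = defaultdict(list)
    by_hw = defaultdict(list)
    for rec in records:
        by_key[(rec.headword, rec.homonym)].append(rec)
        by_hw[rec.headword].append(rec)
    return by_key, by_hw


def match(pvn_records, sch_records):
    """Pair each correction record with its candidate Schmidt record(s)."""
    by_key, by_hw = build_index(sch_records)
    pairs, unmatched = [], []
    for rec in pvn_records:
        if rec.homonym is not None:
            cands = by_key.get((rec.headword, rec.homonym), [])
        else:
            cands = by_hw.get(rec.headword, [])
        if cands:
            pairs.append((rec, cands))
        else:
            unmatched.append(rec)
    return pairs, unmatched


def show(pairs):
    """Print each pair in the '(pvn n)' / '(sch n)' style of the comparison file."""
    for rec, cands in pairs:
        print(f"(pvn {rec.ident:6d}) {rec.text}")
        for cand in cands:
            print(f"(sch {cand.ident:6d}) {cand.text}")
        print()
```

The `unmatched` list is what would need manual attention; strict homonym matching means a numbering difference between the two dictionaries also shows up there.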

The obstacles encountered, how the obstacles were handled, and the resulting comparison files will be described in the comments below.

gasyoun commented 2 years ago

For each of the 858 pwk correction records, use the headword (and homonym number if present) to find the corresponding record(s) from sch.txt.

Will we also be able to make a list of the misread words and their corrections?

Like:

scram read as scream
dhamma read as dharma
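Assuming the (wrong, corrected) headword pairs can be pulled out of a change file such as change1.txt (the parsing is not shown, since its exact format is not given here), a list in this style could be produced with Python's standard `difflib`, which also tallies the individual letter substitutions so that systematic misreadings stand out. A minimal sketch:

```python
import difflib
from collections import Counter


def substitutions(wrong, right):
    """Return the (wrong_part, right_part) edits that turn `wrong` into `right`."""
    sm = difflib.SequenceMatcher(a=wrong, b=right, autojunk=False)
    return [(wrong[i1:i2], right[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]


def report(pairs):
    """Print one 'X read as Y' line per pair and tally the edits over all pairs."""
    tally = Counter()
    for wrong, right in pairs:
        edits = substitutions(wrong, right)
        tally.update(edits)
        desc = ", ".join(f"{a or '-'} -> {b or '-'}" for a, b in edits)
        print(f"{wrong} read as {right}   [{desc}]")
    return tally
```

With the two examples above, `substitutions("dhamma", "dharma")` gives `[("m", "r")]`; sorting the tally by count would bring the most frequent confusions to the top.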

funderburkjim commented 2 years ago

The comparison work is in the vn-sch/step3 subdirectory. Considerable detail is found in readme.txt, and a summary of the steps in the redo.sh script.

The final comparison file is

This comparison has a section for each of the 858 entries of the digitization of pwk-vn3. The first comparison (from the Devanagari version) is

(pvn      1) 1. {#आभोग#} (auch Nachtr. 1 u. 2) etwa {%Versuch%} Comm. [Page256-a] ·zu JOGAS. 1, 17. 
(sch   7120) 1. {#आभोग#} 8. Harṣac. 185, 21. {#°गाख्य#} 182, 10. — Etwa: Versuch, Komm. zu Yogas. 1, 17. — °Schlangenhaube, S I, 219, 4. = {#°आकृति#}, Yudh. 4, 53 (in {#तुहिनाभोग: तुहिनमयो हिममय आभोग आकृतिर् यस्य#}). 

There are a few 'meta' markups in the comparison:

funderburkjim commented 2 years ago

Assessment of comparison

The motivating purpose of this exercise is to help us decide whether the undigitized VN sections of PWK can be simulated based upon the Schmidt dictionary.

Examination of the comparison file leads me to believe that a good first approximation of PWK-VN could be derived from Schmidt.

Is such a first approximation good enough? Or should we request @thomasincambodia to provide a digitization of the rest of the PWK-VN material?

Hope others will also browse the comparison file and form an opinion on which way we should proceed.

funderburkjim commented 2 years ago

obstacles to comparison

In the first run of the comparison, there were about 160 of the PVN entries which were not matched with Schmidt; these were reduced to the current 2 items by correcting many errors in the PWK-VN digitization and a few errors in the Schmidt digitization.

PWK-VN changes

change1.txt documents the changes made to the original digitization pwk3vn_utf8.txt.

Many of the changes are unrelated to HK spelling errors, but beginning at line 1151 the HK spelling changes are detailed. Unfortunately, there were many of these, due no doubt partly to the quality of the scanned images and partly to peculiarities of the Devanagari font used in PWK. For instance, there were many cases where a Devanagari 'j' was miscoded as 'jJ' (HK; 'jY' in slp1). Indeed, the visual difference between the two is slight in the Devanagari of PWK: [image]. Another systematic error was mistaking the Devanagari aspirated 'j' for 'kt': [image].
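Because these two confusions are systematic, a script could propose candidate corrections for unmatched headwords: undo any subset of the known confusions and keep only the spellings that actually occur in a reference headword list (e.g. Schmidt's). The sketch below assumes HK headwords and a pre-built set of known headwords; the function names are hypothetical. Checking against the list matters, since a genuine 'kt' (as in HK 'Bakti') must not be rewritten blindly.

```python
from itertools import product

# Confusions reported for the PWK-VN digitization (HK transliteration):
# what was typed -> what the print most likely has.
CONFUSIONS = [("jJ", "j"), ("kt", "jh")]


def variants(word, confusions=CONFUSIONS):
    """Yield every spelling obtained by undoing any subset of the confusions."""
    pieces = []  # each piece is a tuple of alternative spellings for one slot
    i = 0
    while i < len(word):
        for typed, printed in confusions:
            if word.startswith(typed, i):
                pieces.append((typed, printed))
                i += len(typed)
                break
        else:  # no confusion starts here: keep the character as-is
            pieces.append((word[i],))
            i += 1
    for choice in product(*pieces):
        yield "".join(choice)


def suggest(word, known_headwords):
    """Return the variants (other than the word itself) found in the reference list."""
    return sorted({v for v in variants(word) if v != word and v in known_headwords})
```

For example, `suggest("ajJita", {"ajita"})` returns `["ajita"]`. The number of variants doubles with each ambiguous slot, which is harmless for single headwords.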

Two of the changes to pwk were considered print changes

funderburkjim commented 2 years ago

corrections to sch.

These changes may be seen in the csl-orig commit.

A few were simple one-letter changes and a few were marked as print changes (often determined by comparison with mw).

In addition there were several cases where entries of sch needed to be split. See changes_sch.txt. How many other such errors are lurking?

I also noticed several instances where the ordering of entries in sch.txt differs from the ordering on the printed page; I did not change these. For example, 'kzveqa' precedes 'kzveqita' in the print (as it should, by alphabetical ordering), but in sch.txt it follows, as this snip shows (the page number of kzveqa is also wrong):

[image]

[image]

Maybe there is some systematic way to identify these and other similar errors in sch.txt?
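One possible systematic check, sketched under assumptions: extract the headwords of sch.txt in file order (SLP1, with any non-letter markup stripped; the extraction is not shown), map each to a key in Sanskrit alphabetical order, and flag every adjacent pair where a headword sorts before its predecessor. The placement of anusvara 'M' and visarga 'H' right after the vowels is an assumption; dictionaries differ on this.

```python
# Sanskrit alphabetical order in SLP1 (assumption: M and H placed after the vowels).
SLP1_ORDER = "aAiIuUfFxXeEoOMHkKgGNcCjJYwWqQRtTdDnpPbBmyrlvSzsh"
RANK = {ch: i for i, ch in enumerate(SLP1_ORDER)}


def sort_key(headword):
    """Map an SLP1 headword to a tuple of ranks; raises KeyError on other characters."""
    return tuple(RANK[ch] for ch in headword)


def out_of_order(headwords):
    """Return (previous, current) pairs where the current headword sorts before the previous."""
    keys = [sort_key(h) for h in headwords]
    return [(headwords[i - 1], headwords[i])
            for i in range(1, len(headwords))
            if keys[i] < keys[i - 1]]
```

Equal keys (homonyms) are not flagged. A flagged pair only says that one of the two entries is misplaced or misspelled: since retroflex 'q' sorts before dental 'd', a spelling 'kzveda' would belong after 'kzveqita', so the check cannot by itself tell a misplaced entry from a misprinted one.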

maltenth commented 2 years ago

@funderburkjim

The word is printed kSveda, not kSveDa, in sch.pdf, and is therefore in the wrong position there; or, if D is correct, then it is misprinted as d in sch.

maltenth commented 2 years ago

kSveda is a misprint for kSveDa

MW: kSveDa ... (ā), f. the roaring of a lion, a war-whoop, a battle-cry;

gasyoun commented 2 years ago

The ls references are in 'AS' form, and also still have some of Boehtlingk's peculiarities (such as JOGAS. instead of modern IAST YOGAS).

And should they remain so?

miscoded as 'jJ' (HK = slp1 'jY'). Indeed the visual difference between the two is slight in the Devanagari of PWK

Agreed, they are rather similar. I made a replica of the font in 2005; I call it Schlegel's font, or the French font.

Schlegel-Varnamala-A4

Two of the changes to pwk were considered print changes

So minus 2 words in our general word index, hurray.

I also noticed several instances where the ordering of entries in sch.txt differs from the ordering in the printed page;

I guess @thomasincambodia must have an opinion on that, apart from the case already noted where kSveda is a misprint for kSveDa.

funderburkjim commented 2 years ago

@gasyoun I like the French font! It looks like a good replica of the font that Boehtlingk used.

funderburkjim commented 2 years ago

@thomasincambodia

Have tried to reach you to discuss PWK-VN digitization.

Andhrabharati commented 2 years ago

I had extracted the VN data from SCH (together with Schmidt's 'updates' to them) in December, basically separating out Schmidt's new entries and splitting the rest into pwk volume-wise strings. The exercise took me about 12 days.

Just waiting for Thomas's full digitization of the pwk VN pages, to compare with.

maltenth commented 2 years ago

@thomasincambodia

Have tried to reach you to discuss PWK-VN digitization.

@funderburkjim please try again

Andhrabharati commented 2 years ago

@funderburkjim / @thomasincambodia,

I got very good quality scans of pwk volumes (2-7) now.

I guess these would be helpful for digitizing the VN pages (if not completed yet), for a second reading, or even to replace the Cologne scans.

Interested in this?

gasyoun commented 2 years ago

I got very good quality scans of pwk volumes (2-7) now.

Where are they?

funderburkjim commented 2 years ago

@Andhrabharati Not sure whether 'very good quality scans of pwk volumes (2-7)' should replace the current Cologne scans. What do you think?

Andhrabharati commented 2 years ago

On second thought, probably no change is needed (there seems to be no one who is aware that these scans exist, and apparently CDSL users refer only to the present Cologne scans).

Let life go on, as is!!

Andhrabharati commented 1 week ago

Now that all the VN pages of the pwk volumes have been digitized (courtesy Jim & Thomas), the various issues on pwkvn and SCH could be closed.

What do you say, @funderburkjim ?