sanskrit-lexicon / PWK

Sanskrit-Wörterbuch in kürzerer Fassung, 7 Bände Petersburg 1879-1889

PWK-VN Schmidt preparation #74

Open funderburkjim opened 2 years ago

funderburkjim commented 2 years ago

#70 discusses the possibility of using the Schmidt dictionary to generate the VN sections of PWK; these VN (additions/corrections) sections are not part of the PWK digitization so far.

We are trying to decide the feasibility of this approach; the alternative is to prepare an independent digitization of the VN pages of PWK using the typists of @thomasincambodia .

In preparation for this feasibility study, numerous changes were made to the sch.txt digitization. This issue is devoted to documenting the changes to sch.txt.

funderburkjim commented 2 years ago

The code and data for the revision to sch.txt are in this repository at vn-sch/step1. The step1 readme file describes the technical steps used to generate a new sch.txt, and the redo.sh script does the computation.

This note attempts a more comprehensible description of the changes. The changes may be considered of two types:

The changes were made in a sequence of 7 steps, starting with a copy (temp_sch.txt) of the current digitization and ending with a revised version (temp_sch7.txt). These versions are not tracked by git, but the readme.txt describes how temp_sch.txt can be retrieved from csl-orig; redo.sh will then recreate temp_sch7.txt.

funderburkjim commented 2 years ago

While writing these comments, I noticed the suggestions in this comment. I'll take a look at them tomorrow and make changes as needed.

funderburkjim commented 2 years ago

Two examples of before/after

<L>5<pc>001-1<k1>aMSaprAsa<k2>aṃśaprāsá
{#aMSaprAsa#} {%aṃśaprāsá%}¦  m. {%Aṃśa%}'s Wurf , Maitr. S. 1 , 6 , 12 (105 , 1). {part=,seq=6,type=,n=2}
<LEND>

<L>5<pc>001-1<k1>aMSaprAsa<k2>aMSaprAsa/
{%aṃśaprāsá%}¦ m. {%Aṃśa%}'s Wurf, Maitr. S. 1, 6, 12 (105, 1). {part=,seq=6,type=,n=2}
<LEND>

<L>12<pc>001-1<k1>aMSumatPala<k2>aṃśumatphala
{#aMSumatPala#} {%aṃśumatphala%}¦  [m.] Musa sapientum , S I , 159 , 11 v.u. (Ko.; {%aśu°%} gedruckt); 538 , 11 v.u. (Ko.). {part=,seq=13,type=º,n=3}
<LEND>

<L>12<pc>001-1<k1>aMSumatPala<k2>aMSumatPala
°{%aṃśumatphala%}¦ [m.] Musa sapientum, S I, 159, 11 v.u. (Ko.; {%aśu°%} gedruckt); 538, 11 v.u. (Ko.). {part=,seq=13,type=°,n=3}
<LEND>
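The body-line differences visible in these two examples can be sketched in Python. This is only an illustration inferred from the two before/after pairs above, not the code in vn-sch/step1; the function name revise_body_line is hypothetical, and the change to the k2 field of the metaline (IAST to SLP1 with accent) is not covered.

```python
import re

# Leading "{#headword#} " copy of the SLP1 headword; absent in the revised lines.
HEADWORD_RE = re.compile(r"^\{#[^#]*#\} ")

def revise_body_line(line: str) -> str:
    """Apply the body-line changes seen in the two examples (hypothetical helper)."""
    line = HEADWORD_RE.sub("", line)       # drop the {#...#} headword copy
    line = line.replace("¦  ", "¦ ")       # one space after the broken bar
    line = line.replace(" ,", ",")         # no space before a comma
    # U+00BA (masculine ordinal indicator) becomes U+00B0 (degree sign)
    line = line.replace("type=\u00ba", "type=\u00b0")
    # Entries flagged type=° also carry the sign in front of the headword.
    if "type=\u00b0" in line and not line.startswith("\u00b0"):
        line = "\u00b0" + line
    return line
```

Applied to the two "before" body lines above, this reproduces the two "after" body lines; other entries in sch.txt may need rules that these two examples do not show.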

A minor revision to the display code for sch (make_xml.py) was made. Here is an example of the difference.

BEFORE: image

AFTER: image

funderburkjim commented 2 years ago

And here is the printed text for aMSumatPala: image

funderburkjim commented 2 years ago

+ to †

The suggestions mentioned in this comment have been implemented. 4 entries (from the Nachtrag) were modified. The change transactions are at the end of change6.txt.

Andhrabharati commented 2 years ago

change 'º' (MASCULINE ORDINAL INDICATOR) to '°' (DEGREE SIGN)

This needs to be done in all the works; some 3-4 works have this in bulk (10k+) and some have this in small numbers.
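A minimal Python sketch of this replacement, which also reports how many characters were changed so the bulk cases can be spotted. The function name is hypothetical, and it assumes every U+00BA in a file is meant as the abbreviation sign, which would need checking for each work.

```python
MASCULINE_ORDINAL = "\u00ba"  # 'º' MASCULINE ORDINAL INDICATOR
DEGREE_SIGN = "\u00b0"        # '°' DEGREE SIGN

def replace_ordinal_indicator(text: str) -> tuple[str, int]:
    """Return the text with every U+00BA replaced by U+00B0, plus the count replaced."""
    count = text.count(MASCULINE_ORDINAL)
    return text.replace(MASCULINE_ORDINAL, DEGREE_SIGN), count
```

To use it on a digitization, read the file as UTF-8, pass the contents through the function, and write the result back only when the count is nonzero.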

funderburkjim commented 2 years ago

Use sup tag for superscripts.

In sch.txt before this change, a superscript was indicated as blah^a. Displays had code for 'sch' to change this to blah^<sup>a</sup>. Also, 9 digitizations xxx.txt already use the sup-tag markup xmltags.

So, now the sup markup is used in sch.txt.
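As a rough illustration of the conversion, here is a Python sketch. It assumes a superscript is a single letter or digit directly after the caret and that the caret itself is dropped; both are assumptions, and the actual step in vn-sch/step1 may differ.

```python
import re

# A caret followed by exactly one letter or digit (assumed superscript form).
CARET_SUP_RE = re.compile(r"\^([A-Za-z0-9])")

def caret_to_sup(text: str) -> str:
    """Rewrite blah^a as blah<sup>a</sup> (hypothetical helper)."""
    return CARET_SUP_RE.sub(r"<sup>\1</sup>", text)
```

A caret followed by anything else (a space, punctuation, end of line) is left untouched, so unusual cases stay visible for manual review.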

Andhrabharati commented 2 years ago

Also, 9 digitizations xxx.txt already use the sup-tag markup xmltags.

So, now the sup markup is used in sch.txt.

OK, got it; so this is a process still to be done, to be consistent across all the works. [ACC: 8300; STC: 600; and a few others have ^ in small numbers]

gasyoun commented 2 years ago

SUP for superscript, understood.

Schmidt represents Sanskrit in italicized IAST

And uses capital letters for names.

since a period represents danda in slp1

You seem to remember just everything

Andhrabharati commented 1 week ago

Now that all the VN pages of the PWK volumes have been digitized (courtesy of Jim & Thomas), the various issues on pwkvn and SCH could be closed.

What do you say, @funderburkjim ?