funderburkjim commented 3 years ago

Issue #70 discusses the possibility of using the Schmidt dictionary to generate the VN sections of PWK; these VN (additions/corrections) sections are not yet part of the PWK digitization.

We are trying to decide whether this approach is feasible; the alternative is to prepare an independent digitization of the VN pages of PWK using the typists of @thomasincambodia.

In preparation for this feasibility study, numerous changes were made to the sch.txt digitization. This issue is devoted to documenting those changes.

funderburkjim commented 3 years ago

The code and data for the revision to sch.txt are in this repository at vn-sch/step1. The step1 readme file describes the technical steps used to generate a new sch.txt, and the redo.sh script does the computation.

This note attempts a more comprehensible description of the changes. The changes fall into two types:

- IAST changes. Schmidt represents Sanskrit in italicized IAST, while PW represents Sanskrit in slp1 (Devanagari in print). Thus conversion among IAST, slp1, and Devanagari is important if we are to convert Schmidt entries into PW entries.
- Cosmetic changes. Some aspects of the sch digitization can be changed so that it more closely resembles the printed text of Schmidt.
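Since the IAST changes exist to make a reliable IAST-to-slp1 conversion possible, here is a minimal sketch of such a converter. It is not the transcoder used in the repository: the table, the name `iast_to_slp1`, and the accent handling (an acute vowel becomes the slp1 vowel followed by `/`, as in the k2 value `aMSaprAsa/`) are assumptions for illustration. It covers only IAST to slp1, not Devanagari.

```python
import unicodedata

# Hypothetical IAST -> slp1 table for illustration; not the project's transcoder.
# Accented vowels map to the slp1 vowel followed by '/'.
_RAW = {
    "ái": "E/", "áu": "O/", "ai": "E", "au": "O",
    "kh": "K", "gh": "G", "ch": "C", "jh": "J", "ṭh": "W",
    "ḍh": "Q", "th": "T", "dh": "D", "ph": "P", "bh": "B",
    "a": "a", "ā": "A", "i": "i", "ī": "I", "u": "u", "ū": "U",
    "ṛ": "f", "ṝ": "F", "ḷ": "x", "ḹ": "X", "e": "e", "o": "o",
    "á": "a/", "í": "i/", "ú": "u/", "é": "e/", "ó": "o/",
    "\u0301": "/",  # combining acute that survives NFC, e.g. in ā́ or ṛ́
    "ṃ": "M", "ḥ": "H", "k": "k", "g": "g", "ṅ": "N", "c": "c",
    "j": "j", "ñ": "Y", "ṭ": "w", "ḍ": "q", "ṇ": "R", "t": "t",
    "d": "d", "n": "n", "p": "p", "b": "b", "m": "m", "y": "y",
    "r": "r", "l": "l", "v": "v", "ś": "S", "ṣ": "z", "s": "s", "h": "h",
}
# Normalize keys so that composed/decomposed spellings in this file don't matter.
_TABLE = {unicodedata.normalize("NFC", k): v for k, v in _RAW.items()}


def iast_to_slp1(text: str) -> str:
    """Greedy longest-match conversion of IAST to slp1.

    Input is lowercased first, because slp1 uses letter case to distinguish
    sounds (e.g. 'A' is long a). Characters not in the table (digits,
    punctuation, '°') are copied unchanged.
    """
    s = unicodedata.normalize("NFC", text.lower())
    out = []
    i = 0
    while i < len(s):
        for size in (2, 1):  # try digraphs such as 'ph', 'ai' before single letters
            chunk = s[i:i + size]
            if len(chunk) == size and chunk in _TABLE:
                out.append(_TABLE[chunk])
                i += size
                break
        else:
            out.append(s[i])
            i += 1
    return "".join(out)
```

For example, `iast_to_slp1("aṃśaprāsá")` gives `aMSaprAsa/`. A table-driven converter cannot tell a true diphthong `ai` from `a` + `i` in hiatus; such cases would still need manual correction.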

The changes were made in a sequence of 7 steps, starting with a copy (temp_sch.txt) of the current digitization and ending with a revised version (temp_sch7.txt). These versions are not tracked by git, but readme.txt describes how temp_sch.txt can be retrieved from csl-orig; redo.sh then recreates temp_sch7.txt.

change1.txt. A handful of iast spelling changes.
change2.txt. Reformat the homonym numbers to conform to the printed text. The old homonym format was {!2!}, placed after the headword. These homonym numbers are primarily (exclusively?) the homonym numbers from PWK.
- Example {%a°%}¦ {!2!} -> 2. {%a°%}¦
change3.py various picky changes
- '.%}' -> '%}.' (the period moves outside the italics, since italic text is IAST Sanskrit). Similarly for comma and semicolon.
- Remove space before ','
- Remove multiple spaces
- change 'º' (MASCULINE ORDINAL INDICATOR) to '°' (DEGREE SIGN)
- change '°=' to '° ='
change4.py Insert a 'type' character at the front of each entry. Previously this character was recorded only as the 'type' value in the meta information of each Schmidt entry.
- example: {%aṃśumatphala%}¦ [m.] ... {part=,seq=13,type=°,n=3} -> °{%aṃśumatphala%}¦ ...
change5.py Remove {#X#} at the beginning of each entry, as it is not part of the printed text. X is a copy of the 'k1' of the entry's metaline.
- example: {#a#} 2. {%a°%}¦ ... -> 2. {%a°%}¦ ...
change6.txt. More IAST changes, primarily to the 'k2' of the metaline; these were required before change7. Also, about 80 instances of a period embedded in italicized IAST were recoded, since a period represents danda in slp1.
change7.py In each metaline, convert k2 from IAST to slp1. By convention both k1 and k2 are in slp1, but the starting version of sch had k2 in IAST coding.
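Several of the steps above amount to plain text substitutions. The following is a minimal sketch of regex equivalents for change2, change3 and change5, plus a check for the embedded periods that motivated part of change6. The actual work was done with change-transaction files (change2.txt, change6.txt) and the scripts change3.py and change5.py, whose code may differ; the function names here are hypothetical.

```python
import re

# Hypothetical regex equivalents of some of the listed steps (illustration only).
HOMONYM = re.compile(r"(\{%[^%]*%\})¦ \{!(\d+)!\}")
ITALIC = re.compile(r"\{%([^%]*)%\}")


def move_homonym(line: str) -> str:
    """change2-style: '{%a°%}¦ {!2!}' -> '2. {%a°%}¦'."""
    return HOMONYM.sub(r"\2. \1¦", line)


def picky_changes(line: str) -> str:
    """change3-style cosmetic fixes, applied in the order listed."""
    line = re.sub(r"([.,;])%\}", r"%}\1", line)  # move . , ; outside the italic IAST
    line = re.sub(r" +,", ",", line)             # no space before a comma
    line = re.sub(r" {2,}", " ", line)           # collapse runs of spaces
    line = line.replace("\u00ba", "\u00b0")      # MASCULINE ORDINAL INDICATOR -> DEGREE SIGN
    line = line.replace("\u00b0=", "\u00b0 =")   # '°=' -> '° ='
    return line


def drop_k1_prefix(line: str, k1: str) -> str:
    """change5-style: remove a leading '{#k1#} ' copied from the metaline."""
    prefix = "{#" + k1 + "#} "
    return line[len(prefix):] if line.startswith(prefix) else line


def italic_periods(line: str) -> list[str]:
    """Italic IAST spans that still contain '.', which slp1 would read as a danda."""
    return [m.group(1) for m in ITALIC.finditer(line) if "." in m.group(1)]


if __name__ == "__main__":
    old = "{#aMSaprAsa#} {%aṃśaprāsá%}¦  m. {%Aṃśa%}'s Wurf , Maitr. S. 1 , 6 , 12 (105 , 1). {part=,seq=6,type=,n=2}"
    print(drop_k1_prefix(picky_changes(old), "aMSaprAsa"))
    # {%aṃśaprāsá%}¦ m. {%Aṃśa%}'s Wurf, Maitr. S. 1, 6, 12 (105, 1). {part=,seq=6,type=,n=2}
```

Running `italic_periods` after `picky_changes` is what matters: trailing periods have already been moved out of the italics by then, so anything it still reports is a genuinely embedded period.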

funderburkjim commented 3 years ago

While writing these comments, I noticed the suggestions of this comment. I'll take a look at them tomorrow and make changes as needed.

funderburkjim commented 3 years ago

Two examples of before/after

<L>5<pc>001-1<k1>aMSaprAsa<k2>aṃśaprāsá
{#aMSaprAsa#} {%aṃśaprāsá%}¦  m. {%Aṃśa%}'s Wurf , Maitr. S. 1 , 6 , 12 (105 , 1). {part=,seq=6,type=,n=2}
<LEND>

<L>5<pc>001-1<k1>aMSaprAsa<k2>aMSaprAsa/
{%aṃśaprāsá%}¦ m. {%Aṃśa%}'s Wurf, Maitr. S. 1, 6, 12 (105, 1). {part=,seq=6,type=,n=2}
<LEND>

<L>12<pc>001-1<k1>aMSumatPala<k2>aṃśumatphala
{#aMSumatPala#} {%aṃśumatphala%}¦  [m.] Musa sapientum , S I , 159 , 11 v.u. (Ko.; {%aśu°%} gedruckt); 538 , 11 v.u. (Ko.). {part=,seq=13,type=º,n=3}
<LEND>

<L>12<pc>001-1<k1>aMSumatPala<k2>aMSumatPala
°{%aṃśumatphala%}¦ [m.] Musa sapientum, S I, 159, 11 v.u. (Ko.; {%aśu°%} gedruckt); 538, 11 v.u. (Ko.). {part=,seq=13,type=°,n=3}
<LEND>
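A quick way to confirm that the change7 conversion behaves as in the records above is to parse each metaline and compare k2 with k1. Below is a minimal sketch that assumes the field layout seen in these two records (`<L>`, `<pc>`, `<k1>`, `<k2>`); real metalines may carry further tags that it does not handle. The accent-stripping comparison is a heuristic, not a project rule.

```python
import re

# Field layout read off the two example records; other tags are not handled.
METALINE = re.compile(
    r"<L>(?P<L>[^<]*)<pc>(?P<pc>[^<]*)<k1>(?P<k1>[^<]*)<k2>(?P<k2>[^<]*)"
)
# Trailing '{part=,seq=13,type=°,n=3}' at the end of an entry body.
META = re.compile(r"\{([a-z]+=[^{}]*)\}\s*$")


def parse_metaline(line: str) -> dict[str, str]:
    m = METALINE.match(line)
    if m is None:
        raise ValueError(f"not a metaline: {line!r}")
    return m.groupdict()


def parse_meta(body: str) -> dict[str, str]:
    """Return the key=value pairs of the trailing meta braces, or {}."""
    m = META.search(body)
    if m is None:
        return {}
    pairs = (item.partition("=") for item in m.group(1).split(","))
    return {key: value for key, _, value in pairs}


def k2_agrees_with_k1(fields: dict[str, str]) -> bool:
    """Heuristic: after change7, k2 minus slp1 accent marks should equal k1."""
    return fields["k2"].replace("/", "").replace("\\", "") == fields["k1"]
```

With the old metaline `<L>5<pc>001-1<k1>aMSaprAsa<k2>aṃśaprāsá` the check fails (k2 is still IAST); with the new one it passes.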

A minor revision was made to the display code for sch (make_xml.py). Here is an example of the difference.

BEFORE / AFTER: [screenshots of the sch display not preserved]

funderburkjim commented 3 years ago

And here is the printed text for aMSumatphala:

funderburkjim commented 3 years ago

+ to †

The suggestions mentioned in this comment have been implemented. Four entries (from the Nachtrag) were modified; the change transactions were put at the end of change6.txt.

Andhrabharati commented 3 years ago

change 'º' (MASCULINE ORDINAL INDICATOR) to '°' (DEGREE SIGN)

This needs to be done in all the works; some 3-4 works have this in bulk (10k+ instances), and others have it only in small numbers.

funderburkjim commented 3 years ago

Use sup tag for superscripts.

In sch.txt before this change, a superscript was indicated as blah^a, and the displays had code for 'sch' to change this to blah^<sup>a</sup>. Also, 9 digitizations (xxx.txt) already use the sup-tag markup (see xmltags).

So, now the sup markup is used in sch.txt.
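Doing the conversion once in the source, rather than at display time, is essentially one substitution. Below is a minimal sketch, assuming that a superscript is the run of ASCII letters or digits directly after `^` and that the target markup is `blah<sup>a</sup>`; the name `caret_to_sup` is hypothetical. The returned count is handy when checking other works that still use `^`.

```python
import re

# Assumption: the superscript is the run of ASCII letters/digits right after '^'.
CARET_SUP = re.compile(r"\^([0-9A-Za-z]+)")


def caret_to_sup(text: str) -> tuple[str, int]:
    """Rewrite 'blah^a' as 'blah<sup>a</sup>'; also return the number of replacements."""
    return CARET_SUP.subn(r"<sup>\1</sup>", text)
```

Any `^` still left in the output (e.g. `text.count("^")`) falls outside this assumption and would need a manual look.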

Andhrabharati commented 3 years ago

Also, 9 digitizations xxx.txt already use the sup-tag markup xmltags.

So, now the sup markup is used in sch.txt.

OK, got it; so this process still has to be done in the other works, to be consistent across all of them. [ACC: 8300; STC: 600; and a few others have ^ in small numbers]

gasyoun commented 3 years ago

SUP for superscript, understood.

Schmidt represents Sanskrit in italicized IAST

And uses capital letters for names.

since a period represents danda in slp1

You seem to remember just about everything.

Andhrabharati commented 4 months ago

Now that all the VN pages of the PWK volumes have been digitized (courtesy of Jim and Thomas), the various issues on pwkvn and SCH could be closed.

What do you say, @funderburkjim ?

sanskrit-lexicon / PWK

PWK-VN Schmidt preparation #74
