sanskrit-lexicon / PWK

Sanskrit-Wörterbuch in kürzerer Fassung, 7 Bände Petersburg 1879-1889

PWK ls corrections #79

Closed funderburkjim closed 3 months ago

funderburkjim commented 2 years ago

This continues the harvesting of corrections from the version pw_AB_L0 mentioned in #72. There are many changes to the ls markup in pw_AB. The work described in this issue aims to identify the most material changes, and apply them to the current Cologne digitization pw.txt.

funderburkjim commented 2 years ago

first batch

It is possible to compare the sequence of ls items in each entry for the two versions of pw (pwAB and pw=pw.txt cologne), and then to make an estimate of when the ls-sequences for an entry are (materially) the same or different in the versions.
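A minimal sketch of such a comparison, in Python. It assumes both files use the `<L>...<LEND>` entry layout seen in pw.txt, and it treats whitespace and a trailing period as immaterial. That normalization and the function names are illustrative assumptions, not the program actually used for this work.

```python
import re

LS_RE = re.compile(r"<ls>(.*?)</ls>")
META_RE = re.compile(r"<L>([^<]+)<pc>")

def ls_sequences(lines):
    """Map each entry's L-number to the list of its <ls> contents."""
    seqs = {}
    current = None
    for line in lines:
        m = META_RE.match(line)
        if m:
            current = m.group(1)
            seqs[current] = []
        elif line.startswith("<LEND>"):
            current = None
        elif current is not None:
            seqs[current].extend(LS_RE.findall(line))
    return seqs

def normalize(ls_text):
    # Assumption: spacing and a trailing period are not "material".
    return re.sub(r"\s+", "", ls_text).rstrip(".")

def materially_different(seqs_a, seqs_b):
    """Return L-numbers (present in both versions) whose ls-sequences differ."""
    diffs = []
    for key in sorted(set(seqs_a) & set(seqs_b)):
        a = [normalize(x) for x in seqs_a[key]]
        b = [normalize(x) for x in seqs_b[key]]
        if a != b:
            diffs.append(key)
    return diffs
```

Calling `materially_different(ls_sequences(pw_lines), ls_sequences(ab_lines))` would then give the candidate entries to review by hand.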

There are approximately 135000 entries (135787 by current count). About 52000 of these entries have ls markup. And there are about 78000 distinct ls instances.

Before implementing changes to pw.txt, there are about 3000 entries whose ls markup is identified as materially different in the two versions. It is anticipated that most of these 3000 entries will require corrections to pw.txt.

After implementing the first batch of changes, there remain about 2000 entries with differences.

The first batch consists of global ('batch') replacements made to pw.txt via emacs. Here is a technical description of the sequence of changes, with the number of instances in parentheses.

 ',Sch.</ls>' -> '</ls>, <ab>Sch.</ab>'  (184)
 '.Sch.</ls>' -> '.</ls> <ab>Sch.</ab>'  (2)
 '<ls>Sch.</ls>' -> '<ab>Sch.</ab>'  (24)
    NOTE: Sch. now removed from pwbib_input.txt.
 ' </ls> ' -> '</ls> ' (4)
 ' </ls>' -> '</ls> ' (5)
 '<ls>VĀMANA.</ls>' -> '<ls>VĀMANA</ls>.' (8)
 '<ls>VĀMANA.' -> '<ls>VĀMANA '  
    next character is either a digit or S (2)
 '<ls>ebend.</ls>' -> '<ab>ebend.</ab>' (400) ibid.
 '<ls>ebend.' -> '<ab>ebend.</ab> <ls>' (47)
 '<ls>VP.².' -> 'VP.²'  (443)
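For anyone wanting to replay this kind of change outside emacs, here is a rough Python sketch of an ordered replacement pass. It covers only some of the rules listed above (the whitespace and VP.² rules are left out), the `RULES` table and `apply_rules` name are illustrative, and the "next character is a digit or S" condition is expressed as a regex lookahead. The order matters: the full `...</ls>` forms are handled before the shorter prefix forms.

```python
import re

# Ordered (pattern, replacement) pairs mirroring part of the list above.
# Literal rules are escaped; the VĀMANA prefix rule uses a lookahead
# for "next character is either a digit or S".
RULES = [
    (re.escape(",Sch.</ls>"), "</ls>, <ab>Sch.</ab>"),
    (re.escape(".Sch.</ls>"), ".</ls> <ab>Sch.</ab>"),
    (re.escape("<ls>Sch.</ls>"), "<ab>Sch.</ab>"),
    (re.escape("<ls>VĀMANA.</ls>"), "<ls>VĀMANA</ls>."),
    (re.escape("<ls>VĀMANA.") + r"(?=[0-9S])", "<ls>VĀMANA "),
    (re.escape("<ls>ebend.</ls>"), "<ab>ebend.</ab>"),
    (re.escape("<ls>ebend."), "<ab>ebend.</ab> <ls>"),
]

def apply_rules(line):
    """Apply every rule in order; return the new line and per-rule counts."""
    counts = []
    for pattern, repl in RULES:
        # A function replacement avoids any backslash interpretation.
        line, n = re.subn(pattern, lambda _m, r=repl: r, line)
        counts.append(n)
    return line, counts
```

Summing the per-rule counts over all lines of the file gives figures that can be compared with the counts in parentheses above.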

The revised pw.txt is that of the commit 4222d80 mentioned above.

The next work will address the remaining 2000 differences.

Andhrabharati commented 2 years ago

Whatever use these might have, just posting the ls and ab lists from my pwk work. pwk_ls list (AB).txt pwk_ab list (AB).txt

If they are found suitable and marked in the Cologne pw file, I can next give the resolved lists.

funderburkjim commented 2 years ago

Will take a look at these when I finish current round of PWK-ls corrections. Then will examine your comparable listings for PWG.

funderburkjim commented 2 years ago

Processed about 600 of the 2000 items. Results now in pw.txt (see link to commit 3230d45 of csl-orig). Also some changes to tooltips (commit 7e60294 link above).

The changes to pw.txt (with some context) can be seen in changes_04.

Also made 130+ changes to a copy of pw_AB_L0. A summary of these changes is in diff_AB_02.

Still have about 1400 differences to examine further.

funderburkjim commented 2 years ago

This phase completed.

See the csl-orig commit 2ea711c above. About 1300 lines changed.

change_05.txt has the change transactions to pw.

As mentioned, I've also made related changes to the pw_AB_L0 file.
The revised version is pw_AB_03.zip. In addition to the specific changes, pw_AB_03 differs in 2 ways from pw_AB_L0:

funderburkjim commented 2 years ago

What's next

The spacing of the 'ls' elements needs to be addressed. While this has been done in the pw_AB version, it has not generally been done in pw.txt. For example, compare <ls>R.ed.Bomb.2,118,18.</ls> with the scanned page image. The spacing would be better as <ls>R. ed. Bomb. 2,118,18.</ls>

Such spacing improvement must be coordinated with changes in pwbib_input.txt for the tooltips to work properly. For example:

old
X024<TAB>R.ed.Bomb.<TAB>R.ed.Bomb.<TAB>R.ed.Bomb. = [unknown literary source]
new
X024<TAB>R.ed. Bomb.<TAB>R.ed. Bomb.<TAB>R.ed. Bomb. = [unknown literary source]
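Why the two files must change together can be seen in a small Python sketch. It assumes the tooltip lookup works by matching the start of each ls text against the abbreviation in the second tab-separated field of pwbib_input.txt; that matching rule and the function names are assumptions for illustration, not the actual csl-pywork code.

```python
def load_abbreviations(bib_lines):
    """Return abbreviations (second tab-separated field), longest first."""
    abbrevs = []
    for line in bib_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            abbrevs.append(fields[1])
    # Longest first, so a longer abbreviation wins over its own prefix.
    return sorted(abbrevs, key=len, reverse=True)

def match_abbreviation(ls_text, abbrevs):
    """Return the first abbreviation that ls_text starts with, else None."""
    for abbr in abbrevs:
        if ls_text.startswith(abbr):
            return abbr
    return None
```

With only the 'old' line loaded, `R.ed.Bomb.2,118,18.` matches but a respaced `R. ed. Bomb. 2,118,18.` returns `None`, i.e. the tooltip would be lost until the bib entry is respaced in the same way.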

As this example also indicates, the tooltips need revision and improvement; e.g. this one should read 'RĀMĀYAṆA, Bombay edition'.

I plan to deal with these two improvements to pw.txt and pwbib_input.txt next.

But I plan to work on the BOESP project before returning to pw.txt.

gasyoun commented 2 years ago

X024<TAB>R.ed. Bomb.<TAB>R.ed. Bomb.<TAB>R.ed. Bomb. = [unknown literary source]

Is it not Parab's @Andhrabharati ? See https://books.google.ru/books?id=vDibDgAAQBAJ&pg=PA1563&lpg=PA1563&dq=ramayana+bombay&source=bl&ots=uIkU_E2sTq&sig=ACfU3U3mZxsoF8_STDJpstmHaHywme1y_A&hl=en&sa=X&ved=2ahUKEwiGv8as34D0AhVJs4sKHaNqAKQQ6AF6BAgUEAM#v=onepage&q=ramayana%20bombay&f=false

1888

Andhrabharati commented 2 years ago

I wish it were, @gasyoun; but it is of a much later time (1888) than PWG or pwk.

Keep guessing, as the reference to R. ed. Bomb. first appeared in PWG Theil IV (1865).

And the Nirnaya Sagar Printing Press was born in 1867, and KP Parab joined the NS team much later!!

funderburkjim commented 2 years ago

Further work has been done in the last two weeks on the literary source names and reference numbers. This is reflected in the two commits to csl-orig and to csl-pywork mentioned above.

About 10% of the lines in pw.txt have been modified (57397 out of 682616 lines). The changes were done in steps (from 06 to 13). change_06.txt is the first batch of changes, and change_13.txt is the last batch. The changes are cumulative; that is, some lines are changed several times.

844 literary source name 'abbreviations' have been identified. (see pwbib_input.txt above). listls1_pw_summary.txt shows each of the literary source names and a count of instances. listls1_pw_detail.txt shows the individual instances.
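A summary of this kind can be approximated with a few lines of Python. The sketch below assumes the source name is everything in an ls element before the first digit, which is a simplification; the real listls1 programs presumably match against pwbib_input.txt instead. The function names are illustrative.

```python
import re
from collections import Counter

LS_RE = re.compile(r"<ls>(.*?)</ls>")

def source_name(ls_text):
    # Assumption: the name is the text before the first digit.
    # An ls that starts with a digit yields the empty name "".
    return re.match(r"[^0-9]*", ls_text).group(0).strip()

def summarize(lines):
    """Count ls instances per literary source name."""
    counts = Counter()
    for line in lines:
        for ls_text in LS_RE.findall(line):
            counts[source_name(ls_text)] += 1
    return counts
```

`summarize(lines).most_common()` then lists the names by frequency, with any references lacking a name collected under the empty string.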

funderburkjim commented 2 years ago

revision of AB version

@Andhrabharati version has continued to be of use, and I have made changes to the literary source parts of his version. Keeping his version 'in sync' with the current Cologne version will facilitate using his version in other tasks, such as abbreviation markup, general punctuation, and perhaps other innovations he has introduced.

The latest revision is pw_AB_08.zip With these revisions, a 'summary' version derived from pw_AB_08.txt is identical to listls1_pw_summary.txt.

funderburkjim commented 2 years ago

orphan references

Most of this work has focused on the 'named' literary sources (e.g. <ls>AGNI-P. 11,2. 3.</ls>, about 75000 of these). There are also about 6000 'orphan' or 'naked' literary source references e.g.

<L>21<pc>1001-2<k1>aMSu<k2>aMSu/<e>100
{#aMSu/#}¦ <lex>m.</lex>
<div n="1">— 1) {%<is>Soma</is>-Stengel%} (<ls>KĀTY. ŚR. 9,4,20</ls>) und {%<is>Soma</is>-Saft.%}
<div n="1">— 2) {%Strahl%} <ls>93,5. 102,13. 170,27.</ls>     <<< orphan - maybe KĀTY. ŚR.  ?
<div n="1">— 3) <ab>N.pr.</ab> eines Mannes.
<LEND>

Further attention will need to be given to these some day, when we provide links.

funderburkjim commented 2 years ago

'unknown literary sources'

In the list of 844 literary source abbreviations that are 'used' (see pwbib_input.txt), 374 currently have the 'tooltip' '[unknown literary source]'. A good next step is to resolve these, and also to check the others that may be incomplete or inaccurate in some way. This will probably be the next thing I focus on when I return to pwk.

unused literary sources.

pwbib_input_unused.txt contains a handful (38) of literary sources that were identified as sources used in pwk, but which have no instances in our digitization. Not sure what to make of these.

Andhrabharati commented 2 years ago

@funderburkjim Glad to see my work being useful to you (& used).

Just browsed through the unused literary sources file posted by you, and here are my initial findings-

6002 Bydragen. Bydragen. Bydragen (tot de Taal-, Land-en Volkenkunde van Nederlandsch Indie) (H. KERN). ; present at <L>108149, as Bijdragen.

1238 SADDH.P.4. Saddh.P.4. das 4te Kapitel des SADDHARMAPUṆḌARĪKA, lithographirt in Parabole de l'Enfant egare. Par. PH. ED. FOUCAUX. Paris. 1854. ; Present as "<ls>SADDH. P.</ls> <ln>4," at 5 places

1089 gaṇa. Gaṇa. gaṇa im Gaṇapātha zu P. ; present as <is>gaṇa</is> throughout the text.

5014 ŚĀŚVATA Śāśvata ŚĀŚVATA'S ANEKĀRTHASAMUCCAYA, herausgegeben von THEODOR ZACHARIAE. ; Present as <ls>ŚĀŚVATA</ls> at 51 places

[These are picked up from the pw_AB_08.txt]

Andhrabharati commented 2 years ago

I wish it were, @gasyoun; but it is of a much later time (1888) than PWG or pwk.

Keep guessing, as the reference to R. ed. Bomb. first appeared in PWG Theil IV (1865).

And the Nirnaya Sagar Printing Press was born in 1867, and KP Parab joined the NS team much later!!

@gasyoun,

Giving out the info-

Both the widely cited Ramayana and Mahabharata editions of Bombay refer to "Ganapat Krishnaji editions", Printed at Gujarati Printing Press (founded in 1805 A.D.), Bombay.

Also, have a look at- http://www.hindupedia.com/en/Talk:Gujarati_Printing_Press

Andhrabharati commented 2 years ago

will facilitate using his version in other tasks, such as abbreviation markup, general punctuation, and perhaps other innovations he has introduced.

I did not do much work in pwk (just about 3-4 days were spent to arrive at this form), as I had discontinued the task abruptly (!).

So not much additional use expected in this, other than ls marking.

gasyoun commented 2 years ago

abbreviation markup, general punctuation

We bow to the cooperation of @Andhrabharati and @funderburkjim.

6000 'orphan' or 'naked' literary source references e.g.

Is there a list of them?

a handful (38) of literary sources that were identified as sources used in pwk, but which have no instances in our digitization

I've seen such cases in Sanskrit-Russian dictionary as well, 20 unused abbreviations.

Ramayana and Mahabharata editions of Bombay refer to "Ganapat Krishnaji editions", Printed at Gujarati

Good to know.

funderburkjim commented 2 years ago

Is there a list of orphans ?

Most of them can be found by the regex <ls>[0-9]. There are also a few where the ls begins with <ls>II, and a few that begin with a page number, <ls>S\. [but there are also non-orphans of the form <ls>S. S. S.]
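To produce an actual list, the regex can be run over pw.txt. A minimal Python sketch, covering the digit and `II` cases only (the page-number `S.` case is left out because it overlaps with non-orphan references); the function name is illustrative:

```python
import re

# An ls element whose text starts with a digit or with "II".
ORPHAN_LS_RE = re.compile(r"<ls>((?:[0-9]|II)[^<]*)</ls>")

def find_orphans(lines):
    """Return (line_number, ls_text) for each orphan ls, 1-based."""
    found = []
    for lnum, line in enumerate(lines, 1):
        for m in ORPHAN_LS_RE.finditer(line):
            found.append((lnum, m.group(1)))
    return found
```

Run on the aMSu entry quoted earlier in this issue, this picks out `93,5. 102,13. 170,27.` and skips the `KĀTY. ŚR.` reference.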

I think @thomasincambodia found a comment where Boehtlingk says that some (unknown) number of these 'orphans' are to be interpreted as references to his Chrestomathy. --- Have not yet explored that.

gasyoun commented 2 years ago

'orphans' are to be interpreted as references to his Chrestomathy. --- Have not yet explored that.

Interesting idea. Wonder what edition of it. https://catalog.libfl.ru/Record/BJVVV_823711 1909 not for sure, but I have a good scan of it now.

maltenth commented 2 years ago

@gasyoun here is Boehtlingk's statement in the preface of pwk, vol 1.

"Two numbers without the name of a book refer to the second edition [1877] of my Chrestomathie."

The 1909 edition of the Chrestomathie is by Garbe, who added material. He does not mention the compatibility with pwk. Therefore it has to be checked whether the references in pwk can be used for this edition.

It is probably better to use the 1877 edition, of which a very good scan is freely available at the Bayerische Staatsbibliothek: https://opacplus.bsb-muenchen.de/title/BV012357417

Andhrabharati commented 2 years ago

pwbib_input_unused.txt contains a handful (38) of literary sources that were identified as sources used in pwk, but which have no instances in our digitization. Not sure what to make of these.

few more instances noted-

06001.1 AUCITY. Aucity. AUCITYĀLAṂKĀRA (R. PISCHEL). ; This can go with <L>22767.

2005 BÜHLER,Rep.1872-73. Bühler,Rep.1872-73. BÜHLER, Report on Sanskrit Mss. 1872--73. ; Present as "<ls>BÜHLER, Rep. S." at 3 places

3012 GAṆITA,MADHYĀM(ĀDHYĀYA). Gaṇita,Madhyām(Ādhyāya). GAṆITA,MADHYĀM(ĀDHYĀYA)KERN. [unknown literary source] ; <L>52136 has just the "<ls>MADHYAM. 22.</ls>" to be padded with "GAṆIT." from the preceding ls item.

gasyoun commented 2 years ago

few more instances noted

3 good catches

Andhrabharati commented 2 years ago

A good finding, @funderburkjim !!

Many of the

unused literary sources.

pwbib_input_unused.txt contains a handful (38) of literary sources that were identified as sources used in pwk, but which have no instances in our digitization. Not sure what to make of these.

are seen to be present in SCH; thus, these are all from the VN pages of pwk.

This calls for a look at digitising/generating the pwk VN pages, to complete the task fully.

gasyoun commented 2 years ago

are seen to be present in SCH

Would never even think about it!

Andhrabharati commented 2 years ago

These all belong to the actual pwk VN pages only. Those pages were skipped in the digitization but are present in SCH, which is what led to this statement!!

Andhrabharati commented 2 years ago

@funderburkjim,

Many of the

unused literary sources.

are seen to be present in SCH; thus, these are all from the VN pages of pwk.

I have finished working with SCH to generate the pwk-VN data. [I will post a summary of the SCH study in due course.]

All instances of the "unused literary sources" except "Harisv." are now traced, either in pwk main text or in SCH (pwk VN text).

pwbib_input_unused (resolved).txt

Andhrabharati commented 2 years ago

X024<TAB>R.ed. Bomb.<TAB>R.ed. Bomb.<TAB>R.ed. Bomb. = [unknown literary source]

Is it not Parab's @Andhrabharati ?

I wish it were, @gasyoun; but it is of a much later time (1888) than PWG or pwk.

Keep guessing, as the reference to R. ed. Bomb. first appeared in PWG Theil IV (1865).

And the Nirnaya Sagar Printing Press was born in 1867, and KP Parab joined the NS team much later!!

This is what Hermann Jacobi says in his Ramayana concordance (DAS RÂMÂYANA. GESCHICHTE UND INHALT NEBST CONCORDANZ DER GEDRUCKTEN RECENSIONEN, 1893)-

  1. The most widespread recension, which has been printed several times in India (among others twice in Bombay, 1859 and 1888), is the one that Schlegel called the northern recension or that of the commentators. But since it is also the one customary in South India, and its first commentator Kataka belongs to the south of India, the designation 'northern recension' is not apt; likewise the second name (commentators' recension) is misleading, because the Bengali recension has also found expositors. We designate this recension by C (where one may think of the name commentators' recension) and cite according to the second Bombay edition (Bombay, Nirṇaya Sâgara Press 1888).¹⁾

  2. The Bengali recension, which is available to us in Gorresio's edition. We designate it by B.

    ¹⁾ The second Bombay edition is to be regarded as a revised reprint of the first Bombay edition of the year 1864. The older Calcutta edition, from which Muir cites, is not accessible to me. On the other hand, I know Books I-IV of a more recent one distributed free by Pratap Chandra Roy, Calcutta 1881, which, as far as I have compared, agrees with the Bombay edition.

So, it turns out that the Nirnaya Sagara print (1888) is a revised edition of the Ganapat Krishnaji ed. (of 1864).

Similarly, the Mahabharata ed. of Ganapat Krishnaji (1877) was revised by Gopal Narayana & Co. (in 1901). [Interestingly, both these revisions took place exactly 24 years after the earlier editions.]

gasyoun commented 2 years ago

Similarly the Mahabharata ed. of Ganapat Krishnaji (1877) was revised by Gopal Narayana & Co. (in 1901).

Can you draw a mind-map?

Andhrabharati commented 2 years ago

you mean to map (put on paper) all the info in my mind?

gasyoun commented 2 years ago

you mean to map (put on paper) all the info in my mind?

Right, like https://www.biggerplate.com/mindmaps/JhsZmqkL/worlds-of-english-history-of-english-timeline

Andhrabharati commented 1 year ago

@funderburkjim

Just recalled that there was just one item remaining in the "unused literary sources" (as listed by you), while I was tracing them last year. https://github.com/sanskrit-lexicon/PWK/issues/79#issuecomment-1003511866

Now upon looking for the "zu ŚAT. BR." (as it is known that Harisvāmin is a commentator on Sat.Br.), found the occurrence at

<L>3663<pc>1043-1<k1>anavakASita<k2>anavakASita<e>100
{#अनवकाशित#}¦ <lex>Adj.</lex> {%zu den <is>Avakāśa</is> genannten Sprüchen nicht zugelassen%} <ls>HARIV. zu ŚAT. BR.</ls> <ln>4,5,6,5.</ln>
<LEND>

Here the Hariv. is a typo for Harisv.

Thus, it turns out that all the listed "unused literary sources" are very much present in the pwk pages; they were just missed in the CDSL text, due to (a) typos/print errors, (b) oversight, or (c) the VN pages having been left out of the digitisation process in those days.

[for KA, @maltenth ]

Andhrabharati commented 3 months ago

@funderburkjim

I presume this issue is now closable.