sanskrit-lexicon / PWK

Sanskrit-Wörterbuch in kürzerer Fassung, 7 volumes, Petersburg 1879-1889

Boehtlingk literary source sequences #80

Open funderburkjim opened 2 years ago

funderburkjim commented 2 years ago

One interesting detail of how Böhtlingk cites literary sources is the way he presents sequences of references. Examples may be drawn from listls1_pw_detail.txt (see #79).

Consider the Rāmāyaṇa, abbreviated as 'R.' in pwk (search for '; R.' in the file above); 820 instances are identified. A specific reference is identified by three numbers, kāṇḍa (section), chapter and verse, e.g. 'R. 1,1,12.' Böhtlingk frequently presents a sequence of several specific references. Some of these are simple to interpret, such as 'R. 1,28,11. 2,72,46. 3,51,5. 5,23,28.', where 4 verses, each from a different section, are presented. This example illustrates that the sections appear in increasing order (1 2 3 5).
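A minimal Python sketch of the form visible in this example, assuming the abbreviation ('R.') has already been stripped so only the numeric part remains. `parse_sequence` and `SEQUENCE_RE` are hypothetical names, not part of the PWK tooling; the pattern encodes "numbers joined by commas, each reference ending in a period, references separated by one space".

```python
import re

# One reference: comma-separated integers followed by a period.
# References in a sequence are separated by exactly one space.
SEQUENCE_RE = re.compile(r"\d+(?:,\d+)*\.(?: \d+(?:,\d+)*\.)*")


def parse_sequence(text):
    """Split a reference sequence into lists of ints, or return None if malformed."""
    if not SEQUENCE_RE.fullmatch(text):
        return None
    # Drop the final period, then split on the '. ' separators between references.
    return [[int(n) for n in ref.split(",")] for ref in text[:-1].split(". ")]


refs = parse_sequence("1,28,11. 2,72,46. 3,51,5. 5,23,28.")
print(refs)                   # [[1, 28, 11], [2, 72, 46], [3, 51, 5], [5, 23, 28]]
print([r[0] for r in refs])   # section numbers: [1, 2, 3, 5]
```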

Consider R. 1,16,19. 31. Here the first reference is complete (it has all three parameters). The second reference, 31, is to be understood as 1,16,31 (i.e. verse 31 of chapter 16 of section 1). Note that 19 is less than 31. Also note that commas separate the numbers within the full reference '1,16,19.', that this first reference ends with a period, and that '31' is separated from the previous reference by a space. These details of punctuation and spacing are fussy but important.

Consider R. 1,2,36. 33,22. Here the first reference is complete (3 numbers) but the second is incomplete (only 2 numbers). The second reference is to be interpreted as '1,33,22': the same section as the previous reference, but a different chapter and verse. Also note that the new chapter (33) comes after the chapter of the preceding reference (2).
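Both cases follow one rule: a reference with k numbers replaces the last k parameters of the preceding full reference and inherits the leading ones. A minimal sketch of that rule; `expand` is a hypothetical helper name.

```python
def expand(previous, ref):
    """Complete a compressed reference using the preceding full reference.

    A reference with k numbers replaces the last k parameters of `previous`;
    the leading parameters are inherited unchanged.
    """
    if len(ref) > len(previous):
        raise ValueError(f"{ref} has more parameters than {previous}")
    return previous[: len(previous) - len(ref)] + ref


print(expand([1, 16, 19], [31]))      # [1, 16, 31]
print(expand([1, 2, 36], [33, 22]))   # [1, 33, 22]
```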

Consider R. 3,25,18. 6,53,7. 55,1. 68. with 4 references. The first reference is from section 3 and the second from section 6, so both must be given in full. The third reference decompresses to '6,55,1', a chapter after chapter 53. The fourth reference is verse 68 of chapter 55 of section 6, so it decompresses to '6,55,68'.

This method of presenting sequences of references in compressed form is followed almost always, whatever the literary source. For an example from another source, consider KATHĀS. 19,30. 21,52. 54,81. 66,116. 119. from the Kathāsaritsāgara, where a full reference has two numbers, so the final '119' decompresses to '66,119'.
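Applying the rule left to right decompresses a whole sequence. The sketch below assumes the caller supplies already-parsed lists of integers and the source's arity (3 for R., 2 for KATHĀS. in these examples); `decompress` is a hypothetical helper, not the function inside test_decompress.py.

```python
def decompress(refs, arity):
    """Expand every reference in a compressed sequence to `arity` numbers."""
    full = []
    previous = None
    for ref in refs:
        if len(ref) > arity:
            raise ValueError(f"too many parameters in {ref}")
        if len(ref) < arity:
            if previous is None:
                raise ValueError("the first reference must be complete")
            # Inherit the leading parameters from the preceding (expanded) reference.
            ref = previous[: arity - len(ref)] + ref
        full.append(ref)
        previous = ref
    return full


print(decompress([[3, 25, 18], [6, 53, 7], [55, 1], [68]], arity=3))
# [[3, 25, 18], [6, 53, 7], [6, 55, 1], [6, 55, 68]]
print(decompress([[19, 30], [21, 52], [54, 81], [66, 116], [119]], arity=2))
# [[19, 30], [21, 52], [54, 81], [66, 116], [66, 119]]
```

Passing the arity explicitly, rather than reading it off the first reference, lets the same function reject a sequence whose first reference is itself incomplete.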

funderburkjim commented 2 years ago

use for error detection

Quite a few errors can be identified because they deviate from the form described above.

One kind is due to the wrong number of parameters for a given literary source. For instance, consider R. 1,34,32,33. from the listls1 file. This shows 4 parameters for the Rāmāyaṇa, but there should be only 3. We can 'guess' that this should be a compressed sequence of 2 references, R. 1,34,32. 33., and this is confirmed by the printed text (for kzAnta, page 2123-2).
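Wrong-arity errors like this can be flagged mechanically once the expected number of parameters per source is known. In this sketch the arity is passed in by hand (an assumption: a real check would need a table of arities per abbreviation), and `overlong_references` is a hypothetical name.

```python
import re


def overlong_references(text, arity):
    """Return references in `text` that carry more than `arity` numbers."""
    # A reference is a run of comma-separated digits; periods and spaces delimit them.
    refs = re.findall(r"\d+(?:,\d+)*", text)
    return [ref for ref in refs if ref.count(",") + 1 > arity]


print(overlong_references("1,34,32,33.", arity=3))   # ['1,34,32,33']
print(overlong_references("1,34,32. 33.", arity=3))  # []
```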

I went through a previous version of this list of 75000 entries looking for such anomalies and corrected about 1000, but obviously missed this one. Maybe someone should go through the listls1 detail file again and collect the remaining anomalies for further correction.

Here's another kind of deviation: MBH. 1,152,26.3,62,15. Here the error is the missing space after the period.
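A missing space shows up as a period immediately followed by a digit, which a single regular expression can locate and repair. A minimal sketch with hypothetical helper names; a period that ends an abbreviation such as 'MBH.' is followed by a space, so it is not matched.

```python
import re

# A period directly followed by a digit: the space between two references is missing.
MISSING_SPACE = re.compile(r"\.(?=\d)")


def missing_space_positions(text):
    """Return the index of every period that lacks a following space."""
    return [m.start() for m in MISSING_SPACE.finditer(text)]


def insert_missing_spaces(text):
    """Insert a space after every such period."""
    return MISSING_SPACE.sub(". ", text)


print(missing_space_positions("1,152,26.3,62,15."))  # [8]
print(insert_missing_spaces("1,152,26.3,62,15."))    # 1,152,26. 3,62,15.
```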

And another: M. 2,18.3,256,7,200. After adding a space after the period we have M. 2,18. 3,256,7,200, but this is still wrong, because the first term (2,18) has 2 parameters while the second (3,256,7,200) has 4. So the middle comma should be a period, and we can guess M. 2,18. 3,256. 7,200 as the correct markup. This is confirmed by the printed text (pUrvokta, page 4115-2).
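The two repairs can be chained into a guess at the correct markup. This is only a heuristic modeled on the R. 1,34,32,33 and M. 2,18.3,256,7,200 examples (an assumption, not a documented rule): after inserting missing spaces, any reference with more than `arity` numbers is cut into a full reference of `arity` numbers followed by the remainder. `guess_repair` is a hypothetical name, and every guess still has to be checked against the printed page.

```python
import re


def guess_repair(text, arity):
    """Heuristically repair a malformed sequence for a source with `arity` parameters."""
    # Step 1: insert the missing space after a period followed directly by a digit.
    text = re.sub(r"\.(?=\d)", ". ", text)
    repaired = []
    for ref in text.rstrip(".").split(". "):
        nums = ref.split(",")
        # Step 2: peel off full references of `arity` numbers; keep the remainder.
        while len(nums) > arity:
            repaired.append(",".join(nums[:arity]))
            nums = nums[arity:]
        repaired.append(",".join(nums))
    return ". ".join(repaired) + "."


print(guess_repair("2,18.3,256,7,200.", arity=2))  # 2,18. 3,256. 7,200.
print(guess_repair("1,34,32,33.", arity=3))        # 1,34,32. 33.
```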

funderburkjim commented 2 years ago

A few exceptions

Very few exceptions to this compression rule have been noticed. For example, under peraRi (page 4121-1) we find `S. S. S. 272. 238.`, an exception to the increasing ordering; the expected form is `S. S. S. 238. 272.`

Such exceptions could be extracted from the listls1_pw_detail file. Current understanding attaches no significance to these exceptions.
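Once a sequence is decompressed, "increasing order" amounts to each reference comparing greater than the previous one lexicographically (section, then chapter, then verse), which Python lists do natively. A minimal sketch for collecting ordering exceptions such as the peraRi case; `ordering_exceptions` is a hypothetical name, and treating an exact repeat of the previous reference as an exception is an assumption.

```python
def ordering_exceptions(full_refs):
    """Return (index, previous, current) wherever the order is not strictly increasing.

    `full_refs` must already be decompressed (every reference complete).
    """
    return [
        (i, full_refs[i - 1], full_refs[i])
        for i in range(1, len(full_refs))
        # Lists compare lexicographically, e.g. [6, 53, 7] < [6, 55, 1].
        if full_refs[i] <= full_refs[i - 1]
    ]


print(ordering_exceptions([[272], [238]]))  # [(1, [272], [238])]
print(ordering_exceptions([[3, 25, 18], [6, 53, 7], [6, 55, 1], [6, 55, 68]]))  # []
```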

funderburkjim commented 2 years ago

software

test_decompress.py is a Python program that checks whether a given sequence is 'well-formed' and 'valid', and tries to decompress it. Examples:

```
$ python test_decompress.py '3,25,18. 6,53,7. 55,1. 68.'
True [[3, 25, 18], [6, 53, 7], [55, 1], [68]]
True [[3, 25, 18], [6, 53, 7], [6, 55, 1], [6, 55, 68]]

$ python test_decompress.py '272. 238.'
True [[272], [238]]
False [[272], [238]]
```
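The source of test_decompress.py is not reproduced here. The following is a hypothetical re-implementation that produces the two sample outputs above, assuming the first output line reports well-formedness plus the parsed references, the second reports validity plus the decompressed references, and the arity is taken from the first (complete) reference. `check` and `SEQ_RE` are illustrative names only.

```python
import re
import sys

# Hypothetical re-implementation; not the actual test_decompress.py.
SEQ_RE = re.compile(r"\d+(?:,\d+)*\.(?: \d+(?:,\d+)*\.)*")


def check(text):
    """Return ((well_formed, parsed), (valid, decompressed)) for a reference sequence."""
    if not SEQ_RE.fullmatch(text):
        return (False, None), (False, None)
    parsed = [[int(n) for n in ref.split(",")] for ref in text[:-1].split(". ")]
    arity = len(parsed[0])  # assumption: the first reference is always complete
    full = [parsed[0]]
    valid = True
    for ref in parsed[1:]:
        if len(ref) > arity:
            # More parameters than the first reference: cannot decompress.
            return (True, parsed), (False, None)
        expanded = full[-1][: arity - len(ref)] + ref
        if expanded <= full[-1]:
            valid = False  # not in increasing order
        full.append(expanded)
    return (True, parsed), (valid, full)


if __name__ == "__main__" and len(sys.argv) > 1:
    (well_formed, parsed), (valid, full) = check(sys.argv[1])
    print(well_formed, parsed)
    print(valid, full)
```

Run with a sequence as its single argument, e.g. `'272. 238.'`, it prints `True [[272], [238]]` followed by `False [[272], [238]]`.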

The functions can be used to examine existing reference sequences, and kick out exceptions as correction candidates.

Since Böhtlingk uses the same compression in PWG, this will help when we tackle similar literary source markup improvement in that work.

gasyoun commented 2 years ago

> Maybe someone should go through listls1 detail file again and collect the remaining anomalies for further correction.

What to look for? Examples similar to yours?

> The functions can be used to examine existing reference sequences, and kick out exceptions as correction candidates.

I'm in love with what I see. We are working on an Atharvaveda and Rigveda update on our end. It will take half a year more to finalise.

funderburkjim commented 2 years ago

> Examples similar to yours?

Yes, that's what I was thinking of.