sanskrit-lexicon / PWG

Boehtlingk und Roth Sanskrit Wörterbuch, 7 Bände Petersburg 1855-1875

LS markup for ṚV. #38

Closed funderburkjim closed 2 years ago

funderburkjim commented 2 years ago

This comment describes work to improve the markup of the literary source (LS) references to the RgVeda in PWG. Before the work, 18687 ṚV. references were marked in 10288 entries of PWG. After the work, 54442 ṚV. references were marked in 10365 entries of PWG.

The work files are here.

funderburkjim commented 2 years ago

Changes were made in a total of 21787 lines (out of 1149414) in the Cologne digitization pwg.txt. The file changes.txt shows these changed lines (refer to the 'here' link above).

lsextract_RV_00.txt is a summary of the literary source references (for RV) before the changes. lsextract_RV_03.txt is a summary after the changes.

funderburkjim commented 2 years ago

standard forms

The standard form for a literary source reference to a RgVeda verse is ṚV. x, y, z. where x, y, and z are digit sequences (mandala, hymn, verse). In pwg.txt the markup appears as <ls>ṚV. x, y, z.</ls>

A secondary standard form omits the verse: <ls>ṚV. x, y</ls>. It is used to refer to a specific hymn, such as when describing someone who is the author of a hymn.

A third standard form has no identifying mandala, hymn, or verse numbers. There are about 400 of these, which are listed in file lsfilter_RV_0.txt.

Before these changes, there were many RV references whose coding in pwg.txt was irregular (i.e., not one of the above standard forms). These are shown in lsfilter_RV_irreg_00.txt. After the changes, there are only a handful, shown in lsfilter_RV_irreg_03.txt.
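As a rough illustration of the distinction between standard and irregular forms, here is a minimal Python sketch. It is not the code used for this work; the function name `classify_rv`, the exact regexes, and the assumption that the third form is coded as a bare `<ls>ṚV.</ls>` are all illustrative guesses.

```python
import re

# Assumed patterns for the three standard forms described above.
FULL = re.compile(r"<ls>ṚV\. \d+, \d+, \d+\.</ls>")   # mandala, hymn, verse
HYMN = re.compile(r"<ls>ṚV\. \d+, \d+\.?</ls>")       # mandala, hymn (final period optional: an assumption)
BARE = re.compile(r"<ls>ṚV\.</ls>")                   # no numbers at all (assumed coding)

def classify_rv(ls_element: str) -> str:
    """Return 'verse', 'hymn', 'bare', or 'irregular' for one <ls> element."""
    if FULL.fullmatch(ls_element):
        return "verse"
    if HYMN.fullmatch(ls_element):
        return "hymn"
    if BARE.fullmatch(ls_element):
        return "bare"
    return "irregular"

print(classify_rv("<ls>ṚV. 2, 27, 1.</ls>"))
print(classify_rv("<ls>ṚV. 2, 27, 1. 7, 41, 2.</ls>"))
```

Under these assumptions, an element holding two references in one `<ls>` scope is reported as irregular, which is the kind of case the changes address.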

reference sequences

The typical 3-parameter standard form is, in the printed text, often presented in a compressed form. This compressed form omits the ṚV. abbreviation, and may also omit either the mandala or both the mandala and hymn. The current work uses a markup variation for these compressed forms.

funderburkjim commented 2 years ago

Examples of inferred references

Example 1

; <L>53763<pc>5-0169<k1>Baga<k2>Ba/ga
; A simple sequence -- We add `<ls n="ṚV.">` to the second instance, so
; it is complete, and can be recognized by other programs, such as the display programs.
529337 old <ls>ṚV. 2, 27, 1. 7, 41, 2.</ls> 
529337 new <ls>ṚV. 2, 27, 1.</ls> <ls n="ṚV.">7, 41, 2.</ls> 

Now the displays have a link to 7, 41, 2 as well as to 2, 27, 1.

Example 2

Again with headword Baga,

529346 old <ls>ṚV. 7, 41, 1. fgg.</ls> {#Bago^ viBa\ktA Sava\sAva\sA ga^mat#} 
529346 new <ls>ṚV. 7, 41, 1.</ls> fgg. {#Bago^ viBa\ktA Sava\sAva\sA ga^mat#} 
;
529347 old <ls>5, 46, 6. 49, 1.</ls> {#Baga^Sca dAtu\ vArya^m#} 
529347 new <ls n="ṚV.">5, 46, 6.</ls> <ls n="ṚV. 5,">49, 1.</ls> {#Baga^Sca dAtu\ vArya^m#} 

Note that the first instance here (5, 46, 6) is inferred to be ṚV. because the previous ls reference is explicitly ṚV. Also, the second instance (49, 1.) is further inferred to be 5, 49, 1. The rvlink for 5, 49 confirms a usage of Baga.
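The splitting and inference in Examples 1 and 2 can be sketched in Python as follows. This is only a sketch of the idea, not the program actually used: `split_rv_sequence` and its `mandala`/`hymn` context parameters are hypothetical names, and the caller is assumed to have already established that an unlabelled sequence continues an explicit ṚV. reference.

```python
import re

# One reference item: one to three numbers followed by a period, e.g. "49, 1."
ITEM = re.compile(r"\d+(?:, \d+)*\.")

def split_rv_sequence(ls_element, mandala=None, hymn=None):
    """Split one <ls> element holding several RV references into one <ls> per
    reference, recording the inferred part in the n attribute.
    mandala/hymn: context carried over from the preceding reference (assumed RV)."""
    m = re.fullmatch(r"<ls>(ṚV\. )?(.*)</ls>", ls_element)
    if m is None:
        raise ValueError("not a single <ls> element")
    explicit, body = m.group(1) is not None, m.group(2)
    items = ITEM.findall(body)
    if not items or " ".join(items) != body:
        raise ValueError("not a plain sequence of references")
    out = []
    for i, item in enumerate(items):
        nums = item.rstrip(".").split(", ")
        if i == 0 and explicit:
            if len(nums) != 3:
                raise ValueError("sketch handles only full explicit references")
            mandala, hymn = nums[0], nums[1]
            out.append(f"<ls>ṚV. {item}</ls>")
            continue
        if len(nums) == 3:                      # complete: only ṚV. is inferred
            mandala, hymn = nums[0], nums[1]
            prefix = ""
        elif len(nums) == 2 and mandala is not None:   # mandala inferred
            hymn = nums[0]
            prefix = f" {mandala},"
        elif len(nums) == 1 and mandala is not None and hymn is not None:
            prefix = f" {mandala}, {hymn},"     # mandala and hymn inferred
        else:
            raise ValueError("missing context for compressed reference")
        out.append(f'<ls n="ṚV.{prefix}">{item}</ls>')
    return " ".join(out)

print(split_rv_sequence("<ls>ṚV. 2, 27, 1. 7, 41, 2.</ls>"))
print(split_rv_sequence("<ls>5, 46, 6. 49, 1.</ls>"))
```

With these inputs the sketch reproduces the 'new' forms of lines 529337 and 529347. It raises an error rather than guessing when the context needed for an inference is missing.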

Example 3

Here is an example where the mandala and hymn are inferred in the markup, again with Baga:


; previous line 
529382 new <ls n="ṚV.">3, 30, 18.</ls> {#A no^ Bara\ Baga^mindra dyu\manta^m#}
; 3, 30 inferred in next line
529383 new <ls n="ṚV. 3, 30,">19.</ls> <ls n="ṚV.">1, 24, 4.</ls> {#tvaM so^ma ma\he Baga\M tvaM yUna^ ftAya\te . dakza^M daDAsi jI\vase^#} 
; also 1,24,4 is another ṚV. verse illustrating Baga.
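
A program consuming this markup, such as a display program, could recover the complete reference by joining the `n` attribute with the element text. The following Python sketch shows one way to do that; `resolve_rv` is a hypothetical helper, not part of the Cologne display code, and it handles only verse-level references.

```python
import re

# An <ls> element with an optional n attribute and plain text content.
LS = re.compile(r'<ls(?: n="([^"]*)")?>([^<]*)</ls>')
FULL_RV = re.compile(r"ṚV\. (\d+), (\d+), (\d+)\.")

def resolve_rv(line):
    """Return (mandala, hymn, verse) tuples for the RV verse references in a line."""
    refs = []
    for n_attr, text in LS.findall(line):
        # The n attribute holds the inferred prefix; prepend it to the visible text.
        full = f"{n_attr} {text}" if n_attr else text
        m = FULL_RV.fullmatch(full.strip())
        if m:
            refs.append(tuple(int(g) for g in m.groups()))
    return refs

line = '<ls n="ṚV. 3, 30,">19.</ls> <ls n="ṚV.">1, 24, 4.</ls>'
print(resolve_rv(line))
```

For the markup of line 529383, this yields the two verses 3, 30, 19 and 1, 24, 4, each of which can then be turned into a link.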
funderburkjim commented 2 years ago

Some ideas for next steps

markup enhancement for AV. and P.

RV was chosen because we have rvlinks and because ṚV. references are so frequent in PWG.

General ls markup improvements

There are many places where extraneous text or abbreviations are included within the scope of <ls>X</ls>. For instance, the regex "<ls>[^<]*fgg?[.]" has 11098 matches in 10639 lines.
These could be improved programmatically.
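
One possible programmatic fix, sketched in Python, is to move a trailing fg. or fgg. out of the `<ls>` scope, as was done by hand for line 529346 in Example 2. The names `move_fg_outside` and `count_matches` are hypothetical, and the sketch deliberately ignores cases where the abbreviation is not the last item inside the element.

```python
import re

# The search pattern quoted above, used here only for counting.
SOURCE_PATTERN = re.compile(r"<ls>[^<]*fgg?[.]")
# A trailing "fg." or "fgg." immediately before the closing tag.
TRAILING_FG = re.compile(r"(<ls[^>]*>[^<]*?) (fgg?\.)</ls>")

def count_matches(lines):
    """Return (total matches, number of lines with at least one match)."""
    total = hit_lines = 0
    for line in lines:
        n = len(SOURCE_PATTERN.findall(line))
        total += n
        hit_lines += 1 if n else 0
    return total, hit_lines

def move_fg_outside(line):
    """Close the <ls> element before a trailing fg./fgg. abbreviation."""
    return TRAILING_FG.sub(r"\1</ls> \2", line)

old = "<ls>ṚV. 7, 41, 1. fgg.</ls> {#Bago^ viBa\\ktA#}"
print(move_fg_outside(old))
```

Any such bulk change would presumably still need review against the printed text before being applied to pwg.txt.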

funderburkjim commented 2 years ago

@Andhrabharati mentioned work he has been doing related to LS markup (https://github.com/sanskrit-lexicon/PWG/issues/37#issuecomment-877602979).

The work on RV markup described in this issue was well underway at the time of that post, so I decided to carry it to completion.

However, before doing further work, we should see how his work can be used.

gasyoun commented 2 years ago

I was waiting 10 years and 10 days for it. The deep algo.

> RV was chosen because we have rvlinks and because ṚV. references are so frequent in PWG.

And it's so practical too!

> Some ideas for next steps

Should it be easier now, or are a lot of manual actions still required?

funderburkjim commented 2 years ago

I think this issue is closeable. Feel free to reopen if you think it necessary.

gasyoun commented 2 years ago

> I think this issue is closeable

Agreed, as even the Atharvaveda and Panini have been started.