Closed funderburkjim closed 2 years ago
Changes were made in a total of 21787 lines (out of 1149414 lines) in the Cologne digitization pwg.txt. The file changes.txt shows these changed lines (refer to the 'here' link above).
lsextract_RV_00.txt is a summary of the literary source references (for RV) before the changes. lsextract_RV_03.txt is a summary after the changes.
The standard form for a literary source reference for RgVeda verse is ṚV. x, y, z.
where x, y, and z are digit sequences. The markup of pwg.txt appears as <ls>ṚV. x, y, z.</ls>
A secondary standard form omits the verse: <ls>ṚV. x, y</ls>
and is used to refer to a specific hymn, such as when describing someone who is the author of a hymn.
A third standard form has no identifying mandala, hymn, or verse numbers. There are about 400 of these, which are listed in file lsfilter_RV_0.txt.
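As a rough illustration of the three standard forms, here is a minimal Python sketch that classifies the text inside an `<ls>` element. The function name `classify_rv` and the exact regular expressions are my assumptions for illustration; they are not taken from the programs actually used to produce the lsfilter files. In particular, I assume the third form is just the bare abbreviation `ṚV.`.

```python
import re

# Hypothetical patterns for the standard forms described above.
RV_VERSE = re.compile(r"ṚV\. \d+, \d+, \d+\.")   # ṚV. x, y, z.
RV_HYMN = re.compile(r"ṚV\. \d+, \d+\.?")        # ṚV. x, y
RV_BARE = re.compile(r"ṚV\.")                    # no mandala, hymn, or verse

def classify_rv(ls_text):
    """Classify the text found between <ls> and </ls>."""
    if RV_VERSE.fullmatch(ls_text):
        return "verse"
    if RV_HYMN.fullmatch(ls_text):
        return "hymn"
    if RV_BARE.fullmatch(ls_text):
        return "bare"
    return "irregular"

for sample in ["ṚV. 2, 27, 1.", "ṚV. 7, 41", "ṚV.", "ṚV. 2, 27, 1. 7, 41, 2."]:
    print(sample, "->", classify_rv(sample))
```

Anything that falls through to "irregular" would be a candidate for the kind of listing shown in lsfilter_RV_irreg_00.txt.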
Before these changes, there were many RV references whose coding in pwg.txt was irregular (i.e., not one of the above standard forms). These are shown in lsfilter_RV_irreg_00.txt. After the changes, there are only a handful, shown in lsfilter_RV_irreg_03.txt.
The typical 3-parameter standard form is, in the printed text, often presented in a compressed form. This compressed form omits the ṚV. abbreviation, and may also omit either the mandala or both the mandala and hymn. The current work uses a markup variation for these compressed forms.
; <L>53763<pc>5-0169<k1>Baga<k2>Ba/ga
; A simple sequence -- We add `<ls n="ṚV.">` to the second instance, so
; it is complete, and can be recognized by other programs, such as the display programs.
529337 old <ls>ṚV. 2, 27, 1. 7, 41, 2.</ls>
529337 new <ls>ṚV. 2, 27, 1.</ls> <ls n="ṚV.">7, 41, 2.</ls>
Now the displays have a link to 7, 41, 2 as well as to 2, 27, 1.
Again with headword Baga,
529346 old <ls>ṚV. 7, 41, 1. fgg.</ls> {#Bago^ viBa\ktA Sava\sAva\sA ga^mat#}
529346 new <ls>ṚV. 7, 41, 1.</ls> fgg. {#Bago^ viBa\ktA Sava\sAva\sA ga^mat#}
;
529347 old <ls>5, 46, 6. 49, 1.</ls> {#Baga^Sca dAtu\ vArya^m#}
529347 new <ls n="ṚV.">5, 46, 6.</ls> <ls n="ṚV. 5,">49, 1.</ls> {#Baga^Sca dAtu\ vArya^m#}
Note that the first instance here (5, 46, 6) is inferred to be ṚV. because the previous ls reference is explicitly ṚV. Also, the second instance (49, 1.) is further inferred to be 5, 49, 1. The rvlink for 5, 49 confirms a usage of Baga.
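To show how a downstream program might use the `n` attribute, here is a small Python sketch that recovers the full (mandala, hymn, verse) triple from each `<ls>` element in a line. The function name `resolve_rv` is hypothetical, and this is not the code of the Cologne display programs; it only assumes that the `n` attribute holds the omitted prefix, as in the examples above.

```python
import re

# An <ls> element with an optional n="..." attribute and plain text content.
LS_TAG = re.compile(r'<ls(?: n="([^"]*)")?>([^<]*)</ls>')

def resolve_rv(line):
    """Return (mandala, hymn, verse) tuples for the ṚV. verse references in a line."""
    refs = []
    for n_attr, body in LS_TAG.findall(line):
        # Prepend the inferred prefix (empty when there is no n attribute).
        full = (n_attr + " " + body).strip()
        if not full.startswith("ṚV."):
            continue
        nums = re.findall(r"\d+", full[len("ṚV."):])
        if len(nums) == 3:  # skip hymn-only and bare references
            refs.append(tuple(int(x) for x in nums))
    return refs

line = '<ls n="ṚV.">5, 46, 6.</ls> <ls n="ṚV. 5,">49, 1.</ls> {#Baga^Sca dAtu\\ vArya^m#}'
print(resolve_rv(line))
```

With the new markup of line 529347, this yields the two triples (5, 46, 6) and (5, 49, 1), which is what a link generator would need.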
Here is an example where the mandala and hymn are inferred in the markup, again with Baga
; previous line
529382 new <ls n="ṚV.">3, 30, 18.</ls> {#A no^ Bara\ Baga^mindra dyu\manta^m#}
; 3, 30 inferred in next line
529383 new <ls n="ṚV. 3, 30,">19.</ls> <ls n="ṚV.">1, 24, 4.</ls> {#tvaM so^ma ma\he Baga\M tvaM yUna^ ftAya\te . dakza^M daDAsi jI\vase^#}
; also 1,24,4 is another ṚV. verse illustrating Baga.
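The "old" to "new" rewrites in the examples above can be sketched in Python as follows. This is only an approximation under stated assumptions: the function name `expand_rv` is hypothetical, it handles only clean digit sequences, and it trusts the caller to know that a body without an explicit `ṚV.` follows an explicit ṚV. reference. The actual changes were not necessarily made this way, and inferred references would still need checking.

```python
import re

# One compressed item: 1 to 3 comma-separated numbers ending in a period.
ITEM = re.compile(r"\d+(?:, \d+){0,2}\.")

def expand_rv(body):
    """Split the text of one <ls> element into separate <ls> elements.

    Returns None when the body is not a clean sequence of references,
    so that such cases can be left for manual review.
    """
    explicit = body.startswith("ṚV. ")
    rest = body[len("ṚV. "):] if explicit else body
    items = ITEM.findall(rest)
    if not items or " ".join(items) != rest:
        return None
    out = []
    mandala = hymn = None
    for i, item in enumerate(items):
        nums = re.findall(r"\d+", item)
        if len(nums) == 3:
            mandala, hymn = nums[0], nums[1]
            prefix = "ṚV."
        elif mandala is None:
            return None  # nothing to infer the mandala from
        elif len(nums) == 2:
            hymn = nums[0]
            prefix = f"ṚV. {mandala},"
        else:
            prefix = f"ṚV. {mandala}, {hymn},"
        if i == 0 and explicit:
            out.append(f"<ls>ṚV. {item}</ls>")
        else:
            out.append(f'<ls n="{prefix}">{item}</ls>')
    return " ".join(out)

print(expand_rv("ṚV. 2, 27, 1. 7, 41, 2."))
print(expand_rv("5, 46, 6. 49, 1."))
```

The two calls reproduce the "new" forms of lines 529337 and 529347. A body such as `ṚV. 7, 41, 1. fgg.` returns None, since the trailing abbreviation is not a reference.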
RV was chosen because we have rvlinks and because ṚV. references are so frequent in PWG.
<ls>AV. 9, 9, 2. 11, 9, 7. 12, 4, 3.</ls>
<ls>P. 5, 1, 121. 7, 3, 30. 31</ls>
There are many places where text or abbreviations are included within the scope of <ls>X</ls>.
For instance, there are 11098 matches in 10639 lines for "<ls>[^<]*fgg?[.]".
These could be improved programmatically.
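As one possible programmatic approach, here is a Python sketch that counts matches of the pattern quoted above and moves a trailing `fg.` or `fgg.` outside the `<ls>` element, as was done by hand for line 529346. The function names are hypothetical, and the rewrite pattern is my assumption: it only covers `<ls>` elements without attributes where the abbreviation comes last.

```python
import re

FGG_IN_LS = re.compile(r"<ls>[^<]*fgg?[.]")                # pattern quoted above
TRAILING_FGG = re.compile(r"<ls>([^<]*?) (fgg?\.)</ls>")   # assumed rewrite target

def count_matches(lines):
    """Return (number of matches, number of lines with at least one match)."""
    n_matches = 0
    n_lines = 0
    for line in lines:
        k = len(FGG_IN_LS.findall(line))
        n_matches += k
        if k:
            n_lines += 1
    return n_matches, n_lines

def move_fgg_out(line):
    """Move a trailing fg./fgg. from inside <ls>...</ls> to just after it."""
    return TRAILING_FGG.sub(r"<ls>\1</ls> \2", line)

old = "<ls>ṚV. 7, 41, 1. fgg.</ls> {#Bago^ viBa\\ktA#}"
print(count_matches([old]))
print(move_fgg_out(old))
```

Running `count_matches` over all lines of pwg.txt would give the kind of totals mentioned above; cases where the abbreviation is not the last item in the element would need a separate rule.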
@Andhrabharati mentioned work he has been doing related to LS markup (https://github.com/sanskrit-lexicon/PWG/issues/37#issuecomment-877602979).
The work on RV markup described in this issue was well underway at the time of that post, so I decided to carry it to completion.
However, before doing further work, we should see how his work can be used.
I was waiting 10 years and 10 days for it. The deep algo.
RV was chosen because we have rvlinks and because ṚV. references are so frequent in PWG.
And it's so practical too!
Some ideas for next steps
Should it be easier now or a lot of manual actions still required?
I think this issue closeable. Feel free to reopen if you think it necessary.
I think this issue closeable
Agree, as even Atharvaveda and Panini have been started.
This comment describes work to improve the markup of the literary source references in PWG for RgVeda. Before the work, 18687 ṚV. references were marked in 10288 entries of PWG. After the work, 54442 ṚV. references were marked in 10365 entries of PWG.
The work files are here.