sanskrit-lexicon / PWG

Boehtlingk und Roth Sanskrit Wörterbuch, 7 Bände Petersburg 1855-1875

PWG ls markup: numeric orphans #65

Closed funderburkjim closed 1 year ago

funderburkjim commented 1 year ago

This work aims primarily to improve markup in cases like:

OLD:
<ls>M. 2, 4.</ls> {#yo 'kAmAM dUzeyatkanyAm#} 
<ls>8, 364.</ls> {#tannAkAmo dAtumarhati#}     <<< <ls>8,364.</ls> is a numeric orphan.
NEW:
<ls>M. 2, 4.</ls> {#yo 'kAmAM dUzeyatkanyAm#} 
<ls n="M.">8, 364.</ls> {#tannAkAmo dAtumarhati#}   <<< infer that 8, 364 must be 'M.'

When we get a link target for 'M.' (Manu's law book), then we will be able to provide a link to the 8, 364 reference.
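The inference shown in the OLD/NEW example can be sketched in Python. This is only an illustration, not the program used for this issue: the function name `mark_numeric_orphans` is hypothetical, and the sketch assumes that an abbreviation is whatever text precedes the first digit inside an `<ls>` element. It also carries the last abbreviation across the whole input; real data would presumably need a reset at each entry and manual review of the results.

```python
import re

# An <ls> element on a single line.
LS_TAG = re.compile(r'<ls>(.*?)</ls>')
# Assumption: the abbreviation is the text before the first digit.
ABBREV = re.compile(r'^([^0-9]+?)\s*[0-9]')

def mark_numeric_orphans(text):
    """Add n="ABBREV" to each <ls> whose content starts with a digit,
    using the abbreviation of the nearest preceding <ls>."""
    last_abbrev = None

    def repl(m):
        nonlocal last_abbrev
        body = m.group(1)
        if body[:1].isdigit():
            if last_abbrev is None:
                return m.group(0)  # nothing to infer from; leave unchanged
            return '<ls n="%s">%s</ls>' % (last_abbrev, body)
        a = ABBREV.match(body)
        last_abbrev = a.group(1) if a else body.strip()
        return m.group(0)

    return LS_TAG.sub(repl, text)

old = ("<ls>M. 2, 4.</ls> {#yo 'kAmAM dUzeyatkanyAm#} \n"
       "<ls>8, 364.</ls> {#tannAkAmo dAtumarhati#}")
print(mark_numeric_orphans(old))
```

An orphan with no preceding `<ls>` is left untouched, so such cases remain visible for manual handling.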

Sometimes these orphans are hidden in long strings of numbers, such as

OLD:
<ls>VS. 1, 12. 16. 31. 11, 30.</ls>
NEW:
<ls>VS. 1, 12.</ls> <ls n="VS. 1,">16.</ls> <ls n="VS. 1,">31.</ls> <ls n="VS.">11, 30.</ls>

The current work handles many of these hidden numeric orphans, but only incidentally. The focus of this work is on the (unhidden) numeric orphans. Additional work (with somewhat different techniques) will be required to resolve the hidden numeric orphans.
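As a rough illustration of what resolving a hidden numeric orphan involves, here is a minimal Python sketch that reproduces the VS. example above. It is not the technique planned for that later work. The name `split_hidden_orphans` is hypothetical, and the sketch assumes two-level references only (chapter, verse), each terminated by a period; anything else is returned unchanged.

```python
import re

# One reference: an optional "chapter," followed by "verse."
PIECE = re.compile(r'(?:\d+,\s*)?\d+\.')
# Assumption: abbreviation text, then a run of digits, commas, periods, spaces.
WHOLE = re.compile(r'<ls>([^0-9<]+?)\s*(\d[\d,. ]*)</ls>')

def split_hidden_orphans(ls_element):
    """Split one <ls> holding several references into one <ls> per reference."""
    m = WHOLE.fullmatch(ls_element)
    if m is None:
        return ls_element
    abbrev, rest = m.group(1), m.group(2)
    pieces = PIECE.findall(rest)
    # Give up unless the pieces account for the whole string
    # (this rejects three-level references, for example).
    if ' '.join(pieces) != ' '.join(rest.split()):
        return ls_element
    if len(pieces) < 2 or ',' not in pieces[0]:
        return ls_element
    out = []
    chapter = None
    for i, piece in enumerate(pieces):
        if ',' in piece:
            chapter = piece.split(',')[0]
            if i == 0:
                out.append('<ls>%s %s</ls>' % (abbrev, piece))
            else:
                out.append('<ls n="%s">%s</ls>' % (abbrev, piece))
        else:
            # A bare number inherits the most recent chapter.
            out.append('<ls n="%s %s,">%s</ls>' % (abbrev, chapter, piece))
    return ' '.join(out)

print(split_hidden_orphans('<ls>VS. 1, 12. 16. 31. 11, 30.</ls>'))
```

Returning the input unchanged on any unexpected pattern is deliberate: a wrong split would be harder to find later than an unsplit string.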

funderburkjim commented 1 year ago

The work is done in the lsnum1 directory.

Before this work: 32113 matches in 31909 lines for "<ls>[0-9]".

After 5 days: 29072 matches in 28887 lines for "<ls>[0-9]".

So about 10% done.
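Counts like these can be reproduced with a few lines of Python. This is a sketch under assumptions: the function name `count_orphans` is made up for illustration and takes any iterable of lines, since the name of the digitization file is not given here.

```python
import re

# Same pattern as quoted above: an <ls> whose content starts with a digit.
ORPHAN = re.compile(r'<ls>[0-9]')

def count_orphans(lines):
    """Return (number of matches, number of lines with at least one match)."""
    matches = 0
    matching_lines = 0
    for line in lines:
        n = len(ORPHAN.findall(line))
        matches += n
        if n:
            matching_lines += 1
    return matches, matching_lines

sample = [
    "<ls>M. 2, 4.</ls> {#yo 'kAmAM dUzeyatkanyAm#}",
    "<ls>8, 364.</ls> {#tannAkAmo dAtumarhati#} <ls>9, 1.</ls>",
]
print(count_orphans(sample))

# Progress after 5 days, from the figures above.
before, after = 32113, 29072
print(round(100 * (before - after) / before, 1))
```

On a real file, the same function could be called on an open file object, for example `count_orphans(open(path, encoding='utf-8'))` with `path` pointing at the PWG text.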

change_pwg_1.txt has the first batch of changes.

Estimate about 3 months to complete!

funderburkjim commented 1 year ago

A program was written to automate some changes. About 8000 such changes were made. See change_pwg_2.txt.

About 21000 type 1 orphans remain.

funderburkjim commented 1 year ago

change_pwg_3.txt has another 5000 changes. About 16000 type 1 orphans remain.

gasyoun commented 1 year ago

PWG is in the top 5, so no amount of work on its markup can be too much. Thanks, Jim!

"Estimate about 3 months to complete"

Huge one.

funderburkjim commented 1 year ago

change_pwg_4.txt has another 5000+ changes. About 9700 type 1 orphans remain.

funderburkjim commented 1 year ago

change_pwg_5.txt has another 4000+ changes. About 6000 type 1 orphans remain.

funderburkjim commented 1 year ago

change_pwg_6.txt has another 4000+ changes. About 4000 type 1 orphans remain.

funderburkjim commented 1 year ago

change_pwg_7.txt has another 3600+ changes. About 1500 type 1 orphans remain.

funderburkjim commented 1 year ago

Now there remain only a handful of type 1 numeric orphans.

It is certain that, in the reduction of cases from the initial number of 32000 to the current 58, some errors were made.

In this exercise, many other markup deficiencies were noticed, and hopefully will be addressed in the future. But for now a break from this tedious work is in order. Closing this issue.

Andhrabharati commented 1 year ago

Is there a way that someone else can access the "temp_xxx" file(s), @funderburkjim?

I would like to see if I can be of some help on the 22 cases mentioned in the readme file:

File temp_lsnum1_1.txt shows 81 items
- 59 are unresolved <ls>{number} -- nothing more to do with these
- the rest (22) are errors in ls

funderburkjim commented 1 year ago

@Andhrabharati those 22 were resolved by me. See the last two sections (temp_change_pwg_8x and 8x1) of change_pwg_8.txt

gasyoun commented 1 year ago

"those 22 were resolved by me."

But are there still others left, @funderburkjim?