PWG Bibliography Cleaning

funderburkjim commented 7 years ago

Issue #20 describes a good first step in matching the literary source references of the PWG dictionary with those literary sources mentioned in the PWG bibliography sections.

There remains the task of improving the matching.

Currently, of about 420,000 references about 344,000 are matched.
- The improperrefs have not been examined at all. Probably some are improper because of errors in markup.
- The matchrefs file is a good summary of the status of matched and unmatched items. This is a good source to begin looking for minor typos.
  - For example, line 12 (1@No Match@ĀŚR. GṚHY.@A10C2R. GR2HJ.) might be a spelling error for ĀŚV. GṚHY.
  - The first field is a frequency count. Searching for unmatched cases with 100+ instances using the regex [0-9][0-9][0-9]@No Match gives 35 cases. Some of these may be items which should be in the PWG bibliographies. For example, 429@No Match@CARAKA@K4ARAKA may be a reference to 'CARAKASAM̃HITĀ, Calcutta 1929 und 1877 und Hdschr. im Besitz von ROTH (KERN und ROTH).', which appears in the PWK bibliography; we possibly should add a synthetic entry for this to the PWG bibliography so that CARAKA will match. Clearly, such work will require some research and examination of instances.
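The frequency search described above can also be done numerically instead of with a regex. This is only a sketch: it assumes each matchrefs line has the form count@status@abbreviation@key, as in the two sample lines quoted above, and the function names are made up for illustration.

```python
def parse_matchrefs_line(line):
    """Split one matchrefs line into (count, status, abbreviation).

    Assumed layout: count@status@abbreviation@key. Returns None for
    lines that do not start with a numeric count.
    """
    parts = line.rstrip("\n").split("@")
    if len(parts) < 3 or not parts[0].strip().isdecimal():
        return None
    return int(parts[0]), parts[1], parts[2]


def frequent_unmatched(lines, min_count=100):
    """Return unmatched records with at least min_count instances, most frequent first."""
    records = []
    for line in lines:
        rec = parse_matchrefs_line(line)
        if rec and rec[1] == "No Match" and rec[0] >= min_count:
            records.append(rec)
    return sorted(records, key=lambda r: -r[0])


# The two sample lines quoted in this issue.
sample = [
    "1@No Match@ĀŚR. GṚHY.@A10C2R. GR2HJ.",
    "429@No Match@CARAKA@K4ARAKA",
]
for count, status, abbr in frequent_unmatched(sample):
    print(count, abbr)
```

Lowering min_count gives the long tail of rare non-matches, which is where the minor typos are likely to be.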
There is also room for improvement within the 344,000 matches.
- Separate multiple references which appear within the same 'ls' tag. For instance, in the 'deva' example shown in #20, P. 3, 3, 121, Sch.). Vop. 26, 29. the digitization markup needs to be separated into two 'ls' elements, one for Panini, and one for Vopadeva.
- Some of the matches cover up typos. An examination of the cases matched on the basis of the list in function match_special_startswith of abbrv4.py would be a good place to start.

funderburkjim commented 7 years ago

@gasyoun

You are finding some of the non-matches. Great. That's what needs to be done.

There is no definite procedure set up to integrate your results. We'll have to develop one or more.

adding a new entry to pwgbib.txt

pwgbib.txt is the file. For the 'Sp.' case, it seems we need to add an entry to pwgbib.

Here's the last line of that file:

4.018 <HI code="WEBER, Nax.">WEBER, Nax. = WEBER, Die vedischen Nachrichten von den Naxatra<lb>(Mondstationen). Berlin, 1860. 1862.

We need to add a similarly formatted line, with three parts:

- vol-seq code (4.018). Maybe we use 'X' for the volume code (X for extra). So, X.001
- <HI code="Sp."> That's the second part
- Sp. = followed by the description. This is what will be displayed in the pop-up window.

So, the procedure you would follow is to add one or more such lines to pwgbib.txt.
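To keep new lines consistent with the existing ones, something like the following could be used. It is a sketch only: the layout is inferred from the single WEBER line quoted above, the function names are made up, and DESCRIPTION is a placeholder, not the real text for any entry.

```python
import re

# Layout inferred from the last line of pwgbib.txt quoted in this issue:
#   vol.seq <HI code="CODE">CODE = description
BIB_LINE = re.compile(r'^(?P<volseq>\S+) <HI code="(?P<code>[^"]+)">(?P<text>.*)$')


def parse_bib_line(line):
    """Return the three parts of a pwgbib line as a dict, or None if it does not fit."""
    m = BIB_LINE.match(line)
    return m.groupdict() if m else None


def make_bib_line(volume, seq, code, description):
    """Build a new line, e.g. volume 'X' (for extra) and a 3-digit sequence number."""
    return '%s.%03d <HI code="%s">%s = %s' % (volume, seq, code, code, description)


last = ('4.018 <HI code="WEBER, Nax.">WEBER, Nax. = WEBER, Die vedischen '
        'Nachrichten von den Naxatra<lb>(Mondstationen). Berlin, 1860. 1862.')
print(parse_bib_line(last)["volseq"])
print(make_bib_line("X", 1, "Sp.", "DESCRIPTION"))
```

Parsing the existing lines first is a cheap check that the inferred layout really holds for the whole file before anything is appended.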

Then, when you've done several, notify me. I'll have to go through some installation procedure.

funderburkjim commented 7 years ago

Verz. d. Oxf. H.

You found this in PWK.

It can be handled similarly to 'Sp.'. Now, we could just include this in the same 'X' volume. Or, since it is in the PWK bibliography, we could treat it as a separate category, which we might indicate by using the 'K' volume. I don't have a preference.

funderburkjim commented 7 years ago

'S.' and 'Z.' - You've found that these are not names of works. I think these should be classified, for now at least, as 'improper references'; this will defer consideration of them until a later time. For now our focus should be on the 'proper references'. This reclassification will be accomplished by me making a change to one of the abbrvx.py programs. You don't need to do anything further.

gasyoun commented 7 years ago

In PWG

<ls>Sp. 519, Z. 35. fgg.</ls>
<ls>57. 135. VIKR. 12. Spr. 2475.</ls>

In PWK

<ls>Spr.2235.2278.</ls>
<ls>Spr.7133,</ls>
<ls>Spr.4242.</ls>
<ls>Spr.236fg.</ls>
<ls>Spr.2634),</ls>

Sp. in PWG is equal to Spr. in PWK. Where to note such a concordance?
How to clean out fg., ), and similar dirt?
Where is Spr. II? I have not found it in the XML yet. I remember it was around, but where?

funderburkjim commented 7 years ago

Spr. II? The most comprehensive place to look is in the 'abbrvlist.txt' file. This file has ALL the <ls> entries, in text order, without adjustments.

Sp. (PWG) == Spr. (PWK). Where to note?

We don't have a place thus far. Why don't you make a file named something like 'pwg_pwk_bib.txt' in the 'abbrvwork' directory, and enter such correspondences in that file.

How to clear out 'fgg. '

Not sure. You could possibly make an annotated list of such things as you notice them; maybe put in a file with some suggestive name.
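One way to start such an annotated list is to separate the numbers from whatever else follows 'Spr.' in each <ls> element. This is a rough sketch, not an existing tool: the function name is made up, it only looks at <ls> elements that begin with 'Spr.', and it treats anything that is not a digit, a period or a space as residue.

```python
import re

LS_TAG = re.compile(r"<ls>(.*?)</ls>")
# 'fg.' / 'fgg.' as a unit, otherwise any run of non-digit, non-period, non-space characters
RESIDUE = re.compile(r"fgg?\.|[^\d.\s]+")


def annotate_spr(text):
    """For each <ls> starting with 'Spr.', return (numbers, residue) as two lists."""
    rows = []
    for body in LS_TAG.findall(text):
        if not body.startswith("Spr."):
            continue  # mixed references such as 'VIKR. 12. Spr. 2475.' are not handled here
        rest = body[len("Spr."):]
        numbers = [int(n) for n in re.findall(r"\d+", rest)]
        residue = RESIDUE.findall(rest)
        rows.append((numbers, residue))
    return rows


# The PWK examples quoted earlier in this issue.
sample = ("<ls>Spr.2235.2278.</ls> <ls>Spr.7133,</ls> <ls>Spr.4242.</ls> "
          "<ls>Spr.236fg.</ls> <ls>Spr.2634),</ls>")
for numbers, residue in annotate_spr(sample):
    print(numbers, residue)
```

Entries with an empty residue list are already clean. The others show which kinds of trailing material ('fg.', ')', ',') occur and how often, before deciding how to deal with them.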

gasyoun commented 7 years ago

In Ǵjot. im Çkdr. Ind. St. Ii, 259. 278. 282., Ind. St. Ii must be Ind. St. II.

Andhrabharati commented 2 years ago

@gasyoun,

I guess Sp. in PWG stands for Spalte, not for Spr. (Spruch)

For example, pl. see this page from Vol. 6 of PWG: Verbesserungen PWG Theil 6.pdf

Also I see that Sp. is used after Spr. in the text many times, corroborating my view.

And almost throughout the work the Sp. is followed by a Z. (indicating the line within that Sp.) number.

[Though I am not conversant with German language at all, I have just gathered a minimal knowledge in it for a week before looking into the PWG data (for necessary action).]

Andhrabharati commented 2 years ago

As I understand-- when the work under citation has no columns, S. (Seite) [Eng. page] with a line (Z. - Zeile) number is used [and the columns in them, if any, by the letters a, b, c, ...] and when the work is with columns having their own numbers [like PWG itself; another such work immediately coming to my mind is A Comparative Dictionary of the Indo-Aryan Languages by RL Turner], Sp. (Spalte) [Eng. column] is used.

Pl. correct me if my understanding is wrong.

gasyoun commented 2 years ago

I guess Sp. in PWG stands for Spalte, not for Spr. (Spruch)

Agree.

Pl. correct me if my understanding is wrong.

No correction needed, you nailed it.

gasyoun commented 2 years ago

Bedeutung (= English meaning) for Bed. might not be that obvious. Do we have a list of general German words used as abbreviations @funderburkjim ?

Andhrabharati commented 2 years ago

Not a full list, but a major portion of it, as existing in PWG, was already made (marked) long back and posted recently.

@gasyoun may wish to expand them, if he really has some time (I actually mean "interest").

gasyoun commented 2 years ago

INDR. 5, 41 missing @funderburkjim

https://sanskrit-lexicon.uni-koeln.de/simple/pwg/guru

gasyoun commented 2 years ago

@funderburkjim where to add the missing abbreviations, so they show up - like acc. = Accusativ.

Andhrabharati commented 2 years ago

https://github.com/sanskrit-lexicon/PWG/issues/37#issuecomment-954035071

You may start "filling" the bigger ab file by me (above), or continue updating the pwab_input.txt file by Jim (in the csl_pywork repo).

gasyoun commented 2 years ago

continue updating the pwab_input.txt file by Jim (in the csl_pywork repo).

There are issues above it as well.

<ls>H. an.</ls> <ls>MED.</ls> creates a double space. Should we replace it with <ls>H. an.</ls><ls>MED.</ls>?

H. an.  Med. -> H. an. Med.

Replace double space --> single space @funderburkjim
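If the double space is fixed in the text rather than in the display code, the replacement itself is a one-liner. This is a sketch with a made-up function name; note that it collapses every run of spaces in the string it is given, including any leading indentation.

```python
import re


def collapse_spaces(text):
    """Replace every run of two or more spaces with a single space."""
    return re.sub(r" {2,}", " ", text)


print(collapse_spaces("H. an.  Med."))
print(collapse_spaces("<ls>H. an.</ls>  <ls>MED.</ls>"))
```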

gasyoun commented 2 years ago

@thomasincambodia what does the Z. stand for? Zeile?

f) Z. 5. fg. गरीयसी so v. a. sehr ehrenvoll Pañcat. I, 418

gasyoun commented 2 years ago

Verz. D. Oxf. H. 255,b, N. 5.258,b,19.

The N. is a source similar to H.?

maltenth commented 2 years ago

Z. = Zeile = line

fg. = folgende = following

v. a. = vor allem = above all

Andhrabharati commented 5 months ago

Verz. D. Oxf. H. 255,b, N. 5.258,b,19.

The N. is a source similar to H.?

@gasyoun

The N. denotes the Notes (Footnote) number (it is not a title of any work, like H.)--

And the Verz. D. Oxf. H. 255,b, N. 5.258,b,19. indicates two links 255,b, N. 5. & 258,b,19. [there has to be a space after N. 5.!]
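The split into two links can be sketched with a regex. Assumptions here: each link has the shape page,column,number. with an optional N. before the number, the column is a single lowercase letter, and the function name is made up. What the plain number stands for (presumably a line) is not decided by the code.

```python
import re

# page , column letter , optional 'N.' , number , closing period
LINK = re.compile(r"(\d+),\s*([a-z]),\s*(N\.\s*)?(\d+)\.")


def split_oxf_links(ref_body):
    """Split the part after 'Verz. d. Oxf. H.' into one dict per link."""
    links = []
    for m in LINK.finditer(ref_body):
        links.append({
            "page": int(m.group(1)),
            "column": m.group(2),
            "is_note": m.group(3) is not None,  # True when 'N.' precedes the number
            "number": int(m.group(4)),
        })
    return links


for link in split_oxf_links("255,b, N. 5.258,b,19."):
    print(link)
```

The same pattern works whether or not the missing space after N. 5. is present, so the markup can be corrected independently of the linking.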

Andhrabharati commented 5 months ago

@funderburkjim

I had posted the Verz. D. Oxf. H. long back for pdf-linking, in one of the issues.

It is the easiest work to link (no need for any index as such) and also one of the most cited works in PWG (in the 12th position from the top).

You might like to take up this work, as a small detour activity sometime.

funderburkjim commented 5 months ago

I had posted the Verz. D. Oxf. H. long back for pdf-linking,

Perhaps you would post the pdf download reference here.

Andhrabharati commented 5 months ago

@funderburkjim

This is the post that has the link.

Incidentally you may see the last statement in the above post; what do you interpret from it?

It just means that I had NOT ONLY finished the work mentioned in another post, BUT ALSO collected the sources (scans etc.) in just about 2 months (May-Jul 2021), an exercise that has been going on at CDSL for over 9 years (2015-2024) [in a haphazard manner] and is still far, far away from the goal!

Andhrabharati commented 5 months ago

And I had suggested taking this up, while you were on the pdf linking activity about 2 yrs ago.

funderburkjim commented 5 months ago

From the last few days, there are 15+ Github posts from @Andhrabharati requesting my attention. The number of similar but older posts is unknown, but likely greater than 0.

sanskrit-lexicon / PWG

PWG Bibliography Cleaning #22
