Closed funderburkjim closed 7 years ago
Made a 'pwg_ls' folder at the top level of repository. This will hold the work pertaining to the pwg bibliography. The intention is to have a close formal relation between pwg_ls and the directory 'pw_ls' of the PWK repository.
Made stub files and folders under pwg_ls:
It probably would be a good idea for someone (@gasyoun ?) to proofread the digitization (pwgbib1_utf8.txt) right at the start.
@funderburkjim 600 of non-IAST is hard. I can proofread, but only at least pseudo-IAST, please.
@gasyoun I've spent today trying to prepare a good IAST version of pwgbib1 for you to work with. It is here.
My aim was to convert to 'modern' IAST, which of course differs in several details from the scheme in the printed PWG bibliography.
Details on the conversion are in this readme.md; it is best to look at the 'raw' file.
Thanks for help in proof-reading.
https://github.com/sanskrit-lexicon/PWG/blob/master/pwg_ls/pwgbib/digitization/pwgbib23_roman.txt
Proofread pwgbib23_roman, fixed several minor mistakes (mostly conversion errors). Ready.
@gasyoun Glad you caught these. Things like 'English' and 'Patanjali' were errors where the transcoding was misapplied. There's no good way to catch such cases other than by human intervention, such as you applied. I'll be on the lookout for similar misapplications in the other parts of the bibliography.
Thomas found a few bibliographic entries in volume 4.
pwgbib4_roman has these in IAST.
Did a first proofread of this.
No scan image available at this moment.
Bharatiya-UpasargarthaChandrika-P1-1976.pdf
About MBh. and R. quote verification:
@funderburkjim,
There is a lot of material spread in this issue. Time to write a summary and organize something in single comment / file?
> There is a lot of material spread in this issue.
Not that much, actually :1st_place_medal:
@drdhaval2785
Here's a first take on what is required to make the pwg literary source links possible.
What we are aiming for is to duplicate for PWG the final step as in the displayprep for PW directory. Once we have a sortbib.txt for PWG, the Cologne server display logic should have what it needs.
The pwg_ls/pwgbib/digitization directory in this PWG repository has the work that has been done on the digitized bibliographies for PWG. They are arranged as:
The readme in this PWG digitization directory has links to the scans from which the digitizations were prepared.
For each of these, there are several forms:
I think the three X_roman.txt files are the only relevant ones.
The task thus resolves into writing one or more programs to parse the X_roman.txt files, and construct a file like sortbib.txt from these inputs.
There may be other issues that arise, but this looks like a reasonable summary of steps.
Note: The extensive work we did in correlating the actual PWK literary source references to the printed references could also be done for the PWG references. However, it seems better to defer this work, which will likely have a degree of complexity for PWG as it had for PWK. Let's focus on making use of the digitized printed references for PWG, as described above.
@drdhaval2785 If you work on this, I suggest you add a displayprep directory and do the construction of sortbib.txt for PWG therein.
> degree of complexity for PWG as it had for PWK. Let's focus on making use of the digitized printed references for PWG, as described above.
Indeed. So it's more about coding than real research at this stage.
So Thomas has done nothing for vol. 5, 6, 7?
NYĀYAMĀLĀV. = NYĀYAMĀLĀVISTARA, nach Anführungen bei MUIR, Sans- [Page1219-1b+ 21] krit Texts.
For reference purposes [Page1219-1b+ 21] has no meaning, I guess.
@drdhaval2785 Are you planning to work on this ? If not, I think I may tackle it next week.
@drdhaval2785 Since you haven't commented, I'm assuming that you are involved with other things, like stardict.
So, I'll begin working to get literary source links for PWG.
I am sorry to have kept this unanswered. Please go ahead. I will not be able to handle it now.
pwgbib14 contains the digitization of the literary source textual material for PWG. It has been formatted with the aim of making correspondences to the actual literary source instances within the pwg.xml digitization.
The digitization readme directory describes the details of coding of pwgbib14 (at the bottom of readme).
The <HI code="xxx"> lines indicate the different entries; there are 426 of these.
By contrast there are on the order of 9000 different actual proper reference forms in pwg.xml.
So, the next task is to make a good first approximation to matching actual forms to the codes in pwgbib14.
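As an illustration of reading the `<HI code="...">` entries, here is a minimal sketch in Python. It assumes each entry looks like the ones quoted elsewhere in this thread (an id such as `1.334`, then the `<HI code="...">` tag, then the text up to the next entry); the actual layout of pwgbib14 may differ, and `parse_entries` is a hypothetical helper, not part of the repository.

```python
import re

# Assumed entry shape, modeled on entries quoted in this thread:
#   1.334 <HI code="Z. d. d. m. G.">Z. d. d. m. G. = Zeitschrift ...
ENTRY_RE = re.compile(r'(\d+\.\d+)\s+<HI code="([^"]*)">')

def parse_entries(text):
    """Return a list of (entry_id, code, body) tuples found in text."""
    matches = list(ENTRY_RE.finditer(text))
    entries = []
    for i, m in enumerate(matches):
        # The body runs from the end of this tag to the start of the next entry.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        body = text[m.end():end].strip()
        entries.append((m.group(1), m.group(2), body))
    return entries

sample = (
    '1.334 <HI code="Z. d. d. m. G.">Z. d. d. m. G. = Zeitschrift der '
    'Deutschen morgenländischen Gesell-<lb>schaft. Leipzig. '
    '1.335 <HI code="Z. f. d. K. d. M.">Z. f. d. K. d. M. = Zeitschrift '
    'für die Kunde des Morgenlandes.'
)
for entry_id, code, body in parse_entries(sample):
    print(entry_id, code, sep="\t")
```

The resulting `(entry_id, code)` pairs are the kind of table that the actual reference forms in pwg.xml would then be matched against.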
That first approximation is now available in the PWG displays. Check it out!
Here is a brief summary of the approach taken:
<ls> elements to pwg bibliography records. There is a lot of approximation going on here.
<ls> in pwg.xml
<ls> text begins with numbers, parentheses, etc.
P. 3, 1, 134. to P.
<ls> elements that are matched, e.g. <ls n="1.230">P. 3, 1, 134.</ls> indicates that this ls-element refers to the 230th entry of the volume 1 PWG bibliography, namely PĀṆINI'S acht Bücher grammatischer Regeln (GILD. Bibl. 244).
<a name="1.230"> for Panini.
If you now click on the first link, to P. 3, 1, 134., then a window pops up for the PWG bibliography, scrolled to the right spot:
The main work remaining to be done is to improve the coverage of the matching. This will involve making corrections to PWG, which the work just described has not addressed. I'll mention this in a new issue so we'll remember it is on our todo list.
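To make the matching idea concrete, here is a rough sketch in Python. It is not the program used for the PWG displays; it only assumes that the text of an `<ls>` element is reduced to its abbreviation by dropping everything from the first digit onward (so `P. 3, 1, 134.` becomes `P.`), and that the abbreviation is then looked up among the bibliography codes. The function names and the small `codes` table are hypothetical; the ids `1.230` and `1.033` are the ones mentioned in this thread.

```python
import re

def abbreviation_key(ls_text):
    """Drop the locator: keep only the text before the first digit."""
    return re.split(r'\s*\d', ls_text, maxsplit=1)[0].strip()

def find_entry(ls_text, code_to_id):
    """Return the bibliography id for ls_text, or None if nothing matches."""
    key = abbreviation_key(ls_text)
    if key in code_to_id:
        return code_to_id[key]
    # Fall back to the longest known code that the key starts with.
    candidates = [c for c in code_to_id if key.startswith(c)]
    if candidates:
        return code_to_id[max(candidates, key=len)]
    return None

def tag(ls_text, code_to_id):
    """Render the <ls> element, adding n="..." when a match is found."""
    entry_id = find_entry(ls_text, code_to_id)
    if entry_id is None:
        return f"<ls>{ls_text}</ls>"
    return f'<ls n="{entry_id}">{ls_text}</ls>'

codes = {"P.": "1.230", "BENF. Chr.": "1.033"}
print(tag("P. 3, 1, 134.", codes))
```

An `<ls>` text that begins with a number yields an empty key and is left untagged, which is one source of the incomplete coverage.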
I think this particular issue can be safely closed.
P. 3, 1, 134.
Linking to Panini (real book reference, not just the abbreviation) is as simple as https://github.com/sanskrit-lexicon/Cologne/issues/93 @drdhaval2785 will agree. There are some books where the linking is easy. @juhnowski if you ask me, I would think about such corpora things first, UI comes next, because it's a long story and there is no quick urge.
After reading the newest documentation I can only say - if God would have forgotten how he created the Earth, Jim would write a summary on that as well. After reading it, one could redo the whole thing again and again.
The links did not work in Chrome for me; nothing opened. And my AdBlock kept silent.
I do not understand how to help. What exactly and in what file to do. I opened matchcrefs:
12@No Match@Comm.)@Comm.)
1@No Match@BALLANTYNE:@BALLANTYNE:
It is supposed that Comm.) is not = to Comm.) or what? Or it means that all () should get out of the match, so there should be additional cleanup?
1@No Match@Comm.) BṚH.@Comm.) BR2H.
I can hardly see what can be done here. The only thing I can think of is that Comm.) can be connected with an abbreviation before, and not after, and that the connection as it is is coincidental.
1@Match ~2 1.033 BENF. Chr.@BENFEY verbessert hat). SUŚR.@BENFEY verbessert hat). SUC2R.
1@Match ~2 1.033 BENF. Chr.@BENFEY annimmt) DAŚAK. in BENF. Chr.@BENFEY annimmt) DAC2AK. in BENF. Chr.
1@No Match@Auge SUŚR.@Auge SUC2R.
1@No Match@Ausg.@Ausg.
Ausgabe = Edition
verbessert hat and annimmt are not part of the abbreviation; they are just German text.
Regarding PWG links not working in Chrome.
I'm also using chrome, and the links work fine. I also have an ad blocker (UBlock Origin). Maybe open developer tools and see if there is any reason given.
Regarding 'Comm.)' in matchcrefs: this may be referring to an unnamed commentary on BṚH. How to handle this is unclear. The most obvious approach would be to just link to BṚH. ĀR. UP. in pwgbib.txt.
If so, then one solution would be to make a correction to pwg digitization to change the scope of the ls tag. For example from <ls>Comm.) BṚH.</ls> to Comm.) <ls>BṚH.</ls>.
While the details of the change you could leave to me, you could help by indicating what the link should point to.
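The change of scope can be tried out with a regular expression before touching the digitization. The sketch below is only an illustration in Python: `rescope` is a hypothetical helper, and the pattern handles just the `Comm.) ` prefix discussed here, not other stray prefixes.

```python
import re

# An <ls> element whose text starts with the stray prefix "Comm.) ".
LS_COMM_RE = re.compile(r'<ls>(Comm\.\)\s+)([^<]+)</ls>')

def rescope(xml_line):
    """Move the 'Comm.) ' prefix outside the <ls> element."""
    return LS_COMM_RE.sub(r'\1<ls>\2</ls>', xml_line)

print(rescope('<ls>Comm.) BṚH.</ls>'))
```

An element that contains only `Comm.)` with no abbreviation after it is left unchanged, since in that case there is nothing to link to.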
The BALLANTYNE instance is different. The only mention of this author in pwgbib is as editor of PAT. YOGAŚ. So it might be that this should be the link.
In terms of priorities, I would use the 'count' field as a guide. For instance, it would make sense to find solutions for those 35 'No Match' cases where there are 100+ instances.
If you are actively wanting to work on these, maybe you should have the ancillary datafiles in the abbrvoutput directory that I have thus far excepted from Git coverage. For instance, the abbrvlist file has every <ls> instance, in dictionary order, and includes the headword and L-number; thus, with this one could examine the context of the ls-element. Please advise if you need this now. Total size of abbrvoutput directory is 32MB.
> one solution would be to make a correction to pwg digitization to change the scope of the ls tag.
I would go for it.
> The BALLANTYNE instance is different. The only mention of this author in pwgbib is as editor of PAT. YOGAŚ. So it might be that this should be the link.
Makes sense.
> it would make sense to find solutions for those 35 'No Match' cases where there are 100+ instances.
Can you order them in order of priority, please?
> Total size of abbrvoutput directory is 32MB.
Please share it.
abbrvoutput directory now uploaded.
> Can you order them in order of priority?
Sure: see the discussion in #22 re matchrefs, and the regex therein. Edit the matchcrefs file locally, select the lines with the regex. There are 35 selected lines of No Matches. Order these 35 by size of first field (count). There are a few with 1000+ --- these are first to examine and resolve.
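For anyone who prefers a script to editing the file by hand, the same selection can be sketched in Python. This is an alternative to the regex from #22, not a copy of it. It assumes each matchcrefs line has four `@`-separated fields (count, status, and two spellings of the reference); the field names and the `min_count` threshold are my own labels.

```python
def parse_line(line):
    """Split 'count@status@text@coded' into typed fields."""
    count, status, text, coded = line.rstrip("\n").split("@", 3)
    return int(count), status, text, coded

def frequent_no_matches(lines, min_count=100):
    """Return 'No Match' rows with at least min_count instances, largest first."""
    rows = [parse_line(line) for line in lines if line.strip()]
    hits = [r for r in rows if r[1] == "No Match" and r[0] >= min_count]
    return sorted(hits, key=lambda r: r[0], reverse=True)

# Sample lines in the style quoted in this thread; the 'Sp.' line is
# written here with '@' separators as an assumption.
sample = [
    "12@No Match@Comm.)@Comm.)",
    "185@No Match@Sp.@Sp.",
    "1@Match ~2 1.033 BENF. Chr.@BENFEY annimmt) DAŚAK. in BENF. Chr.@BENFEY annimmt) DAC2AK. in BENF. Chr.",
    "1@No Match@Ausg.@Ausg.",
]
for count, status, text, coded in frequent_no_matches(sample):
    print(count, text)
```

To run it on the real file, pass the lines of a locally opened matchcrefs file instead of `sample`.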
Jim, I'm too dumb. I do not get it.
185 No Match Sp. Sp.
What's wrong with Sp.? Sp. stands for Sprüche, a well-known book, so what to do with it? How to make it match?
There may be nothing wrong with Sp. The problem likely is that the PWG bibliography pwgbib.txt does not have this reference which you recognized.
The solution in such a case would be to generate a synthetic new entry for pwgbib.
It is also possible that some items missing from the pwg bibliography are present in the pwk bibliography (e.g., in sortbib).
Ok, so 1) checked pwgbib.txt (none) 2) checked sortbib.txt (none).
What do I do next? I know that Sp. = https://www.worldcat.org/title/indische-spruche-sanskrit-und-deutsch-herausgegeben-von-o-bohtlingk/oclc/557531710&referer=brief_results is meant. Where and how to note it? I will start one by one, but I still do not get how to work in batch mode.
Different case, where I see a match.
pwgbib.txt has none
1.314 <HI code="Verz. d. B. H.">Verz. d. B. H. = WEBER'S Verzeichniss der Berliner Sanskrit-Hand-<lb>schriften. Bildet den ersten Band von: Die Handschriften-Verzeich-<lb>nisse der Königlichen Bibliothek, herausgegeben von dem König-<lb>lichen Oberbibliothekar, Geheimen Regierungsrath Dr. PERTZ. Berlin<lb>1853. 8º.
1.315 <HI code="Verz. d. Kopenh. H.">Verz. d. Kopenh. H. = WESTERGAARD'S Verzeichniss der Kopenhagener<lb>Sanskrit-Handschriften in: Codices orientales bibliothecae regiae<lb>Havniensis yussu et auspiciis regis Daniae augustissimi Christiani<lb>octavi enumerati et descripti. Pars prior, codices indicos continens.<lb>Havniae 1846. 4º.
1.316 <HI code="Verz. d. Pet. H.">Verz. d. Pet. H. = BÖHTLINGK'S Verzeichniss der Petersburger Sanskrit-<lb>Handschriften in: DORN, das Asiatische Museum der Kais. Akad. der<lb>Wiss. St. Petersburg 1846. S. 720. fgg.
sortbib.txt has 1
Verz.d.Oxf.H 1291 AUFRECHT, Verzeichniss der Oxforder Handschriften.
Only Verz.d.Oxf.H has no dot at the end (matchrefs has it like Verz.d.Oxf.H.), I guess that matters?
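If the trailing dot (or the spacing) is what prevents the match, one option is to compare normalized keys rather than raw strings. A small Python sketch follows; whether the real matching program compares keys this way is an assumption on my part, and `normalize` is a hypothetical helper.

```python
def normalize(abbrev):
    """Remove all whitespace and any trailing periods."""
    return "".join(abbrev.split()).rstrip(".")

# Key as it appears in sortbib.txt (no final dot), with its description.
sortbib = {
    normalize("Verz.d.Oxf.H"): "AUFRECHT, Verzeichniss der Oxforder Handschriften.",
}

for form in ("Verz.d.Oxf.H.", "Verz. d. Oxf. H."):
    print(form, "->", sortbib.get(normalize(form)))
```

Both spellings resolve to the same entry here, at the cost of possibly merging abbreviations that differ only in spacing or a final dot.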
S. and Z.
S. (Seite) = page. Z. (Zeile) = line. They are not real entities, but should be added as sub-entities and marked as such. Jim, should we add a new level to reference entries? Like the page and line where the quote occurs?
pwgbib.txt has none
1.334 <HI code="Z. d. d. m. G.">Z. d. d. m. G. = Zeitschrift der Deutschen morgenländischen Gesell-<lb>schaft. Leipzig.
1.335 <HI code="Z. f. d. K. d. M.">Z. f. d. K. d. M. = Zeitschrift für die Kunde des Morgenlandes. Göt-<lb>tingen (Bd. I--III) und Bonn (Bd. IV--VII).
1.336 <HI code="Z. f. d. W. d. Spr.">Z. f. d. W. d. Spr. = Zeitschrift für die Wissenschaft der Sprache.<lb>Herausgegeben von Dr. A. HOEFER.
1.337 <HI code="Z. f. vgl. Spr.">Z. f. vgl. Spr. = Zeitschrift für vergleichende Sprachforschung auf dem<lb>Gebiete des Deutschen, Griechischen und Lateinischen herausgege-<lb>ben von Dr. THEODOR AUFRECHT und Dr. ADALBERT KUHN. Berlin.
sortbib.txt has both
S. (Seite) und Z. (Zeile)
@gasyoun This issue is closed. Let's move this discussion to #22.
In preparation for the pwg bibliography, I reorganized the top level of the repository to have two folders: