Closed drdhaval2785 closed 8 years ago
1 ANUKRAM.zuR2V - Not found in pw.xml, neither are ANUKRAM and zuR2V separately.
2 KA7TJ(A7JANA) - This is a full form for KA7TJ used at some places like KA7TJ.C2R.
3 KA7TJ.SANA7NAS - Not found in pw.xml
4 SAM5NJ.Up -> SAM5NJ.UP, total 8 occurrences in pw.xml
5 VET.(U.) - Correct one. Our pw.xml had mismatched brackets earlier. It should have been fixed now.
6 WEBER,GJOT -> WEBER,G4JOT
7 Spr - Not found. It seems to be an extension for Beitr.
8 DEC2IN->DEC2I7N, 63 matches
9 Mat.med There are 7 entries with Mat.Med (capital) and 400 entries with Mat.med (small). So I guess we should convert all Mat.Med to Mat.med.
EJF: Agree. From a scan check of keruka and taruRa, the 'Mat.Med.' is an OCR error.
10 PISCHEL,deGr.pr Usually there are no small letters in 'ls'. Therefore our regex in crefs missed this entry. But there actually is a work referred to by this entry, rightly found by bib1.txt. Let's keep it as it is.
11 DIVJA7V<AD> -> DIVJA7VAD. Not a tag. DIVJA7VAD has 112 matches.
12 HAEB.Anth->HAEB.ANTH Currently there is only one entry with capitals.
13 H.an->H.AN
14 LEUMANNA,Aup.Gl - Not found in pw.xml, nor are its components. @funderburkjim may reverify.
LEUMANNA, Aup. Gl. has a typo; it should be LEUMANN, Aup. Gl. However, I don't find 'leumann' (case-insensitive search) in pw.txt. So this is an 'extra' reference appearing in the volume 5 bibliography.
15 KA7VJA7->KA7VJA7D; All entries have D at the end. pwbib1.txt wrongly identified the work. It seems to be the kAvyAdarSa of daRqin.
KA7VJA7 (OKALOK4ANA), Hdschr. (AUFRECHT). (vol. 5)
This entry, from the vol. 5 bibliography, is missing an L; it should be corrected to KA7VJA7L (OKALOK4ANA). This is a work shown as MW headword का-°व्या*लोक-लोचन [p= 1324] : n. N. of a rhet. wk. by अभिनवगुप्त. [L=49808.1]. I agree that pwbib0 is missing the kAvyAdarSa. Should we add it to pwbib0?
16 HILLEBR. The cref regex missed it because this work always has an 'N' suffixed to it, like HILLEBR.N.
17 SADDH.P.4->SADDH. P.4 is page reference. There are other page references too.
18 K4ANDRA7LOKA -> Wrong identification of work by pwbib1.txt. There are only three occurrences of K4ANDRA in the whole of pw.xml.
Line 29549: <H1><h><key1>kusuta</key1><key2>kusuta</key2></h><body><gram n="m">m.</gram> <i>der Planet Mars</i> <ls>VISHNUK4ANDRA.</ls> <noti>im</noti> <gram n="Comm">Comm.</gram> <noti>zu</noti> <ls>VARA7H.BR2H.2,20.</ls> PW29547</body><tail><L>29545</L><pc>2086-1</pc></tail></H1>
Line 84050: <H1><h><key1>mah</key1><key2>ma/h</key2><hom>2</hom></h><body><divm type="e" n="1">1)</divm> <gram n="Adj">Adj.</gram> (<gram n="f">f.</gram> <noti>ebenso und</noti> <s>mahI/</s>) <divm type="n" n="a">a)</divm> <i>gross , gewaltig , mächtig , reichlich.</i> <divm type="n" n="b">b)</divm> <i>alt , bejahrt.</i> <divm type="e" n="2">2)</divm> <gram n="f">f.</gram> <s>mahI/</s> <divm type="n" n="a">a)</divm> <i>die Erde.</i> <noti>Als Bez.</noti> <i>der Zahl Eins</i> <ls>G4AN2ITA.K4ANDRAGR.3.</ls> <divm type="n" n="b">b)</divm> <i>Erdboden.</i> <gram n="Pl">Pl.</gram> <ls>SPR.1509.</ls> <divm type="n" n="c">c)</divm> <i>Boden , Grund , Land.</i> <divm type="n" n="d">d)</divm> <i>Reich.</i> <divm type="n" n="e">e)</divm> <i>Erde</i> <noti>als</noti> <i>Stoff.</i> <divm type="n" n="f">f)</divm> <i>Basis eines Dreiecks <noti>oder</noti> einer anderen Figur.</i> <divm type="n" n="g">g)</divm> <gram n="Du">Du.</gram> <i>Himmel und Erde.</i> <divm type="n" n="h">h)</divm> <i>Raum.</i> <divm type="n" ...
Line 125909: <H1><h><key1>suKArTa</key1><key2>suKArTa</key2></h><body><gram n="m">m.</gram> <i>eine Sache des Wohlbehagens , ~ der Lust.</i> <gram n="Acc">Acc.</gram> (<ls>GAN2IT.26,7.</ls><ls>K4ANDRAGRAH.24,35</ls>) <noti>und</noti> <gram n="Dat">Dat.</gram> <i>der Annehmlichkeit ~ , der Bequemlichkeit wegen , zur Erleichterung.</i> PW125903</body><tail><L>125905</L><pc>7141-3</pc></tail></H1>
In all three occurrences, 'K4ANDRA' stands for 'candra'. No work like candrAloka is referred to here. The first is vizRucandra; the second and third are gaRita candragr(ahaRam??).
19 SAM5KSHPAC2 - not found in pw.xml
20 DONNER,PIN2D2->DONNER,Pin2d2; This is the form in pw.xml. Otherwise it may be made capital.
21 MAHA7B It is purported to be 'mahABAzya' according to pwbib0.txt (cf. MAHA7BH for mahABArata). But it is not found in pw.xml.
@funderburkjim may like to comment on whether this was lost in some programmatic conversion step.
EJF: The original digitization from Thomas is pwbib_orig.txt. pwbib0.txt was created by a program from this, so it is certainly possible that the program did some damage. The program steps to get pwbib0 are described in this readme document. I am treating pwbib0 as the current primary document, the one that will have corrections applied to it.
EJF: I suspect this is in the list of 'extra' bibliographical references (i.e., it appears in the pw bibliographies but is not referred to within the body of the PW dictionary). We probably should develop a list of these, and make use of this list in the crefmatch program, so we won't repeatedly worry why they don't match anything. I wonder why the author B. includes them in the bibliographies.
22 C2RIMA7LA7M Not able to locate in pw.xml
23 A7RUN2.Up->A7RUN2.UP
24 KAUSH.Up->KAUSH.UP
25 VIKR<OR> -> VIKR.
pw.xml also needs to be corrected from VIKROR to VIKR (total 52 entries)
VIKR<OR> -> VIKR. correction to pwbib0. Agree with pw.xml change, with minor exception:
VIKROR.dra7v.654,24.@vA@vA@99767 no change.
26 Bydragen - not found in pw.xml
EJF: For our not-used list. Here's the pwbib entry. What does it mean?
Bydragen (tot de Taal-, Land-en Volkenkunde van Nederlandsch Indie)
27 PRATIG4N4A7S(U7TRA) refers to PRATIG4N4A7S, which is already there in crefs.
28 HARISV - Name of an author, as in 'einer Tochter Harisva7min's':
Line 128158: <H1><h><key1>suSIla</key1><key2>suSIla</key2><hom>2</hom></h><body><divm type="e" n="1">1)</divm> <gram n="Adj">Adj.</gram> <i>von guter Gemüthsart.</i> <ls>SPR.7140</ls> <noti>mit einer unbekannten Nebenbedeutung</noti> ; <noti>vgl.</noti> <s>suSIlavant</s> <gram n="Nom">Nom.</gram>abstr. <s>°tA</s> <gram n="f">f.</gram> <ls>KA7D.2,55,4(65,15</ls>). <divm type="e" n="2">2)</divm> <gram n="m">m.</gram> <noti>N.pr. verschiedener Personen.</noti> <divm type="e" n="3">3)</divm> <gram n="f">f.</gram> <s>A</s> <noti>N.pr.</noti> <divm type="n" n="a">a)</divm> <noti>einer Gattin Kr2shn2a's.</noti> <divm type="n" n="b">b)</divm> <noti>eines Wesens im Gefolge Ra7dha7.</noti> <divm type="n" n="c">c)</divm> <noti>der Gattin Jama's.</noti> <divm type="n" n="d">d)</divm> <noti>einer Tochter Harisva7min's.</noti> PW128152</body><tail><L>128154</L><pc>7170-1</pc></tail></H1>
<ls>
here.
29 gan2a It is not a literary resource I guess. It is referring to gaRapAWa of pARini. Not able to locate it. It is always gaRaratnamahodaDi which comes up.
<H1>100{anuyuktin}1{*anuyuktin}¦ •Adj. •gan2a #{izwAdi}. PW4731
<ls>
. From Dhaval's comment, I guess no change in marking (to <ls>) is required.
30 KA7R->KA7RIKA7 There is only one occurrence. And pw.xml has KA7RIKA7.
31 DAC2AK.(1925) It was caught because of wrongly closed brackets. It should have gone away now. No intervention needed.
32 MA7N2D2Up->MA7N2D2.UP.
33 SVAPNAK4(INTA7MAN2I) - not able to locate in pwbib0.txt
34 PRAKRIJA7K(AUMUDI),Hdschr.(AUFRECHT).RA7JENDR.Not->PRAKRIJA7K The rest seems to be explanation in some catalogue of Rajendra Mishra.
35 VASISHT2HA,-> not able to locate it in pw.xml. (See the comma). It is used in pwbib0 to separate two editions.
36 OppCat->OPP.CAT.
37 K4HA7NDOGJAP-> not able to locate in pw.xml. There is one K4ha7ndogjopanishad in the text.
38 KUHN'SZ->KUHN'S.Z.
Re 17 SADDH.P.4->SADDH.
I'm not sure what the story is here. Maybe @gasyoun or @zaaf2 or @thomasincambodia can help.
There are two entries in the bibliography:
I've added my two cents worth as 'subcomments' (identified by EJF) of Dhaval's initial comments in several. I've done this through his case 24 saddh. Will continue with the rest another time.
I'm using
@gasyoun Could you check this correction to pwbib0:
old: ; VIKR. dra7v. == KA7LIDA7SA'S VIKRAMORVAC2IYAM nach dra7vidischen Handschriften, herausgegeben von RICHARD PISCHERD in. , Monatsbericht der Königlich Preussischen Akademie der Wissenschaften zu Berlin"1875, S. 609. fgg. (vol. 5)
new
.VIKR. dra7v. == KA7LIDA7SA'S VIKRAMORVAC2IYAM nach dra7vidischen Handschriften, herausgegeben von RICHARD PISCHEL in "Monatsbericht der Königlich Preussischen Akademie der Wissenschaften zu Berlin", 1875, S. 609. fgg. (vol. 5)
4te Kapitel = 4th Chapter, that means that Dhaval's assumption was wrong.
17 SADDH.P.4->SADDH is not legal.
@funderburkjim RICHARD PISCHERD -> RICHARD PISCHEL
1925 = every 4 digit number starting with 18.. or 19.. should be left for further examination.
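The rule above (set aside any 4-digit number starting with 18 or 19) could be sketched roughly as follows. This is only an illustration, not code from abbrv.py or crefmatch; the names `YEAR_RE` and `flag_possible_years` are hypothetical.

```python
import re

# Hypothetical pattern: a standalone 4-digit number starting with 18 or 19.
# The lookarounds keep it from matching inside longer digit runs such as 29547.
YEAR_RE = re.compile(r'(?<!\d)1[89]\d\d(?!\d)')

def flag_possible_years(ref):
    """Return the 18xx/19xx numbers found in a reference string."""
    return YEAR_RE.findall(ref)

if __name__ == '__main__':
    for ref in ['DAC2AK.(1925)', 'SPR.1509.', 'VIKR.dra7v.654,24.']:
        years = flag_possible_years(ref)
        status = 'examine further' if years else 'ok'
        print(ref, years, status)
```

References flagged this way would simply be held back for manual examination rather than corrected automatically.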
Re 25 VIKR<OR>->VIKR. Under headword kaYcukIya, I changed the text as follows:
old
¯VIKROR.ED. ¯PISCHEL.661,4.14.664,15.
new
¯VIKR.dra7v.661,4.14.664,15.
The reason is consistency with the bibliography (pwbib), which shows that VIKR.dra7v is the PISCHEL edition:
.VIKR. dra7v. == KA7LIDA7SA'S VIKRAMORVAC2IYAM nach dra7vidischen Handschriften, herausgegeben von RICHARD PISCHERD in. , Monatsbericht der Königlich Preussischen Akademie der Wissenschaften zu Berlin"1875, S. 609. fgg. (vol. 5)
3015 corrections were generated for PW, and have been installed. These are consistent with the 'EJF' comments as shown above.
Minor change to pwbib0.txt, 'Up.' -> 'UP.' These can be viewed as typos, since the printed text always has a lower case capital 'P'.
The ones marked as 'extra' were not mentioned in the issue comments above.
; A7RUN2. Up. -> A7RUN2. UP.
; DHJA7NAB. Up. -> DHJA7NAB. UP. (extra)
; KAUSH. Up. -> KAUSH. UP.
; .K4HA7ND. Up. -> .K4HA7ND. UP. (extra)
; .NI7LAR. Up. -> .NI7LAR. UP. (extra)
; NR2S Up. -> NR2S UP. (extra)
; 4 SAM5NJ.Up -> SAM5NJ.UP in pwbib0
; .TAITT. Up. -> .TAITT. UP. (extra)
; GA7R. Up. -> GA7R. UP. (extra)
; HANUM. Up. -> HANUM. UP. (extra)
; JOGAC2.Up.-> JOGAC2.UP. (extra)
; 32 MA7N2D2 Up. -> MA7N2D2 UP.
; NA7DAR. Up. -> NA7DAR. UP. (extra)
; TEG4OB. Up. -> TEG4OB. UP. (extra)
; MUN2D2. Up.-> MUN2D2. UP. (extra)
; RA7MAPU7RVAT. Up. -> RA7MAPU7RVAT. UP. (extra)
; KAN2T2HAC2R. Up. -> KAN2T2HAC2R. UP. (extra)
; zu BR2H. A7R Up. -> zu BR2H. A7R UP. (extra) (in text of abbreviation A7NANDAG)
Additional changes/corrections to pwbib0, per issue cases above.
; 6 WEBER,GJOT. -> WEBER,G4JOT.
; 8 DEC2IN->DEC2I7N
; 11 .DIVJA7V<AD>. -> .DIVJA7VAD.
; 14 .LEUMANNA, Aup. Gl. -> .LEUMANN, Aup. Gl.
; 15 KA7VJA7 (OKALOK4ANA), Hdschr. (AUFRECHT) -> KA7VJA7L ...
; 16 .HILLEBR. . -> .HILLEBR. N.
; 19 .SAM5KSHPAC2 (AM5KARAG4AJA) von MA7DHAVA (AUFRECHT). -> SAM5KSHEPAC2
; 25 VIKR<OR>. -> VIKR.
; 30 KA7R->KA7RIKA7
; 36 OppCat->OPP.CAT.
This is actually written 'OPP.Cat.' in both bibliography and print, but as OPP.CAT in pw.xml. As a shortcut, I propose to change crefmatch to artificially capitalize this to force a match.
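A minimal sketch of that shortcut, assuming the match is done on a normalized key. The names `FORCE_UPPER` and `match_key` are hypothetical and are not taken from the actual crefmatch program.

```python
# Hypothetical set of bibliography abbreviations that are upper-cased
# only for matching purposes; the bibliography text itself is unchanged.
FORCE_UPPER = {'OPP.Cat.'}

def match_key(abbrev):
    """Build a comparison key: drop spaces, optionally force capitals,
    and ignore trailing periods."""
    key = abbrev.replace(' ', '')
    if key in FORCE_UPPER:
        key = key.upper()
    return key.rstrip('.')

if __name__ == '__main__':
    # Bibliography form vs. the form found in pw.xml
    print(match_key('OPP. Cat.') == match_key('OPP.CAT'))
    # Abbreviations not in FORCE_UPPER keep their case
    print(match_key('Mat. med.'))
```

Limiting the capitalization to a named set keeps legitimately mixed-case abbreviations such as Mat.med from being altered.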
Above changes to pwbib0 installed (committed) in PWK
A crefmatch rerun now shows that 76% of pwbib0 abbreviations are accounted for, and 83% of sortedcref instances are accounted for. So, we're making some progress!
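For illustration, the two percentages could be computed along these lines. This is a sketch under assumptions: it treats pwbib0 as a set of abbreviations and sortedcref as a count of instances per abbreviation, which may not be exactly how crefmatch counts. The function name `coverage` is hypothetical.

```python
from collections import Counter

def coverage(bib_abbrevs, cref_counts):
    """Return (percent of bibliography abbreviations matched,
               percent of cref instances matched).

    bib_abbrevs: set of abbreviations from the bibliography
    cref_counts: Counter mapping abbreviation -> number of instances
    """
    matched = bib_abbrevs & set(cref_counts)
    total_instances = sum(cref_counts.values())
    bib_pct = 100.0 * len(matched) / len(bib_abbrevs) if bib_abbrevs else 0.0
    inst_pct = (100.0 * sum(cref_counts[a] for a in matched) / total_instances
                if total_instances else 0.0)
    return bib_pct, inst_pct

if __name__ == '__main__':
    # Toy data, not real figures from pw.xml
    bib = {'SPR', 'VIKR', 'MAHA7B', 'Bydragen'}
    crefs = Counter({'SPR': 6, 'VIKR': 2, 'UNKNOWN': 2})
    print(coverage(bib, crefs))
```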
I've NOT yet dealt with these issues identified as pwbib1 problems:
; 34 .PRAKRIJA7K (AUMUDI), Hdschr. (AUFRECHT). RA7JENDR. Not. == pwbib1 problem
; 35 VASISHT2HA, pwbib1 problem. remove comma.
; 38 KUHN'SZ->KUHN'S.Z. pwbib1
; 17 SADDH.P.4->SADDH.P. pwbib1 change
; Noticed that 'G4' should be converted to 'J' in pwbib1.txt
; 27 PRATIG4N4A7S(U7TRA) refers to PRATIG4N4A7S, pwbib1 problem
or with these two, identified as needing adjustments to abbrv.py:
; 5 VET.(U.) see error in abbrv.py
; 31 DAC2AK.(1925) abbrv.py problem
Here are the items currently identified as abbreviations appearing in the bibliography (pwbib) but having no examples in pw.xml:
21 MAHA7B
22 C2RIMA7LA7M
26 Bydragen
28 HARISV
29 gan2a
33 SVAPNAK4(INTA7MAN2I)
14 LEUMANNA,Aup.Gl
We could call this pwbib_unused.txt, and make use of this list in doing crefmatch.
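A rough sketch of how such a list might be used, assuming one entry per line with an optional leading case number as in the list above. The helper names `parse_unused` and `unresolved` are hypothetical, and the real crefmatch may organize this differently.

```python
def parse_unused(text):
    """Parse the contents of a pwbib_unused.txt-style list into a set.

    A leading issue case number (e.g. '21 MAHA7B') is dropped if present.
    """
    unused = set()
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0].isdigit():
            line = parts[1]
        unused.add(line)
    return unused

def unresolved(bib_abbrevs, cref_abbrevs, unused):
    """Bibliography abbreviations that match no cref and are not
    already known to be unused in the dictionary body."""
    return sorted(a for a in bib_abbrevs
                  if a not in cref_abbrevs and a not in unused)

if __name__ == '__main__':
    unused = parse_unused("21 MAHA7B\n22 C2RIMA7LA7M\n26 Bydragen\n")
    print(unresolved({'MAHA7B', 'SPR', 'Bydragen', 'FOO'}, {'SPR'}, unused))
```

With the known-unused entries filtered out, the remaining unresolved list only contains items that still need investigation.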
Not lower case capital 'P', but small caps "P". Otherwise accepted.
A bit more progress:
['Mat.med','H.an','DAC2AK.(1925)','VET.(U.)',
'VIKR.dra7v','PISCHEL,deGr.pr','Bibl.ind','KAP.(BALL.)']
modifications to crefmatch:
With these changes, our to do list (for both pwbib unresolved and sortedcref unresolved) stands at about 15%.
I'll switch to some other task for a few days.
Not sure whether this particular issue of correction identification should be considered a Part 1, which can be closed. Will let @drdhaval2785 do the honors.
Hmm, I'm lost. Have you found many cases from real life abbreviations that are additional to the lists given in Preface? Do I understand it right? I have lost myself in the terminology and files names, forgive me my misery.
I agree. The issue is longer than what is manageable. So a 'part 1 and close' policy seems fine to me. @gasyoun Right now we are using a comparison between pwbib (from Thomas) and sortedcrefs (from Dhaval) to weed out errors in both. That is why correcting each error gives some progress. Earlier 78% or so were matching. Now 85% are matching with these corrections. I guess after 90% matching, we need to manually weed out obviously undeserving entries from sortedcrefs.txt.
Whatever remains in sortedcrefs.txt after these cleanups may be the actual list of 'Additions' to the bibliography which the author may have overlooked. Even if we don't get any such addition, cleaning is really what matters the most as of now.
https://github.com/sanskrit-lexicon/PWK/blob/master/pw_ls/pwbib/diffstudy/bibminuscref.xml
This is the file which is being studied here.
These are the entries which were there in pwbib1.txt but not found in cref.