sanskrit-lexicon / PWK

Sanskrit-Wörterbuch in kürzerer Fassung, 7 Bände Petersburg 1879-1889

Roman numerals in literary resources #36

Closed drdhaval2785 closed 8 years ago

drdhaval2785 commented 8 years ago

The pattern [.][IVXMC]+[.,][0-9], when run on sortedcrefs.txt, gave 37 entries. A preliminary survey suggests that all of them should have a space in between. This needs examination.

    Line 27: A7PAST.C2R.I,22,5@varzya@varzya^@99070@1
    Line 52: A7RAJABH.VII,16@kAlakxpti@kAlakxpti@27096@1
    Line 59: A7RJABH.IX,9@kAlAnupUrva@kAlAnupUrva@27338@1
    Line 62: A7RJABH.VII,11@triskanDa@triskanDa@47948@1
    Line 63: A7RJABH.VII,14@nizkAraRa@nizkAraRa@60109@1
    Line 64: A7RJABH.VII,15@nizkfzyaviDAna@nizkfzyaviDAna@60144@1
    Line 65: A7RJABH.VII,21@trEvarRa@trEvarRa@48067@1
    Line 640: HARIV.LANGL.II,310@Kara@Ka/ra@33474@1
    Line 678: HIR.II,2@graB@graB@37314@1
    Line 823: KA7D.II,103,13@kalApavarman@kalApavarman@25484@1
    Line 824: KA7D.II,114,14@KalIkAra@KalIkAra@33622@1
    Line 825: KA7D.II,134,5@aBihati@aBihati@7756@1
    Line 826: KA7D.II,16@kapolapAli@kapolapAli@24355@1
    Line 827: KA7D.II,2@anuGawana@anuGawana@4356@1
    Line 828: KA7D.II,34,11@udDUlay@udDUlay@19088@1
    Line 829: KA7D.II,35@kuntavanamaya@kuntavanamaya@28636@1
    Line 830: KA7D.II,37,4@aparajalaDi@aparajalaDi@6048@1
    Line 831: KA7D.II,47,19@kAyamAna@kAyamAna@26742@1
    Line 832: KA7D.II,70,11@amBomuc@amBomuc@8636@1
    Line 833: KA7D.II,74@Alapana@Alapana@15786@1
    Line 834: KA7D.II,74,22@Ajarjarita@Ajarjarita@13867@1
    Line 835: KA7D.II,74,5@ABoga@ABoga/@15112@1
    Line 836: KA7D.II,78@kUrcaka@kUrcaka@29783@1
    Line 951: Ka7d.II,115,11@OdAsInya@OdAsInya@22869@1
    Line 1089: MIT.II,62,b,14@niKAtatuzANgArAdimant@niKAtatuzANgArAdimant@58294@1
    Line 1203: PAN4K4AT.I,458@AhAva@AhAva@16553@1
    Line 1204: PAN4K4AT.II,137@tAlika@tAlika@45562@1
    Line 1205: PAN4K4AT.II,145@kunAdIkA@kunAdIkA@28613@1
    Line 1206: PAN4K4AT.II,156@viDana@viDana@102876@1
    Line 1207: PAN4K4AT.III,176:@pratyAdarSa@pratyAdarSa@72063@1
    Line 1208: PAN4K4AT.III,96@svatas@svatas@132061@1
    Line 1210: PAN4K4AT.IV,78@ruD@ruD@94552@1
    Line 1213: PAN4K4AT.V,11@Sri@Sri@115014@1
    Line 1516: TS.III,261,10@aByAtati@aByAtati@7931@1
    Line 1725: ka7d.II,17@kamalamaya@kamalamaya@24427@1
    Line 1852: HIT.IV,51@sugupti@sugupti@126030@2
    Line 1967: SV.I,5,2,1@viSvatfpta@viSvatfpta@104812@2
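The search above can be reproduced with a short script. This is a minimal sketch, not the tool actually used for the listing: it assumes each line of sortedcrefs.txt has the @-separated layout shown above, with the literary reference in the first field. The function name find_unspaced_roman and the third sample line are hypothetical.

```python
import re

# Same pattern as in the comment: a period, one or more roman-numeral
# letters from the set IVXMC, a period or comma, then a digit.
PATTERN = re.compile(r"[.][IVXMC]+[.,][0-9]")


def find_unspaced_roman(lines):
    """Return (line_number, reference) pairs whose reference matches PATTERN.

    Assumption: the reference is the first '@'-separated field of each line.
    """
    hits = []
    for lineno, line in enumerate(lines, start=1):
        ref = line.rstrip("\n").split("@", 1)[0]
        if PATTERN.search(ref):
            hits.append((lineno, ref))
    return hits


sample = [
    "HIR.II,2@graB@graB@37314@1",
    "KA7D.II,74,22@Ajarjarita@Ajarjarita@13867@1",
    "XYZ.2,6@foo@foo@1@1",  # made-up line without a roman numeral, for contrast
]

for lineno, ref in find_unspaced_roman(sample):
    print(lineno, ref)
```

To scan the real file, the list can be replaced by an open file handle, e.g. find_unspaced_roman(open("sortedcrefs.txt", encoding="utf-8")), assuming the file is UTF-8 text.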

gasyoun commented 8 years ago

Totally agree, space lacking.

funderburkjim commented 8 years ago

Have added logic to abbrv.py to deal with this. It's not a typographical error situation, but rather a case of not recognizing where the abbreviation part of the literary source reference ends. The logic so far deals with cases where a known pwbib abbreviation is not matched due to the presence of roman numerals following the abbreviation.
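The idea of finding where the abbreviation ends can be sketched as follows. This is only an illustration under stated assumptions, not the logic actually in abbrv.py: the function name split_abbreviation is hypothetical, and the set known is a stand-in for the pwbib abbreviation list.

```python
import re

# A candidate abbreviation that ends in ".<roman numeral>", using the same
# letter set (IVXMC) as the search pattern in this issue.
ROMAN_TAIL = re.compile(r"^(.*)\.([IVXMC]+)$")


def split_abbreviation(ref, known):
    """Split a reference into (abbreviation, rest).

    `known` is a hypothetical set of accepted abbreviations. If the part
    before the first comma is not known, but becomes known once a trailing
    roman numeral is stripped, the numeral is moved into the rest.
    Returns (None, ref) when no known abbreviation is found.
    """
    head, sep, tail = ref.partition(",")
    if head in known:
        return head, tail
    m = ROMAN_TAIL.match(head)
    if m and m.group(1) in known:
        return m.group(1), m.group(2) + sep + tail
    return None, ref


known = {"KA7D", "HIR", "A7PAST.C2R"}  # illustrative subset only
print(split_abbreviation("KA7D.II,74,22", known))
print(split_abbreviation("A7PAST.C2R.I,22,5", known))
print(split_abbreviation("KAUD.II,5", known))
```

In this sketch a prefix that is not in the known set stays unmatched even after the roman numeral is stripped, so such references would still need a separate check.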

PWK programs rerun.

No change to pwbib cases. Still 15 remain in bibminuscref.

Some progress in abbrvlist matching.

Previously (#41), 66112 out of 73116 cases were matched (90.4%). Now, 66186 out of 72890 cases are matched (90.8%).
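As a quick check of the two percentages, a small sketch (the helper name match_rate is hypothetical):

```python
def match_rate(matched, total):
    """Percentage of cases matched, rounded to one decimal place."""
    return round(100.0 * matched / total, 1)


before = match_rate(66112, 73116)
after = match_rate(66186, 72890)
print(before, after)
```

Between the two runs the matched count rose by 74 while the total number of cases fell by 226.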

There are still some cases in the revised crefminusbib.txt which have a roman numeral as part of the abbreviation (e.g., KAUD.II, PANDIT.IX, HIR.II, and some others). However, these are not matched because, for instance, the abbreviation without the roman numeral is a typo (KAUD for KA7D) or is a reference not in pwbib (PANDIT). So these will have to be caught in some other sweep of shady characters in crefminusbib.

Think this issue can be closed.

drdhaval2785 commented 8 years ago

No change to pwbib cases. Still 15 remain in bibminuscref.

I am consistently observing that your bibminuscref figure differs from the actual count. https://github.com/sanskrit-lexicon/PWK/blob/master/pw_ls/pwbib/bibminuscref.txt shows only 12 entries, and my local copy of that file shows the same after syncing. I guess your counter is a bit off, or there is some counting error in the program.

drdhaval2785 commented 8 years ago

It can be closed when you upload the commits. In addition to the matching figures, I would be interested in the decrease in the number of crefminusbib.txt entries.

gasyoun commented 8 years ago

PANDIT is a reference not in pwbib

It's a journal. http://www.worldcat.org/title/pandit-a-monthly-journal-of-the-benares-college-devoted-to-sanskrit-literature/oclc/64240717

funderburkjim commented 8 years ago

Re Count of bibminuscref.txt:

Looking at it today (Dec 31, 2015), I find 13 lines in file:

ANUKRAM.zuR2V
KA7TJ(A7JANA)
KA7TJ.SANA7NAS
DRAVJAC2
SADDH.P.4
K4ANDRA7LOKA
HIT.ed.JOHNS
ka7tj
SAM5KSHEPAC2
ALAM5KA7RAV
ALAM5KA7RAR
DEVATA7DHJ.BRA7HM
KAP.S

Perhaps '15' was due to my carelessness, or perhaps due to the state of the system when the comment was made.

funderburkjim commented 8 years ago

Re PANDIT:

funderburkjim commented 8 years ago

corrections now installed.