Closed drdhaval2785 closed 8 years ago
Totally agree, space lacking.
Have added logic to abbrv.py to deal with this. It's not a typographical error situation, but rather a case of not recognizing where the abbreviation part of the literary source reference ends. The logic so far deals with cases where a known pwbib abbreviation is not matched due to the presence of roman numerals following the abbreviation.
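To illustrate the idea, here is a minimal sketch of that kind of fallback, not the actual code in abbrv.py. The names `KNOWN_ABBREVS`, `ROMAN_TAIL` and `match_abbrev` are hypothetical, and the set of abbreviations is a made-up sample rather than the real pwbib list.

```python
import re

# Hypothetical sample; the real abbreviations come from pwbib.
KNOWN_ABBREVS = {"KA7D", "HIT"}

# A trailing ".<roman numeral>" segment, e.g. ".II" or ".IX".
ROMAN_TAIL = re.compile(r"^(?P<abbrev>.+?)\.(?P<roman>[IVXLCDM]+)$")

def match_abbrev(ref_head, known=KNOWN_ABBREVS):
    """Return the known abbreviation that ref_head resolves to, or None."""
    # Direct hit: the reference head is itself a known abbreviation.
    if ref_head in known:
        return ref_head
    # Fallback: drop a trailing roman numeral and try again.
    m = ROMAN_TAIL.match(ref_head)
    if m and m.group("abbrev") in known:
        return m.group("abbrev")
    return None
```

With this sample set, `match_abbrev("KA7D.II")` resolves to `KA7D`, while `match_abbrev("PANDIT.IX")` stays unmatched because `PANDIT` is not in the set.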
PWK programs rerun.
No change to pwbib cases. Still 15 remain in bibminuscref.
Some progress in abbrvlist matching.
Previously (#41), 66112 out of 73116 cases were matched (90.4%). Now, 66186 out of 72890 cases are matched (90.8%).
There are still some cases in the revised crefminusbib.txt which have a roman numeral as part of the abbreviation (e.g., KAUD.II, PANDIT.IX, HIR.II, and some others). However, these are not matched because, for instance, the abbreviation without the roman numeral (like KAUD) is a typo for KA7D, or PANDIT is a reference not in pwbib. So, these will have to be caught in some other sweep of shady characters in crefminusbib.
Think this issue can be closed.
No change to pwbib cases. Still 15 remain in bibminuscref.
I keep noticing that your bibminuscref figure differs from the actual count. https://github.com/sanskrit-lexicon/PWK/blob/master/pw_ls/pwbib/bibminuscref.txt shows only 12 entries. My local copy of that file shows the same after syncing. I guess your count is a bit off, or there is some counting error in the program.
It can be closed when you upload the commits. In addition to the matching figures, I would be interested in the decrease in the number of crefminusbib.txt entries.
PANDIT is a reference not in pwbib
It's a journal. http://www.worldcat.org/title/pandit-a-monthly-journal-of-the-benares-college-devoted-to-sanskrit-literature/oclc/64240717
Re Count of bibminuscref.txt:
Looking at it today (Dec 31, 2015), I find 13 lines in file:
ANUKRAM.zuR2V
KA7TJ(A7JANA)
KA7TJ.SANA7NAS
DRAVJAC2
SADDH.P.4
K4ANDRA7LOKA
HIT.ed.JOHNS
ka7tj
SAM5KSHEPAC2
ALAM5KA7RAV
ALAM5KA7RAR
DEVATA7DHJ.BRA7HM
KAP.S
Perhaps '15' was due to my carelessness, or perhaps due to the state of the system when the comment was made.
Re PANDIT:
It occurs three times. Minor changes made. Not yet installed.
; hw= antarmuKatA
¯Pandit.9,216 ‹a.›@¯PANDIT.9,216,a@p@ ,a is part of reference. upcase 'Pandit' for consistency
; hw=savfttika
¯PANDIT9,216, ‹a.›@¯PANDIT.9,216,a.@t@ a is part of reference.
; hw = sAMKyakArikA no change required
; ¯PANDIT.IX@¯PANDIT.IX@t@
corrections now installed.
[.][IVXMC]+[.,][0-9]
when run on sortedcrefs.txt, gave 37 entries. A preliminary survey suggests that all of them should have a space in between. This needs examination.
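A scan like this could be scripted roughly as follows. This is only a sketch: it assumes sortedcrefs.txt holds one entry per line, the helper name `find_suspects` is hypothetical, and the sample strings are invented to show the pattern's behaviour rather than taken from the file.

```python
import re

# The pattern quoted above: a dot, one or more roman-numeral letters,
# then a dot or comma followed directly by a digit.
PATTERN = re.compile(r"[.][IVXMC]+[.,][0-9]")

def find_suspects(lines):
    """Return the lines that contain the pattern, without trailing newlines."""
    return [line.rstrip("\n") for line in lines if PATTERN.search(line)]

# Invented sample entries, for illustration only.
sample = ["KAUD.II.3", "PANDIT.IX", "MBH.1,2", "HIR.II,5"]
suspects = find_suspects(sample)
# Only entries where a digit directly follows the roman numeral are flagged.
```

To run it on the real file, something like `find_suspects(open("sortedcrefs.txt", encoding="utf-8"))` should work, again assuming one entry per line.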