sanskrit-lexicon / PWK

Sanskrit-Wörterbuch in kürzerer Fassung, 7 Bände Petersburg 1879-1889

Crefminus bib, the rest of them #57

Closed funderburkjim closed 8 years ago

funderburkjim commented 8 years ago

There are currently 927 cases mentioned in crefminusbib (instances of literary source references whose abbreviations have not been matched with the bibliography).

Essentially, each of these is a 'special case'.

I've developed a good format for working with these, and have begun the correction identification process. The format is that of an Emacs Org mode file, which is quite convenient. Here's the way the first case looks:

* DONE CASE 001=WHITNEY.,Ind
;------------------------------------------------------------------------
; WHITNEY.,Ind 1 1
;Instance 1, hw=vizwIma, L=105426, [[http://www.sanskrit-lexicon.uni-koeln.de/scans/awork/apidev/servepdf.php?dict=PW&page=6136-1][page 6136-1]]
¯WHITNEY.,Ind.@vizwIma@@105426:¯WHITNEY,Ind.:t:
;<H1>000{vizwIma}1{vizwIma/}¦ ¯AV.20,135,5 ‹nach› ***¯WHITNEY.,Ind.*** PW105424

When this file is viewed in Emacs Org mode, the link appears simply as the clickable text "page 6136-1", which opens a browser tab.

The standard-form correction line can be edited as needed, and the pw.txt context is shown.
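For anyone who wants to process such a file outside Emacs, here is a minimal Python sketch that pulls the headword, the L number and the Org link out of an "Instance" comment line. The regular expression is an assumption inferred only from the single sample shown above, and `parse_instance` is a hypothetical helper, not part of the repository scripts.

```python
import re

# Assumed shape of an "Instance" line, inferred from the one sample above:
# ;Instance <n>, hw=<headword>, L=<record id>, [[<url>][<label>]]
INSTANCE_RE = re.compile(
    r"^;Instance (?P<n>\d+), hw=(?P<hw>[^,]+), L=(?P<L>[\d.]+), "
    r"\[\[(?P<url>[^\]]+)\]\[(?P<label>[^\]]+)\]\]$"
)

def parse_instance(line):
    """Return the fields of one instance line as a dict, or None if it does not match."""
    m = INSTANCE_RE.match(line.strip())
    if m is None:
        return None
    fields = m.groupdict()
    fields["n"] = int(fields["n"])  # the instance counter is numeric; L is kept as text
    return fields

sample = (";Instance 1, hw=vizwIma, L=105426, "
          "[[http://www.sanskrit-lexicon.uni-koeln.de/scans/awork/apidev/"
          "servepdf.php?dict=PW&page=6136-1][page 6136-1]]")
info = parse_instance(sample)
print(info["hw"], info["L"], info["label"])
```

Lines that do not follow this shape (for example the `; WHITNEY.,Ind 1 1` summary line) simply return `None`, so the helper can be run over every line of a case.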

Most of the corrections are straightforward, a few require correction of context, and several new resources are identified, as well as adjustments needed to the clean_special abbrv.py code.

The reason for mentioning this as an issue is so that no one will duplicate effort on this task.

If anyone does want to take on some of these separately, I can provide a portion of the file.

Otherwise, I'll be tackling these over the next few weeks.

drdhaval2785 commented 8 years ago

Best of luck, Jim. I will be out of touch until 5th February. I guess this work will be over by then. If something is left over, I would be happy to help.

gasyoun commented 8 years ago

@funderburkjim no, go ahead. I can leave you my part as well. :ear_of_rice:

I opened C:\xampp\htdocs\PWK\pw_ls\pwbib, and here is what I see.

> tackling these over the next few weeks.

@zaaf2 can we still count on you for PWK or PWG corrections, or are you out of the game?

I still see many crude, simple mistakes here, ones that should have been eliminated after the fuzzy match.

WEBLR.LIT

WEBER.LIT

BüHLHR BU7HLER, BüNLER HüHLER BüHE.REP RüHLER.REP BHüHLER.REP BÜHLER.'s BU10ELER, BU7HLER BüLHLER BHU10LER, BHüHLER NüHLER BU10HLER.Rep.S BüLLER.REP BüRLER, BüBLER.REP BüHLR, BU1HLER, BU6HLER,

Variations of BÜHLER.
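As an illustration of how a fuzzy match could catch variants like these, here is a minimal sketch using Python's standard `difflib`. The `CANONICAL` list, the `suggest` helper and the 0.7 cutoff are assumptions made for this example; they are not taken from abbrv.py or crefmatch.py.

```python
import difflib

# Hypothetical short list of canonical abbreviations; the real bibliography is much larger.
CANONICAL = ["BÜHLER", "WEBER", "BURNELL", "AUFRECHT"]

def suggest(variant, canonical=CANONICAL, cutoff=0.7):
    """Return the closest canonical form for a misspelt variant, or None."""
    # Upper-case so that 'ü' and 'Ü' compare equal; drop trailing punctuation.
    key = variant.upper().rstrip(".,")
    matches = difflib.get_close_matches(key, canonical, n=1, cutoff=cutoff)
    return matches[0] if matches else None

for v in ["BüHLHR", "NüHLER", "BüLLER", "BU7HLER,", "AUPRECHT", "THRANEN"]:
    print(v, "->", suggest(v))
```

A plain German word such as THRANEN falls below the cutoff and gets no suggestion, which is the desired behaviour; the cutoff would need tuning against the full list before any automatic correction is trusted.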

BUNELLT BRNELL, BUNRELLT

Variations of BURNELL,T.

STENZLER.DIE TAITT.A7R.DIE KATHA7S.BENANNT C2AM5K.EBEND NI7LAK.SOLL ROTH.VERMUTHET GAUT.ANGEBLICH GAL.RICHTIG R2V.Nachmals MBH.VON AV.LEHRERS AV.RICHTIG AV.PAIPP.ST VS.LIEST A7RSH.BR.Auch SA7J.ALS NI7LAK.FASST VAITA7N.VOLLSTANDIG DAMAJANTI7K.DERSELBE MBH.FüR AV.OFT A7PAST.AUSNAHMSWEISE C2AM5K.FüR SA7J.SELBST SA7J.SCHEINT VAITA7N.TITEL R2V.NUR NI7LAK.SEHR AUFRECHT.RICHTIG JOLIY.SCHULD K4AC2.ALS NI7LAK.NIMMT GAUT.UNEIG MBH.erklärt DHA7TUP.ALS SUC2R.erklärt BURNELL,T.Wohl Roth.Zur MED.IST KA7C2.JAPA.OHNE FRITZE.VERMUTHET PAN4K4AD.RICHTIG BURNELL,T.Richtig

Split off the last word in each case: it is an ordinary German word, not part of the abbreviation.

A7RSH.BR.VGL GAUT.VGL PAN4K4AD.Vgl VAITA7N.VGL

Vgl. = vergleichen = compare = cf. Split these too. It is not an ordinary word of the language but a "markup" word.
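The splitting suggested here can be sketched in Python as follows. The `TRAILING_WORDS` set is a hypothetical stop list seeded from a few of the trailing words in the examples above, and `split_trailing_word` is an illustrative helper; neither reflects how abbrv.py actually works.

```python
# Hypothetical stop list, seeded from trailing words seen in the examples above.
TRAILING_WORDS = {"die", "von", "als", "für", "oft", "nur", "sehr", "auch", "vgl"}

def split_trailing_word(token, words=TRAILING_WORDS):
    """Split 'ABBR.WORD' into ('ABBR.', 'WORD') when WORD is a known plain word."""
    head, dot, tail = token.rpartition(".")
    # casefold() makes 'FüR', 'FÜR' and 'für' compare equal.
    if dot and tail.casefold() in words:
        return head + dot, tail
    return token, None

for t in ["STENZLER.DIE", "A7RSH.BR.Auch", "MBH.FüR", "GAUT.VGL", "WEBER.LIT"]:
    print(split_trailing_word(t))
```

Checking the tail against a word list, rather than always cutting at the last period, keeps genuine multi-part abbreviations such as WEBER.LIT intact.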

AUGE Blutegel ABSCHIEDSKUSS Ort NäHREND

THRANEN

(The correct spelling of this word was submitted earlier in another batch of corrections, https://github.com/sanskrit-lexicon/CORRECTIONS/issues/210, which has not yet been incorporated.)

{ARABISCH}

https://github.com/sanskrit-lexicon/ArabicInSanskrit/issues/7#issuecomment-167155495 corrected here.

German words. No abbreviation, no source.

DELBRU1CK DELBRüCK

https://de.wikipedia.org/wiki/Berthold_Delbr%C3%BCck

R2V.DE

R2V.DE -> R2V. der

> The

`The Va7tra7s ;` tagged wrongly.

> A.V.Paric

-> A.V.Pariç

> KLAGE

Nach KLAGE (Kuhn's Z. 311).

KLAGE is indeed the author's surname. I do not know how to solve similar issues, but without `Kuhn's Z.</noti> <ls>311` it will be impossible to locate the work.

LALIT.Partic

Totally gone wrong. The second part must go. The numbers from the scan are lost.

Lot.delab.l.91.109.134.195 -> Lot.de la b.l. 91.109.134.195

LALIT.Partic. -> LALIT. 214,2

Is Partic. from Partic[ular] = for example, @zaaf2? Partic. = 831 instances.

![vollendung](https://cloud.githubusercontent.com/assets/80761/12366510/af9aad96-bbec-11e5-9e49-6188d8a116a3.PNG)

> BHAT2T2.PARTIC

Same as `LALIT.Partic`. Fishy.

> AUPRECHT

AUFRECHT

> SCHIEFNR

SCHIEFER

> MAXMüller's.Ausg

MAX Müller edition.

> Sitzungsberichte
> Monatsberichte

Not enough; these must have additional words around them. It's some annual / monthly / scientific report.

> WASSILIEW,
> WASSILIEW

There still remain such simple cases where the only difference is a comma. It's a new author, a Russian author.

> KOSEG

KOSEG[ARTEN]

> KALPAS.S

How about a new tag inside the bibliography? `BÜHLER` and `BÜHLER's` are the same source. Can the `'s` be cut off for author-counting purposes?

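One possible way to ignore the `'s` (and a stray trailing comma) when counting authors is sketched below in Python. The `author_key` rules are assumptions made up for this illustration, not an existing feature of the bibliography code, and the sample list is only a handful of forms mentioned in this thread.

```python
import re
from collections import Counter

def author_key(abbr):
    """Reduce an abbreviation to a counting key (illustrative rules only)."""
    # Drop a possessive "'s" (optionally preceded by a period) and anything after it.
    key = re.sub(r"\.?'s\b.*$", "", abbr, flags=re.IGNORECASE)
    # Drop trailing periods/commas and fold case so 'ü' and 'Ü' agree.
    return key.rstrip(".,").upper()

forms = ["BÜHLER", "BÜHLER's", "BÜHLER.'s", "WASSILIEW,", "WASSILIEW"]
counts = Counter(author_key(f) for f in forms)
print(counts)
```

The original forms stay untouched in the data; only the derived key is used for counting, so nothing is lost if the rule later turns out to be too aggressive.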
zaaf2 commented 8 years ago

@gasyoun I am afraid you will have to count me out. It has been very hard to find time lately. I may be able to help occasionally, but not systematically.

gasyoun commented 8 years ago

@funderburkjim after my batch of corrections has been implemented, I can take another look to ease your pain. 927 is a big number, and I deal only with the easy ones, mostly German.

funderburkjim commented 8 years ago

@gasyoun I appreciate the offer, but actually think it is better for me to do them, since I'm so familiar with the forms. I'm at 26% currently.

If you have time to work on something, I think #56 would be a good choice, especially since @zaaf2 has limited time now, and discovering titles for the 'new' abbreviations is a task for which your knowledge of the literature would be an advantage.

drdhaval2785 commented 8 years ago

This seems a reasonable sharing of responsibility. And 26% is rather quick, Jim. Best of luck.

gasyoun commented 8 years ago

@funderburkjim that task (#56) is too big; I cannot accept it now. First I need clean lists of headwords; everything else comes after. Otherwise I will not finish the first volume, with its printed list of clean headwords, by March. I still hope it can be finished one day.

drdhaval2785 commented 8 years ago

@funderburkjim An interim update would be welcome. I am eager to know what percentage of the 927-case list you have completed.

funderburkjim commented 8 years ago

I've gone through all of them, and installed and updated everything.

Somehow, the new crefminusbib still has 71 unresolved.

Will work to resolve these tomorrow.

funderburkjim commented 8 years ago

Having trouble syncing. Changes not yet committed.

drdhaval2785 commented 8 years ago

I haven't committed anything in the interim. Not sure why there is a problem with syncing.

funderburkjim commented 8 years ago

I think you did a pwbib0 correction commit since my last sync.

Anyway, I recloned, installed my changes (to pwbib_new, abbrv.py, crefmatch.py); then reran based on new pw. Those are now committed to GitHub.

There are 75 unresolved in crefminusbib.

As mentioned, will work on resolving them tomorrow.

gasyoun commented 8 years ago

FÜHRER.BR2H JOLIY.SCHULD NI7LAK.FASST AV.OFT JOLLY,SCHULD

German words unsplit.

SCHL Schlegel.

drdhaval2785 commented 8 years ago

@funderburkjim I congratulate you on completing this mammoth task single-handedly. Once you cover the remaining 75, it will be like a full-fledged Ph.D. thesis coming to an end. Good computational work on the PW bibliography.

Best regards