sanskrit-lexicon / PWK

Sanskrit-Wörterbuch in kürzerer Fassung (Sanskrit Dictionary in Shorter Version), 7 volumes, Petersburg 1879-1889

fuzzy suggestions for correction submission #44

Closed: drdhaval2785 closed this issue 8 years ago

drdhaval2785 commented 8 years ago

Taking a clue from the numfuzzy effort of @funderburkjim (and mostly adapting his code there), I have tried to give suggestions for corrections in the submission files. The commit responsible is …

The files amended to accommodate the fuzzy logic are … and ….

The logic is:

  1. For any cref entry, if there is a fuzzy match in pwbib1.txt, it is shown as ¯ls@key1@key2@lnum:¯suggestion:t: (e.g. ¯BURNELL.T@maDvaBAzya@maDvaBAzya@82746:¯BURNELL,T:t:).
  2. If there is no fuzzy match, it is shown as ¯ls@key1@key2@lnum:¯suggestion:n:, where the suggestion is simply the original ls (e.g. ¯BHA7G.P.ed.Bomb@anudapAna@anudapAna@4432:¯BHA7G.P.ed.Bomb:n:).
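The two rules above can be sketched in a few lines of Python. This is a minimal illustration, not the code from the commit: it assumes the abbreviations from pwbib1.txt are already loaded into a list, and it swaps in the standard library's `difflib.get_close_matches` with a hypothetical cutoff of 0.8 in place of whatever similarity measure numfuzzy actually uses. The function name `suggest` is also hypothetical.

```python
import difflib


def suggest(ls, key1, key2, lnum, bib_abbrevs, cutoff=0.8):
    """Build one submission line for an unmatched cref (ls) entry.

    bib_abbrevs: list of abbreviations, assumed to be read from pwbib1.txt.
    difflib is a stand-in for the numfuzzy matching; the real code may differ.
    """
    # Best single candidate whose similarity ratio is at least `cutoff`.
    matches = difflib.get_close_matches(ls, bib_abbrevs, n=1, cutoff=cutoff)
    if matches:
        suggestion, flag = matches[0], "t"  # fuzzy match found
    else:
        suggestion, flag = ls, "n"  # no match: echo the original ls back
    return f"¯{ls}@{key1}@{key2}@{lnum}:¯{suggestion}:{flag}:"
```

With `bib = ["BURNELL,T", "BHA7G.P."]`, the entry BURNELL.T is close enough to BURNELL,T to get a `:t:` suggestion, while BHA7G.P.ed.Bomb falls below the cutoff against BHA7G.P. and is echoed back with `:n:`. The cutoff is the main knob: lowering it produces more suggestions, but also more wrong ones to reject by hand.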

Thus the submission file is now reasonably improved. If a suggestion is correct, leave it as it is.
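Since every line follows the fixed shape ¯ls@key1@key2@lnum:¯suggestion:flag:, it can be split back into its fields, for instance to pick out the `:n:` lines that still need a manual correction. A minimal Python sketch, assuming that ls and the keys never contain `@` and that each line starts with a single `¯`; the names `Submission`, `parse_submission_line` and `needs_review` are hypothetical helpers, not part of the repository.

```python
from typing import NamedTuple


class Submission(NamedTuple):
    ls: str
    key1: str
    key2: str
    lnum: str
    suggestion: str
    flag: str  # 't' = fuzzy match found, 'n' = no match


def parse_submission_line(line: str) -> Submission:
    """Split a line of the form ¯ls@key1@key2@lnum:¯suggestion:flag:"""
    body = line.strip().lstrip("¯")
    left, right = body.split(":¯", 1)            # reference part / suggestion part
    ls, key1, key2, lnum = left.rsplit("@", 3)   # assumes no '@' inside fields
    suggestion, flag = right.rstrip(":").rsplit(":", 1)
    return Submission(ls, key1, key2, lnum, suggestion, flag)


def needs_review(lines):
    """Return the parsed entries for which no fuzzy match was found."""
    parsed = (parse_submission_line(l) for l in lines if l.strip())
    return [s for s in parsed if s.flag == "n"]
```

Splitting from the right (`rsplit`) keeps the parser tolerant of a `:` inside a suggestion, since only the last colon-separated field is treated as the flag.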

drdhaval2785 commented 8 years ago

Just to give a glimpse of the output, I am copy-pasting 20 entries from cmbsub.txt here.

gasyoun commented 8 years ago

It's impossible for me to work in such a UI. I do not understand where to look. An HTML view would still be desirable, or am I the only one? Excuse me for complaining; it's a huge amount of work, but I just can't help in this format. It's too user-unfriendly. I lack IAST, but that's my own issue after all.

drdhaval2785 commented 8 years ago

@gasyoun Is it not proper? I work in this UI and haven't faced much trouble. The txt files are for submitting corrections in a standard format, not for viewing.

drdhaval2785 commented 8 years ago

Fuzzy suggestions are now given in the text file, so it is safe to close this documentation issue.

gasyoun commented 8 years ago

@drdhaval2785 What if an additional last column in the HTML contained the TXT line? Then I could copy-paste it without searching for the same entry in the TXT file. Most fixes are easy; I could fix them in seconds. But as it is, it takes minutes, or I just abandon submitting altogether.

drdhaval2785 commented 8 years ago

Dear Gasyoun, you need to look at pw.txt without fail. The reason is that I display only one entry referring to each work (the alphabetically first one, I guess), but there are many cases that are not listed. For example, the submission for 'Calc' referred to at least three different works. If I had gone by the entry displayed in the HTML, I would have wrongly altered the references to the other books. Showing all occurrences of a reference is not an option either: the file would be more than 10000 entries long, and I don't want to discourage people by its size. That's why I am hiding the other occurrences of the same reference.
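The trade-off described here (one displayed entry per reference, the rest hidden) amounts to grouping occurrences by reference and keeping a single representative. A minimal Python sketch, assuming occurrences are available as (ref, key1, lnum) tuples and that "first" means plain string order, which for SLP1 headwords is not the same as Sanskrit alphabetical order. The function name is hypothetical.

```python
from collections import defaultdict


def first_occurrence_per_ref(occurrences):
    """Group (ref, key1, lnum) tuples by ref.

    Returns {ref: (shown_key1, shown_lnum, hidden_count)}, where the shown
    entry is the first in plain string order and hidden_count is how many
    other occurrences of the same reference are not displayed.
    """
    groups = defaultdict(list)
    for ref, key1, lnum in occurrences:
        groups[ref].append((key1, lnum))
    summary = {}
    for ref, items in groups.items():
        items.sort()  # plain string order, not Sanskrit alphabetical order
        key1, lnum = items[0]
        summary[ref] = (key1, lnum, len(items) - 1)
    return summary
```

A non-zero hidden count is exactly the case that makes checking pw.txt unavoidable: the hidden entries may use the same abbreviation for a different work, as with 'Calc'.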

The way I work is this: I keep the HTML file open in Firefox, and the submission file and pw.txt open side by side in Notepad++. I copy a reference from the submission file into the Notepad++ search box and use 'Find All in Current Document'. If there is more than one entry, I click on each result in Notepad++ to see its context. If I can't decide on the entry from the text file, I look at the HTML, and finally submit.
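The 'Find All in Current Document' step can also be reproduced in a short script that lists every line of pw.txt containing a reference, together with a little surrounding context. A minimal Python sketch; the function names, the 1-based line numbering and the assumption that pw.txt is UTF-8 encoded are illustrative choices, not part of the workflow described above.

```python
from pathlib import Path


def find_all(lines, needle, context=1):
    """Return (line_number, context_lines) for every line containing needle.

    Line numbers are 1-based, like the Notepad++ search results panel.
    """
    hits = []
    for i, line in enumerate(lines):
        if needle in line:
            lo = max(0, i - context)
            hi = min(len(lines), i + context + 1)
            hits.append((i + 1, lines[lo:hi]))
    return hits


def find_all_in_file(path, needle, context=1):
    # Assumption: the file is UTF-8; change the encoding if pw.txt differs.
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return find_all(lines, needle, context)
```

For example, `find_all_in_file("pw.txt", "Calc.")` returns one entry per matching line; more than one hit is the signal to read each context before deciding which work the reference means.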