Closed drdhaval2785 closed 8 years ago
Just to give a glimpse of the output, I am copy pasting 20 entries from cmbsub.txt here.
¯BURNELL.T@maDvaBAzya@maDvaBAzya@82746:¯BURNELL,T:t:
¯C2A7N5KH@aGAhan@aGAhan@849:¯C2A7K:t:
¯HEM@cItkfta@cItkfta@40339:¯H:t:
¯K4AMAPAKA@rahitatva@°rahitatva@93133:¯K4AMPAKA:t:
¯BA7G4AN@ajara@aja/ra@1271:¯RA7G4AN:t:
¯Vardh@paryAya@paryAya@64793:¯Va7rtt:t:
¯K4ARARA@udamehin@udamehin@18709:¯K4ARAKA:t:
¯PAN4K4AT.ed.Bomb@antarvAsika@antarvAsika@5371:¯PAN4K4AT.ed.orn:t:
¯R.GORR@aDiyoDa@aDiyoDa@2889:¯GOBH:t:
¯K4D@aravindinI@aravindinI@9028:¯KA7D:t:
¯C2a7n5kh@upasTa@upa/sTa@20180:¯C2A7K:t:
¯BA7DAR.S@anupraveSa@anupraveSa@4632:¯BA7DAR:t:
¯BHA7G.P.ed.Bomb@anudapAna@anudapAna@4432:¯BHA7G.P.ed.Bomb:n:
¯A7PST@aniha@aniha@4217:¯A7PAST:t:
¯A7PAST.GAUT@Atmavant@Atmavant@14243:¯A7PAST.C2R:t:
¯H4MA7DRI@udvaMSa@udvaMSa@19185:¯HEMA7DRI:t:
¯A7RSHBr@pUrvAtiTa@pUrvAtiTa@69228:¯A7RSH.BR:t:
¯GR2HJ@digvyAGAraRa@digvyAGAraRa@50109:¯GOBH:t:
¯MALLIN@aparicita@aparicita@6129:¯LALIT:t:
¯A7RJABH.S@atyazwi@atyazwi@2280:¯A7RJABH:t:
It's impossible for me to work in such a UI. I do not understand where to look. Still, an HTML version would be desirable, or am I the only one? Excuse me for complaining, a huge amount of work has been done, I just can't help in such a format. It's too user-unfriendly. I lack IAST, but that's my issue after all.
@gasyoun http://sanskrit-lexicon.github.io/PWK/cmbsub.html Is this not suitable? I work with this UI and haven't faced much trouble. The txt files are for submitting corrections in a standard format, not for viewing.
Fuzzy suggestions are now given in the text file. So it is safe to close this documentation issue.
@drdhaval2785 what if a last, additional column of the HTML contained the TXT line? In that case I could copy-paste it without looking for the same entry in the TXT. Most fixes are easy. I could fix them in seconds. But the way it is, it takes minutes, or I just abandon submitting altogether.
Dear Gasyoun, you need to look at pw.txt without fail. The reason is that I display only one entry that refers to the work (the alphabetically first one, I guess). But there are many cases which are not listed. E.g. the submission 'Calc' referred to at least three different works. If I had gone by the entry displayed in the HTML, I would have wrongly altered the rest of the books. And showing all occurrences of a reference is not an option either: the file would be more than 10000 entries long. I don't want to discourage people by size. That's why I am hiding the other occurrences of the same reference.
The way I work is: keep the HTML file open in Firefox. Keep the submission file and pw.txt open side by side in Notepad++. Copy-paste from the submission file into the Notepad++ search box and use 'Find All in Current Document'. If there is more than one entry, I click on each of them in Notepad++ and look at its context. If I can't decide from the text file, I look at the HTML and finally submit.
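The 'find all occurrences, then look at the context' step can also be scripted. The following is only a rough sketch: the function name `find_occurrences` and the sample lines are made up for illustration, and it assumes nothing about the layout of pw.txt beyond it being a text file read line by line.

```python
from typing import Iterable, List, Tuple


def find_occurrences(lines: Iterable[str], reference: str) -> List[Tuple[int, str]]:
    """Return (line number, line) for every line that contains `reference`.

    Plain substring search, like an editor's 'find all'. Line numbers are
    1-based. A short reference will also match longer strings that contain it.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        if reference in line:
            hits.append((number, line.rstrip("\n")))
    return hits


# Hypothetical sample lines, NOT taken from pw.txt.
sample = [
    "first entry mentioning Calc here\n",
    "an unrelated entry\n",
    "another entry with Calc in it\n",
]
matches = find_occurrences(sample, "Calc")
```

With a real file, one would pass an open file object, e.g. `find_occurrences(open("pw.txt", encoding="utf-8"), "Calc")`, and then inspect each hit before deciding on a correction.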
Taking a cue from the numfuzzy effort of @funderburkjim (and mostly adapting his code there and in https://github.com/funderburkjim/fuzzyalpha-example), I have tried to give suggestions for corrections in the submission files. https://github.com/sanskrit-lexicon/PWK/commit/d8d74e8dadcca277a327ffd627fca387a879c384 is the commit responsible.
The code amended is stdabbrv.sh and stdabbrv.py, to accommodate the fuzzy logic.
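To give a feel for what a fuzzy suggestion does, here is a minimal sketch. It is not the code in stdabbrv.py and not the numfuzzy or fuzzyalpha algorithm: it swaps in Python's standard `difflib.get_close_matches`, and the list `known_abbreviations` and the function name `suggest` are assumptions made up for the example.

```python
import difflib
from typing import List, Optional


def suggest(abbreviation: str, known: List[str], cutoff: float = 0.6) -> Optional[str]:
    """Return the known abbreviation most similar to `abbreviation`, or None.

    Similarity is difflib's SequenceMatcher ratio; candidates scoring below
    `cutoff` are ignored.
    """
    best = difflib.get_close_matches(abbreviation, known, n=1, cutoff=cutoff)
    return best[0] if best else None


# Hypothetical list of accepted abbreviations (assumption, for illustration only).
known_abbreviations = ["BURNELL,T", "C2A7K", "HEMA7DRI", "A7PAST"]

print(suggest("H4MA7DRI", known_abbreviations))  # HEMA7DRI
print(suggest("A7PST", known_abbreviations))     # A7PAST
```

A similarity cutoff keeps wildly different strings from being offered as suggestions; the right value would have to be tuned against real data.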
Each line of the submission file now has one of two forms, which differ only in the final flag (t or n):
¯ls@key1@key2@lnum:¯suggestion:t:
e.g. ¯BURNELL.T@maDvaBAzya@maDvaBAzya@82746:¯BURNELL,T:t:
¯ls@key1@key2@lnum:¯suggestion:n:
e.g. ¯BHA7G.P.ed.Bomb@anudapAna@anudapAna@4432:¯BHA7G.P.ed.Bomb:n:
Thus the submission file is now reasonably improved. If the suggestion is fine, leave it as it is.
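For anyone who wants to process the submission file programmatically, the line format above can be split into its fields as sketched below. The names `Submission` and `parse_line` are invented for this example, and the sketch assumes that no field contains a colon and that the keys and line number contain no '@'. The meaning of the t and n flags is not spelled out in this thread, so the flag is simply carried through unchanged.

```python
from typing import NamedTuple

MARK = "\u00af"  # the macron character that prefixes ls and the suggestion


class Submission(NamedTuple):
    ls: str          # the literary-source abbreviation as it occurs in the text
    key1: str
    key2: str
    lnum: int
    suggestion: str  # the suggested abbreviation
    flag: str        # 't' or 'n', kept as-is


def parse_line(line: str) -> Submission:
    """Parse one line of the form  ¯ls@key1@key2@lnum:¯suggestion:flag:"""
    # The trailing colon produces an empty fourth part, which is discarded.
    head, suggestion, flag, _ = line.strip().split(":")
    # Split from the right so that only the last three '@' act as separators.
    ls, key1, key2, lnum = head.rsplit("@", 3)
    return Submission(ls.lstrip(MARK), key1, key2, int(lnum),
                      suggestion.lstrip(MARK), flag)


record = parse_line("¯BURNELL.T@maDvaBAzya@maDvaBAzya@82746:¯BURNELL,T:t:")
print(record.ls, "->", record.suggestion, "at line", record.lnum)
```

Splitting the head from the right is a small precaution for abbreviations that might themselves contain an '@'; none of the 20 samples above do.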