sanskrit-lexicon / PWK

Sanskrit-Wörterbuch in kürzerer Fassung, 7 Bände Petersburg 1879-1889

Implement Fuzzy-Found Errors #1

Open gasyoun opened 9 years ago

gasyoun commented 9 years ago

A year ago I was experimenting together with https://github.com/Kreol2013 on https://github.com/sanskrit-lexicon/Cologne/issues/14. Among words longer than 14 characters, http://www.ablebits.com/excel-find-similar/ located suspicious words. The longer the word, the bigger the chance that it contains a mistake. Not all 9828 cases were bulletproof errors (16 are real and checked), because uṣṇā is not a misspelling of ṣṇā or vice versa. Most are false positives, like 6331 atiprapīḍita abhiprapīḍita.

The longest suspicious word in PWK was jaladharagarjitaghoṣasusvaranakṣatrarājasaṃkusumritābhijña. In MW, presumably the same word is written as jaladharagarjitaghoṣasusvaranakṣatrarājasaṃkusumitābhijña. I opened the printed PWK and found that the error is in the book and that saṃkusumitā is correct. Nobody has implemented the changes proposed by Böhtlingk himself in the corrigenda, so I still do not know whether I am the first one.

But I can't just copy-paste IAST to check whether the word is there. I have to copy the word, convert it, copy it again, paste it, and only then find out that nothing can be found for jaladharagarjitaghoSasusvaranakSatrarAjasaMkusumritAbhijJa, although it was there in 2013. If it has been corrected, how should I know? And if it is still there, is it only the search that fails me? I ask because a similar case, sarvalokabhayāstambhitatvavidhvaṃsanakara, is findable; it should be sarvalokabhayāstambhitatvaviddhvaṃsanakara (as in MW) and is, again, a mistake found in the book.

fuzzy-word-correction-mw-pwk_26_11_2013_b3

3935 śābdikavidvaktavipramodaka śābdikavidvatkavipramodaka PWK typo in book
4611 sarvajñarāmeśvarabhaṭṭāraka sarvajñarāmeśvarabhaṭāraka PWK typo in OCR
8037 candrasūryajihmīkaraṇaprabha candrasūryajihmīkaraprabha PWG typo in book
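The kind of comparison described above (long headwords in one dictionary that almost match a headword in another) can be sketched in Python. This is only an illustration under assumptions, not the Ablebits add-in's actual algorithm: the function names, the plain Levenshtein metric, and the default `max_dist=2` are hypothetical choices; only the "longer than 14 characters" cut-off comes from the description above.

```python
import unicodedata


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance: insert, delete, substitute each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def suspicious_pairs(words_a, words_b, min_len=15, max_dist=2):
    """Return (word_a, word_b, distance) for near-matches between two lists.

    Assumption: a word from words_a is only suspicious if it is at least
    min_len characters long and has no exact match in words_b.
    """
    # NFC keeps letters such as "ṣ" or "ā" as single code points.
    norm_a = [unicodedata.normalize("NFC", w) for w in words_a]
    norm_b = [unicodedata.normalize("NFC", w) for w in words_b]
    known_b = set(norm_b)
    pairs = []
    for wa in norm_a:
        if len(wa) < min_len or wa in known_b:
            continue
        for wb in norm_b:
            # The length difference is a lower bound on the distance.
            if abs(len(wa) - len(wb)) > max_dist:
                continue
            d = edit_distance(wa, wb)
            if d <= max_dist:
                pairs.append((wa, wb, d))
    return pairs


if __name__ == "__main__":
    pwk = ["sarvalokabhayāstambhitatvavidhvaṃsanakara", "uṣṇā"]
    mw = ["sarvalokabhayāstambhitatvaviddhvaṃsanakara", "ṣṇā"]
    for pair in suspicious_pairs(pwk, mw):
        print(pair)
```

Note that a swapped pair of letters, as in vidvakta / vidvatka, counts as two edits under plain Levenshtein distance, which is why the sketch defaults to a threshold of 2 rather than 1. The nested loop is quadratic, so a real run over full headword lists would need some indexing or bucketing by length.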

funderburkjim commented 9 years ago

I made a change just to PW and just to the list display for pw, namely for http://www.sanskrit-lexicon.uni-koeln.de/scans/PWScan/2014/web/webtc1/index.php

To see the effect, change the preferences to Keyboard Input: Unicode/Roman, and click OK. Now, in the display, you can PASTE ROMAN UNICODE into the citation field.

I only checked this on one word, but the JavaScript change was small, so it may work OK.

Does this do what you want?

gasyoun commented 9 years ago

It hardly works for me. After I set the input to Unicode/Roman and paste "dhātu", the "ā" gets swallowed and I get only "dhtu". So this does not work with IAST the way I would want it to.
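For reference, the manual conversion step mentioned earlier in this thread can be sketched as a small Python helper. The converted string quoted in the first comment (ghoSa, rAja, abhijJa) looks like Harvard-Kyoto, so the sketch assumes that target scheme; the function name `iast_to_hk` is hypothetical, the input is assumed to be lowercase IAST, and this is not the code the website itself runs.

```python
import unicodedata

# IAST letters that differ in Harvard-Kyoto; every other letter is unchanged.
# Assumption: input is lowercase IAST in (or normalizable to) NFC form.
_IAST_TO_HK = str.maketrans({
    "ā": "A", "ī": "I", "ū": "U",
    "ṛ": "R", "ṝ": "RR", "ḷ": "lR", "ḹ": "lRR",
    "ṃ": "M", "ṁ": "M", "ḥ": "H",
    "ṅ": "G", "ñ": "J",
    "ṭ": "T", "ḍ": "D", "ṇ": "N",
    "ś": "z", "ṣ": "S",
})


def iast_to_hk(text: str) -> str:
    """Convert lowercase IAST to Harvard-Kyoto (hypothetical helper)."""
    # NFC first, so that "a" + combining macron becomes the single letter "ā".
    return unicodedata.normalize("NFC", text).translate(_IAST_TO_HK)


if __name__ == "__main__":
    print(iast_to_hk("dhātu"))
```

Normalizing to NFC before the lookup matters here: if a pasted "ā" arrives as "a" plus a combining macron, a per-character table would otherwise miss it.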

funderburkjim commented 9 years ago

I don't get the error. After setting the preferences, I copied your word dhātu from the comment above and pasted it into the citation box. As you can see from the image below, the data was there: image

Here's how my preferences look: image