sanskrit-lexicon / CORRECTIONS

Correction history for Cologne Sanskrit Lexicon

Similar orthography #198

Open drdhaval2785 opened 8 years ago

drdhaval2785 commented 8 years ago

Taking a cue from this correction submission:

ap:araMDuza:araMDuza:t:Confusion of G and D in Sanskrit. 'Sounding loudly' has more relevance to 'Guz'.

The subdirectory https://github.com/sanskrit-lexicon/CORRECTIONS/tree/master/dhaval/similarortho and its script https://github.com/sanskrit-lexicon/CORRECTIONS/blob/master/dhaval/similarortho/similarortho.py explore the possibility of finding errors based on orthographically similar symbols in Sanskrit. For example, ध and घ look alike and are therefore easily confused by data entry operators.

The file https://github.com/sanskrit-lexicon/CORRECTIONS/blob/master/dhaval/similarortho/DGconf.txt lists the cases where a word contains 'D' and word.replace('D','G') is also in sanhw2.txt, or a word contains 'G' and word.replace('G','D') is also in sanhw2.txt.
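The check described above can be sketched roughly as follows. This is not the code in similarortho.py. It is a minimal illustration that assumes the headwords have already been loaded into a Python collection (the layout of sanhw2.txt is not assumed here). The function name `find_confusable_pairs` and the sample words are made up for the example.

```python
def find_confusable_pairs(headwords, a="D", b="G"):
    """Return sorted (word, variant) pairs where swapping a->b or b->a
    in a headword yields another headword in the same collection."""
    known = set(headwords)
    pairs = set()
    for word in known:
        for src, dst in ((a, b), (b, a)):
            if src in word:
                # Same test as in the description: word.replace(src, dst)
                variant = word.replace(src, dst)
                if variant in known:
                    pairs.add((word, variant))
    return sorted(pairs)


# Hypothetical sample list in SLP1 transliteration, for illustration only.
sample = ["araMDuza", "araMGuza", "Dana", "Gana", "gaja"]
for word, variant in find_confusable_pairs(sample):
    print(word, variant)
```

Each confusable pair is reported in both directions, since either member could be the mistyped one. A human still has to check the print to decide which.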

List of confusing orthography

  1. 'D'-'G'
  2. 'R'-'e'

drdhaval2785 commented 8 years ago

@zaaf2, @gasyoun and @funderburkjim are welcome to add the cases where they stumble upon issues of orthographic similarity and wrong data entry.

Another one which I came across was 'M' and a print smudge.

drdhaval2785 commented 8 years ago

'R' and 'e' are being confused by some data entry operators.

Proof is here: capture

Whereas the print is

capture

funderburkjim commented 8 years ago

So it's a half 'R': zaRmatanAwaka. It is easy to see why someone going only by the shape of the Devanagari graphemes would confuse 'e' and half-'R'.

drdhaval2785 commented 8 years ago

3. 'm' and 's' are confused when the print is bad. See

pw:suSlizwamaMDi:suSlizwasaMDi:t:sam has the tendency to generate 'M' all across dictionaries.

I also have a vague memory of seeing this in earlier submissions, though very infrequently.

gasyoun commented 8 years ago

atikUra:AP -> atikrUra:AP

atikrura

@funderburkjim there might be even more cases in AP and AP90 of ku instead of kru (ka & kra).

drdhaval2785 commented 8 years ago

4. 'sva' / 'rava' / 'Ka' are easily confused. See mayUsva entered instead of mayUKa.

gasyoun commented 8 years ago

'sva' / 'rava' for sure; students mix them up when learning Devanagari.

drdhaval2785 commented 8 years ago

5. 'N' / 'q'

See

shs:KaNgAGAta,12637:KaqgAGAta:t:gAG

capture
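One way to check whether a submitted correction falls into one of the single-letter confusions listed in this thread is to compare the old and new spellings character by character. The sketch below is illustrative only. The names `single_substitution`, `is_known_confusion` and `CONFUSABLE` are hypothetical, and the set covers only the one-character pairs mentioned here ('D'/'G', 'R'/'e', 'm'/'s', 'N'/'q'), not the multi-character 'sva' / 'rava' / 'Ka' case.

```python
def single_substitution(old, new):
    """Return (old_char, new_char) if the two strings have equal length
    and differ at exactly one position; otherwise return None."""
    if len(old) != len(new):
        return None
    diffs = [(o, n) for o, n in zip(old, new) if o != n]
    return diffs[0] if len(diffs) == 1 else None


# Single-character confusions listed in this thread (SLP1 transliteration).
CONFUSABLE = {frozenset("DG"), frozenset("Re"), frozenset("ms"), frozenset("Nq")}


def is_known_confusion(old, new):
    """True if old -> new is a one-letter swap between a listed pair."""
    diff = single_substitution(old, new)
    return diff is not None and frozenset(diff) in CONFUSABLE


print(single_substitution("KaNgAGAta", "KaqgAGAta"))
print(is_known_confusion("suSlizwamaMDi", "suSlizwasaMDi"))
print(is_known_confusion("atikUra", "atikrUra"))
```

The atikUra / atikrUra case is an insertion rather than a substitution, so this simple check does not flag it. Catching such cases would need an edit-distance comparison instead.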