Open drdhaval2785 opened 8 years ago
@zaaf2, @gasyoun and @funderburkjim are welcome to add the cases where they stumble upon issues of orthographic similarity and wrong data entry.
Another one which I came across was 'M' and a print smudge.
So it's a half 'R' : zaRmatanAwaka. Easy to see why someone using only the form of the Devanagari graphemes would confuse 'e' and 'half-R'.
3. 'm' and 's' are confused when there is bad print. See
pw:suSlizwamaMDi:suSlizwasaMDi:t:sam has the tendency to generate 'M' all across dictionaries.
I also have a vague memory that earlier submissions showed this feature too, but very infrequently.
atikUra:AP -> atikrUra:AP
@funderburkjim there might be even more from AP, AP90: ku instead of kru (ka & kra).
4. 'sva' / 'rava' / 'Ka' are easily confused. See mayUsva instead of mayUKa.
'sva' / 'rava' for sure; students mix them up when learning Devanagari.
5. 'N' / 'q'. See shs:KaNgAGAta,12637:KaqgAGAta:t:gAG
Taking a cue from this correction submission
The subdirectory https://github.com/sanskrit-lexicon/CORRECTIONS/tree/master/dhaval/similarortho and the code in https://github.com/sanskrit-lexicon/CORRECTIONS/blob/master/dhaval/similarortho/similarortho.py explore the possibility of finding errors based on similar orthographic symbols in Sanskrit, e.g. ध and घ (SLP1 'D' and 'G') look alike and therefore cause confusion in data entry operators' minds.
So the https://github.com/sanskrit-lexicon/CORRECTIONS/blob/master/dhaval/similarortho/DGconf.txt file examines cases where a word contains 'D' and word.replace('D','G') is also in sanhw2.txt, or a word contains 'G' and word.replace('G','D') is also in sanhw2.txt.
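The check described above can be sketched roughly as follows. This is only an illustration, not the actual similarortho.py code: the function name `confusable_pairs` and the sample headwords are made up for the example, and loading sanhw2.txt is left out because its layout is not described here. The function takes a plain list of SLP1 headwords.

```python
def confusable_pairs(headwords, a="D", b="G"):
    """Return (word, variant) pairs where swapping a -> b or b -> a
    in a headword yields another headword in the same list.

    Hypothetical helper for illustration; str.replace swaps every
    occurrence, as in the word.replace('D','G') check described above.
    """
    known = set(headwords)
    hits = []
    for word in sorted(known):
        for src, dst in ((a, b), (b, a)):
            if src in word:
                variant = word.replace(src, dst)
                if variant in known:
                    hits.append((word, variant))
    return hits


# Made-up sample list standing in for the headwords of sanhw2.txt.
sample = ["Darma", "Garma", "DanA", "Gana", "mayUKa"]
for word, variant in confusable_pairs(sample):
    print(word, variant)
```

A hit is only a candidate for manual checking, since both forms may be genuine words. The same function could be tried on other pairs from this thread, e.g. `confusable_pairs(words, "N", "q")`.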
List of confusing orthography