indic-dict / stardict-sanskrit

Stardict dictionary files for the Sanskrit language.
https://sanskrit-coders.github.io/dictionaries/offline/

Normalize apte-hi dictionary headwords better, fix defects #144

Open vvasuki opened 2 years ago

vvasuki commented 2 years ago

Headword normalization

The current headword normalization in apte-hi is ad hoc and not very good: it adds extra headwords by stripping a terminal म् or ः that does not follow a dIrgha AkAra (long ā). For example, we have a fake headword maruta as a result. This needs to be fixed.
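To make the rule described above concrete, here is a minimal Python sketch of it as I read it. This is not the actual apte-hi generation code; the function name `extra_headwords` and the exact handling are assumptions made for illustration only.

```python
# Hypothetical sketch of the ad hoc rule described in this issue.
# Not taken from the real dictionary-generation scripts.

VIRAMA = "\u094d"                 # ्
VISARGA = "\u0903"                # ः
TERMINAL_M = "\u092e" + VIRAMA    # म्
LONG_A = ("\u093e", "\u0906")     # ा (vowel sign) and आ (independent vowel)


def extra_headwords(headword):
    """Return the headword plus a stripped variant, if the rule applies.

    The variant drops a terminal म् or ः, unless what precedes it is long ā.
    """
    for suffix in (TERMINAL_M, VISARGA):
        if headword.endswith(suffix):
            stem = headword[: -len(suffix)]
            # Skip empty stems and stems ending in long ā.
            if stem and not stem.endswith(LONG_A):
                return [headword, stem]
    return [headword]


if __name__ == "__main__":
    for word in ["रामः", "फलम्", "रमाः", "मरुतः"]:
        print(word, "->", extra_headwords(word))
```

Under this sketch, a form such as मरुतः would gain the extra headword मरुत (maruta). Whether that is exactly how the fake headword mentioned above arose is an assumption, but it shows why blind suffix stripping can invent stems that do not exist.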

To do this, someone needs to regenerate the dict using the below

Other defects

The dictionary has other defects:

The UoH data had major errors in the entries I checked. - @akprasad

You can compare it with UoH's data (which mainly has the citations filled up) and the print/scan (which has more data that need to be inserted appropriately at resp. places throughout). - @Andhrabharati