sanskrit-lexicon / LRV

Convert the data of the L R Vaidya Sanskrit-English dictionary to CDSL format

Find headwords missing in sanhw1 (Cologne dictionary headwords) #11

Closed: drdhaval2785 closed this issue 2 years ago

drdhaval2785 commented 2 years ago

They may need a closer look for potential errors.

drdhaval2785 commented 2 years ago

There are 1361 entries to be examined.

unique_headwords.txt

drdhaval2785 commented 2 years ago

https://github.com/sanskrit-lexicon/LRV/commit/4a434db7787bed1c8dcff5d806c3db38ae15f857 suggests that these are low-hanging fruit. They should be corrected manually.

drdhaval2785 commented 2 years ago

Initial statistics, before corrections:

Direct Match:            41274
Varga Panchama Match:     4223
Bracket Removal Match:     739
No Match:                 1361

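
The thread reports counts for four match categories but does not show the script that produced them. The following is a minimal sketch of how such a classification could work, not the script actually used. It assumes headwords are in SLP1, that a "Varga Panchama Match" means the two headwords agree once an anusvara (M) before a stop is rewritten as that stop's class nasal, and that a "Bracket Removal Match" means agreement after either dropping a parenthesized part or dropping only its brackets. The function names (`panchama_normalize`, `bracket_variants`, `classify`, `match_stats`) and the sample headwords are hypothetical.

```python
import re
from collections import Counter

# Assumption: headwords are in SLP1. Each string is one varga (class of stops);
# its last letter is the varga panchama (class nasal).
VARGAS = ["kKgGN", "cCjJY", "wWqQR", "tTdDn", "pPbBm"]
NASAL_FOR = {ch: group[-1] for group in VARGAS for ch in group}
# Anusvara followed by any varga letter; the lookahead captures that letter.
ANUSVARA_BEFORE_STOP = re.compile("M(?=([" + "".join(VARGAS) + "]))")


def panchama_normalize(word):
    """Rewrite anusvara (M) before a varga letter as that varga's nasal."""
    return ANUSVARA_BEFORE_STOP.sub(lambda m: NASAL_FOR[m.group(1)], word)


def bracket_variants(word):
    """Hypothetical reading of 'bracket removal': drop the bracketed part,
    or drop only the brackets, e.g. 'aBi(BU)' -> {'aBi', 'aBiBU'}."""
    dropped = re.sub(r"\([^)]*\)", "", word)
    unwrapped = word.replace("(", "").replace(")", "")
    return {dropped, unwrapped}


def classify(headword, known, known_normalized):
    """Return the first category that matches, checked in a fixed order."""
    if headword in known:
        return "Direct Match"
    if panchama_normalize(headword) in known_normalized:
        return "Varga Panchama Match"
    if "(" in headword and bracket_variants(headword) & known:
        return "Bracket Removal Match"
    return "No Match"


def match_stats(lrv_headwords, sanhw1_headwords):
    """Count LRV headwords per category against the sanhw1 headword list."""
    known = set(sanhw1_headwords)
    # Normalize both sides so M and the class nasal compare equal either way.
    known_normalized = {panchama_normalize(w) for w in known}
    return Counter(classify(h, known, known_normalized) for h in lrv_headwords)


sanhw1 = ["saMskfta", "aNka", "aBi"]
lrv = ["saMskfta", "aMka", "aBi(BU)", "xyz"]
print(match_stats(lrv, sanhw1))
```

Because the categories are tested in a fixed order, each headword is counted exactly once, so in this sketch the four figures partition the LRV headword list.
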
drdhaval2785 commented 2 years ago

Completed the manual examination of the 1361 entries mentioned in the comment above: https://github.com/sanskrit-lexicon/LRV/issues/11#issuecomment-1266480075.

drdhaval2785 commented 2 years ago

Statistics after corrections:

Direct Match:            41686
Varga Panchama Match:     4268
Bracket Removal Match:     750
No Match:                  894

drdhaval2785 commented 2 years ago

The file below, containing these 894 entries, is unlikely to yield many more corrections. It can be treated as a whitelist: these are the unique headwords that LRV contributes to the CDSL data. unique_headwords.txt
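
With the 894 entries treated as a whitelist, a later run of the matching check would only need to report unmatched headwords outside it. A minimal sketch, assuming unique_headwords.txt is UTF-8 with the headword as the first whitespace-separated field on each line (the file's actual layout is not shown in this thread); the function names are hypothetical.

```python
from pathlib import Path


def parse_whitelist(lines):
    """Collect headwords, assuming the headword is the first
    whitespace-separated field on each non-blank line."""
    return {line.split()[0] for line in lines if line.strip()}


def load_whitelist(path):
    """Read a whitelist file such as unique_headwords.txt (UTF-8 assumed)."""
    return parse_whitelist(Path(path).read_text(encoding="utf-8").splitlines())


def needs_review(unmatched, whitelist):
    """Unmatched headwords that are not already whitelisted, in sorted order."""
    return sorted(set(unmatched) - whitelist)


whitelist = parse_whitelist(["aMka", "", "saMtoza"])
print(needs_review(["aMka", "xyz", "abc"], whitelist))  # ['abc', 'xyz']
```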

drdhaval2785 commented 2 years ago

Any errors that remain now are oversights. Closing this issue; if anything further needs to be done, I will open a new issue.