sanskrit-lexicon / csl-orig

Data for all dictionaries of Cologne. Now all corrections are made in this git-based workflow.

Malformed abbreviations #161

Open drdhaval2785 opened 4 years ago

drdhaval2785 commented 4 years ago

#160 suggested a correction from Av to Adv.

Such cases may be caught programmatically for all dictionaries.

  1. Extract the abbreviations for all Lnum.
  2. Sort them in descending order of occurrence count.
  3. Start from the tail of the list and move upwards.
  4. Items that appear only once or twice are mostly errors. Check and correct them.
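The steps above can be sketched roughly as follows. This is only an illustration, not the code actually used: it assumes the abbreviations are wrapped in a paired tag such as `<ls>...</ls>`, and the function names `count_tagged` and `rare_first` as well as the sample lines are made up for the example.

```python
import re
from collections import Counter


def count_tagged(lines, tag="ls"):
    """Count the text found inside <tag>...</tag> across all lines.

    The tag name is an assumption; pass whichever tag marks abbreviations.
    """
    pattern = re.compile(r"<{0}>(.*?)</{0}>".format(re.escape(tag)))
    counts = Counter()
    for line in lines:
        counts.update(m.strip() for m in pattern.findall(line))
    return counts


def rare_first(counts):
    """Return (item, count) pairs with the least frequent items first.

    This is the tail of the descending list, read upwards.
    """
    return sorted(counts.items(), key=lambda kv: (kv[1], kv[0]))


# Invented sample lines, for illustration only.
sample = [
    "<L>1<pc>001<k1>a ... <ls>Adv.</ls> ... <ls>Adv.</ls>",
    "<L>2<pc>001<k1>b ... <ls>Av.</ls>",
    "<L>3<pc>001<k1>c ... <ls>Adv.</ls>",
]
ranked = rare_first(count_tagged(sample))
for item, n in ranked:
    print(n, item)
```

Items at the top of this output are the ones to check by hand first.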

We had followed this method in some of the 'literary sources' corrections.

gasyoun commented 4 years ago

We had followed this method in some of the 'literary sources' corrections.

And it was good and productive.

drdhaval2785 commented 2 years ago

https://gist.github.com/drdhaval2785/f053928153a75d9cad0f78cafe5ea03b has the code.

drdhaval2785 commented 2 years ago

Dictionaries having <ls> markup

AP90, BOR, MW, PW, PWG

These are the only dictionaries which have literary sources marked up with this tag.

Cut off

Keep only the literary sources that occur 5 times or fewer. Those occurring only once have a high probability of being errors. The chance of error drops once a source is cited more than once.
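A minimal sketch of this cut-off, assuming the occurrence counts are already available as a mapping from source to count. The name `review_candidates` and the sample counts are hypothetical and not taken from the gist.

```python
from collections import Counter

CUTOFF = 5  # keep sources cited 5 times or fewer


def review_candidates(counts, cutoff=CUTOFF):
    """Return (source, count) pairs at or below the cutoff, rarest first."""
    kept = [(src, n) for src, n in counts.items() if n <= cutoff]
    return sorted(kept, key=lambda kv: (kv[1], kv[0]))


# Invented counts, for illustration only.
counts = Counter({"SourceA": 120, "SourceB": 5, "SourceC": 1, "SourceD": 6})
candidates = review_candidates(counts)
for src, n in candidates:
    print(n, src)
```

Sorting rarest first puts the single-occurrence sources, which are the most likely errors, at the top of the list to be checked.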

drdhaval2785 commented 2 years ago

Lists to be checked as of 01 September 2021

ap90.md bor.md mw.md pw.md

PWG may require further refining to reduce false positives.

pwg.md

drdhaval2785 commented 2 years ago

Manual examination found no errors in AP90 or BOR.