Open drdhaval2785 opened 7 years ago
Darśapaurṇamāsapaddhati and Darśapaurṇamāsaprayoga
Tracking only words that are longer than usual English words is probably not a good idea, I guess. After them come an abbreviation and letters (including Roman). That's a pattern, I would say.
Filtering out the English ones with the pyenchant library, so only non-English words are going to be highlighted.
Started generating log files. On the dev server, pywork/issue-acc-5/descFreq.txt lists the words which are non-English and were missed. They may be subject, catalogue or headword tags.
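A rough sketch of how such a frequency list could be produced. This is not the actual script in pywork/issue-acc-5; the function names and the tokenising regex are assumptions made for illustration. The English check is passed in as a function so the sketch runs without any extra library.

```python
import re
import unicodedata
from collections import Counter

# Assumption: a "word" is a run of Unicode letters (digits and
# underscores excluded). The real script may tokenise differently.
WORD_RE = re.compile(r"[^\W\d_]+")


def non_english_frequencies(lines, is_english):
    """Count every word for which is_english(word) is False."""
    counts = Counter()
    for line in lines:
        # NFC keeps letters such as "ī" or "ṣ" as single code points,
        # so the regex does not split words at combining marks.
        text = unicodedata.normalize("NFC", line)
        for word in WORD_RE.findall(text):
            if not is_english(word):
                counts[word] += 1
    return counts


def format_freq(counts):
    """Render counts as word:count strings, most frequent first."""
    return ["%s:%d" % (word, n) for word, n in counts.most_common()]
```

With pyenchant installed, `enchant.Dict("en_US").check` can be passed as `is_english`, assuming an en_US dictionary is available on the machine.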
UPDATE descFreq.txt
This file gives details about the missed subject / catalogue / headword tags.
A superficial reading suggests that the list is quite useful, e.g. Extr:635 Dīkṣita:567 Libr:309 Gov:304 Paṇḍita:289 Av:257
Extr - May mean Extra / Extract. No idea.
Dīkṣita - A common surname.
Libr, Gov - Missed cases of Gov. Or. Libr. Madras due to line breaks.
Av - aTarvaveda related treatises.
Indeed, most are real Sanskrit words, good catch!
Extract
Makes more sense than Extra :)
There are many cases in acc6.txt where a word is Sanskrit but is not included in the dictionary headwords, e.g. see Darśapaurṇamāsapaddhati and Darśapaurṇamāsaprayoga in the following entry. This may hint at missed headwords. I am sure someone might actually be interested in finding an author based on the work. It would be better if we could extract such missed cases properly and tag them as missing headwords or something like that. The modality can be decided later. Currently I am interested only in identifying the Lnum and the potential missed headword.
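A minimal sketch of the Lnum / candidate pairing described above. The input shape (an iterable of `(lnum, text)` pairs), the function name and the `min_len` cutoff are all assumptions for illustration; parsing acc6.txt into such pairs is not shown, and the Lnum values in any example are made up.

```python
import re
import unicodedata

# Assumption: a "word" is a run of Unicode letters.
WORD_RE = re.compile(r"[^\W\d_]+")


def missed_headword_candidates(entries, headwords, is_english, min_len=1):
    """Return (lnum, word) pairs for words that are neither English
    nor already known headwords.

    entries    -- iterable of (lnum, text) pairs (hypothetical shape)
    headwords  -- iterable of known headword spellings
    is_english -- callable returning True for English words
    min_len    -- hypothetical cutoff to skip short abbreviations
    """
    known = {unicodedata.normalize("NFC", h) for h in headwords}
    found = []
    for lnum, text in entries:
        seen = set()  # report each word once per entry
        for word in WORD_RE.findall(unicodedata.normalize("NFC", text)):
            if len(word) < min_len or word in known or word in seen:
                continue
            if is_english(word):
                continue
            seen.add(word)
            found.append((lnum, word))
    return found
```

Raising `min_len` would drop abbreviations such as Extr or Libr from the output, at the cost of also dropping short genuine headwords.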