funderburkjim opened this issue 8 years ago
For the record, I did my own research on this two years ago. BHS, like EWA and KEWA, cites roots in their -ti and -te forms. I'm no fan of that convention, but there it is.
The identification pattern is that the headword ends in 'ate' or 'ati':
- this pattern narrows the list to 1266 headwords, vs. 1741 in Gasuns' list (patterns listed below).

06.01.2014: I need to cut off the endings of the verbs in BHS. BHS cites verbs in their 3rd person forms, so they are easy to locate. To find them, I used http://www.sanskrit-lexicon.uni-koeln.de/scans/BHSScan/2013/web/webtc2/index.php with suffix "ti" and Maximum "all" (1564), then repeated the search with suffix "te" and Maximum "all", and copy-pasted the results into a .txt file. Looking in the book, http://yadi.sk/d/xmPTu3LLFoDZ6, I find the verb E kilikīl-; I find the same verb at http://www.sanskrit-lexicon.uni-koeln.de/scans/BHSScan/2013/web/webtc2/index.php [L=4889] [p= 184,1] kilikīlate, makes a loud noise (of Māra's army). So Schwarz, the author of the 1978 printed edition of a Sanskrit reverse dictionary, cut "kilikīlate" down to "kilikīl-". To do the same, I need patterns, followed by manual approval.
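A minimal Python sketch of the two selection steps above, assuming the headwords are already available as a list of strings. The helper name `select_by_suffix` and the sample list are illustrative, not taken from bhs.txt or the webtc interface. The broad "ti"/"te" filter mirrors the webtc suffix searches, while the 'ati'/'ate' pattern is narrower.

```python
def select_by_suffix(headwords, suffixes):
    """Return the headwords ending in any of `suffixes`, keeping input order."""
    return [h for h in headwords if h.endswith(tuple(suffixes))]

# Illustrative sample only (IAST), not real output from the BHS search.
sample = ["kilikīlate", "sameti", "paryavāpnoti", "akṣayamati", "bodhisattva"]

broad = select_by_suffix(sample, ("ti", "te"))    # like the webtc suffix searches
narrow = select_by_suffix(sample, ("ati", "ate"))  # the 'ati'/'ate' pattern

print(broad)
print(narrow)
```

In this toy list the narrower pattern drops sameti and paryavāpnoti, and the name akṣayamati survives both filters, which is why false positives still need a manual pass.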
Result lists: "-te": http://yadi.sk/d/S2LkEfxDFZtqo, "-ti": http://yadi.sk/d/8mvZJEe3FZucb
Most of the verbs in Schwarz's list marked E (= BHS) are sopasarga roots (roots with an upasarga, i.e. a verbal prefix). We do not change that: we do not cut the upasargas off, but leave them as they are.
False positives (have to be cleaned out manually before the rules apply):
- Dharmadhātvarcivairocanasaṃbhavamati, n. of a Bodhisattva
- Dharmadhātunayajñānagati, n. of a Buddha
- Akṣayamati, n. of a Bodhisattva
- Acalamati, n. of a son of Māra (favorable to the Bodhisattva)
- Ati, read Atri, n. of a Prajāpati
Cutting rules:
- atiprathate -> ate
- anuparigṛhṇīte -> te
- svādīyati -> ati
- sameti -> ti
- vasubhūti -> ti
- paryavāpnoti -> noti
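The cutting rules above can be read as "strip the longest matching ending". A sketch under that assumption, using only the endings that appear in the examples and appending "-" in the style of Schwarz's kilikīl-. The function name `cut` is hypothetical, and real use would still need the manual approval step.

```python
# Endings taken from the cutting-rule examples, tried longest first so that
# "paryavāpnoti" loses "noti" rather than just "ti".
ENDINGS = sorted(["noti", "ate", "ati", "te", "ti"], key=len, reverse=True)

def cut(headword):
    """Strip the longest matching ending and mark the stem with '-'.

    Returns None when no ending matches, so such headwords can be
    reviewed by hand instead of being silently altered.
    """
    for ending in ENDINGS:
        if headword.endswith(ending) and len(headword) > len(ending):
            return headword[: -len(ending)] + "-"
    return None

print(cut("kilikīlate"))  # kilikīl-
```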
Gaṇas and their terminations:
- BvAdi: ati / ate
- adAdi: ti / te
- juhotyAdi: ti / te (with reduplication of the root)
- divAdi: yati / yate
- svAdi: noti / nute
- tudAdi: ati / ate
- ruDAdi: ti / te / Di / De (with 'na' inserted into the root)
- tanAdi: oti / ute
- kryAdi: nAti / nIte / RAti / RIte
- curAdi: ayati / Ayati (sometimes with a->A, [iI]->e, [uU]->o conversion in the root)
These are the major chopping rules.
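One way to keep the gaṇa table as data is a mapping from gaṇa name (SLP1, as above) to its endings. This sketch only matches endings: it ignores the reduplication, the 'na' infix and the vowel changes noted in parentheses, so it returns every possible (gaṇa, stem) split rather than a single answer. The names `GANA_ENDINGS` and `analyses` are illustrative.

```python
# Endings per gaṇa, transcribed from the table above (SLP1).
GANA_ENDINGS = {
    "BvAdi": ("ati", "ate"),
    "adAdi": ("ti", "te"),
    "juhotyAdi": ("ti", "te"),
    "divAdi": ("yati", "yate"),
    "svAdi": ("noti", "nute"),
    "tudAdi": ("ati", "ate"),
    "ruDAdi": ("ti", "te", "Di", "De"),
    "tanAdi": ("oti", "ute"),
    "kryAdi": ("nAti", "nIte", "RAti", "RIte"),
    "curAdi": ("ayati", "Ayati"),
}

def analyses(form):
    """Return every (gaṇa, stem) split whose ending matches the end of `form`."""
    result = []
    for gana, endings in GANA_ENDINGS.items():
        for ending in endings:
            if form.endswith(ending) and len(form) > len(ending):
                result.append((gana, form[: -len(ending)]))
    return result

print(analyses("dIvyati"))
```

For dIvyati the divAdi split (dIv) is only one of six candidates, which illustrates why the chopping still needs a human to choose.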
Thanks, @drdhaval2785. I guess this will eliminate some of the false positives, if we check for them before applying the ti / te rule, no?
@gasyoun, regarding "False positives (have to be cleaned out manually before rules apply)":
In the list I made, these false positives have been pretty thoroughly weeded out.
Is the objective of your 'chopping' to know, for example, that 'anucalati' in BHS would correspond to 'anucal' in MW (if MW had this prefixed form of the root cal)?
If so, this should be doable by a program that (a) removes the prefixes (e.g., removes 'anu' from anucalati) and then (b) looks up 'calati' in a table of conjugations (which we have from various sources: mine, Huet's, and probably Dhaval's) to discover that 'calati' is the 3rd person singular of 'cal'.
Is this the kind of analysis you are interested in?
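A sketch of the two-step lookup described above: peel off upasargas from the front until the remainder is found in a conjugation table. The tiny `CONJ_TABLE` is a hypothetical stand-in for the real conjugation tables mentioned (whose format is not given here), the upasarga list is partial, and sandhi at the prefix boundary (e.g. anv- before a vowel) is not handled.

```python
# Partial list of upasargas (IAST); longer ones are tried first.
UPASARGAS = ["ati", "adhi", "anu", "apa", "api", "abhi", "ava", "ā", "ud",
             "upa", "ni", "nis", "nir", "parā", "pari", "pra", "prati", "vi", "sam"]

# Hypothetical conjugation table: 3rd person singular form -> root.
CONJ_TABLE = {"calati": "cal", "eti": "i", "gacchati": "gam"}

def analyze(form, table=CONJ_TABLE, prefixes=()):
    """Return (prefixes, root) for the first decomposition found, else None.

    The table is checked before any prefix is removed, so a form that is
    itself in the table is never over-stripped.
    """
    if form in table:
        return list(prefixes), table[form]
    for p in sorted(UPASARGAS, key=len, reverse=True):
        if form.startswith(p) and len(form) > len(p):
            found = analyze(form[len(p):], table, prefixes + (p,))
            if found:
                return found
    return None

print(analyze("anucalati"))
```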
Is the objective of your 'chopping' to know, for example, that 'anucalati' in BHS would correspond to 'anucal' in MW (if MW had this prefixed form of the root cal)?
As well, right.
Yes, I'm interested in such an analysis. For it, cutting off upasargas and upasarga combinations might not even be needed, because MW includes the upasargas as part of the headword. What would be interesting is to generate the list of PWG verbs with upasargas, because PWG's nesting style makes it impossible to tell how many verb-related forms there actually are.
Is there a list of your false positives?
The program (verbs1.py) generating the list is fairly simple. The false positives occur in two ways:
- "n. of" (Name of X) occurring in the first line of the definition in bhs.txt. As currently written, the program does not list these; someone could modify it to print them.
- explicit exclusion via the list nonverbs=['ajitAvati', ..... in the program. These cases were excluded by hand, after examining the definitions. There are about 60 of them.

The program also excludes any headword that does NOT end in 'ati' or 'ate'. If there are verbs in BHS that end in some other way (which I doubt), then these would be silently excluded.
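The exclusions described above can be sketched as a single predicate. This is not the actual verbs1.py code; it assumes SLP1 headwords (as in the nonverbs example) and a definition string whose first line is the first line of the entry in bhs.txt. The names `NONVERBS` and `is_verb_candidate` are illustrative.

```python
# Hypothetical excerpt; the real nonverbs list has about 60 entries.
NONVERBS = {"ajitAvati"}

def is_verb_candidate(headword, definition):
    """Apply the exclusions described above to one BHS entry."""
    # Only headwords ending in 'ati' or 'ate' are considered at all.
    if not headword.endswith(("ati", "ate")):
        return False
    # Hand-curated exceptions found by reading the definitions.
    if headword in NONVERBS:
        return False
    # Names: "n. of" in the first line of the definition.
    first_line = definition.split("\n", 1)[0]
    return "n. of" not in first_line

print(is_verb_candidate("akzayamati", "n. of a Bodhisattva"))  # False
```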
None, I guess. Thanks for the detailed comment, as usual.
A study was made of the headwords of the BHS dictionary to identify verbs and verb forms.
This was motivated by the whitelisting work being done as mentioned in #254.
It was noticed that many of the otherwise unidentified headwords occurring only in the BHS dictionary (and in no other dictionary as a headword) were verb forms (such as the third person singular of some conjugation of the verb). For the whitelisting, it is felt that checking the spelling of these verb forms should take into account the fact that they are inflected forms.
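Finding the headwords that occur only in BHS amounts to a set difference over the headword lists. A sketch, assuming each dictionary's headwords have already been loaded into a Python set keyed by a dictionary code; the function name `bhs_only` and the toy sets below are illustrative, not real dictionary data.

```python
def bhs_only(bhs_headwords, other_dicts):
    """Headwords of BHS that are not headwords in any of the other dictionaries."""
    # Union of all other dictionaries' headwords (empty set if none are given).
    seen_elsewhere = set().union(*other_dicts.values())
    return sorted(set(bhs_headwords) - seen_elsewhere)

# Toy data in SLP1, for illustration only.
others = {"MW": {"Bavati", "gacCati"}, "PWG": {"Bavati"}}
print(bhs_only(["kilikIlate", "Bavati"], others))  # ['kilikIlate']
```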
Of course, there is independent interest in lists of verbs, unrelated to the whitelisting objective.
The program and results are in the dictionaries/BHS/verbs directory of this repository.
A brief description of the files in this directory.