sanskrit-lexicon / CORRECTIONS

Correction history for Cologne Sanskrit Lexicon

BHS verbforms #260

Open funderburkjim opened 8 years ago

funderburkjim commented 8 years ago

A study was made of the headwords of the BHS dictionary to identify verbs and verbforms.

This was motivated by the whitelisting work being done as mentioned in #254.

It was noticed that many of the otherwise unidentified headwords occurring only in the BHS dictionary (and in no other dictionary as a headword) were verb forms (such as third person singular of some conjugation of the verb). In connection with the whitelisting, it is felt that the spelling correctness of these verb forms should take into account the fact that they are inflected forms.

Of course, there is independent interest in lists of verbs, unrelated to the whitelisting objective.

The program and results are in the dictionaries/BHS/verbs directory of this repository.

A brief description of the files in this directory.

gasyoun commented 8 years ago

Let me tell you, I did my own research on this 2 years ago. BHS, like EWA and KEWA, quotes roots as -ti and -te forms. I'm no fan of it, but still.

06.01.2014. I need to cut off the endings of verbs in BHS. Verbs are quoted in 3rd person forms in BHS, so they are easy to locate. To find them, I used http://www.sanskrit-lexicon.uni-koeln.de/scans/BHSScan/2013/web/webtc2/index.php with suffix "ti", Maximum "all" - 1564. After that I repeated the search with suffix "te", Maximum "all", and copy-pasted the results into a .txt file.

I looked in the book, http://yadi.sk/d/xmPTu3LLFoDZ6, and found the verb E kilikīl-. I found the same verb at http://www.sanskrit-lexicon.uni-koeln.de/scans/BHSScan/2013/web/webtc2/index.php as [L=4889] [p= 184,1] kilikīlate, makes a loud noise (of Māra's army). So Schwarz, the author of the 1978 printed edition of a Sanskrit reverse dictionary, cut "kilikīlate" down to "kilikīl-". To do the same, I need patterns. After that, manual approval.

http://yadi.sk/d/S2LkEfxDFZtqo "-te" http://yadi.sk/d/8mvZJEe3FZucb "-ti"

Most of Verbs in Schwarz's list marked as E (=BHS) are sopasarga roots. We do not change that. We do not cut upasargas off. We leave them as they are.

False positives (have to be cleaned out manually before rules apply):

Dharmadhātvarcivairocanasaṃbhavamati, n. of a Bodhisattva
Dharmadhātunayajñānagati, n. of a Buddha
Akṣayamati, n. of a Bodhisattva
Acalamati, n. of a son of Māra (favorable to the Bodhisattva)
Ati, read Atri, n. of a Prajāpati

Cutting Rules:

atiprathate -> ate
anuparigṛhṇīte -> te
svādīyati -> ati
sameti -> ti
vasubhūti -> ti
paryavāpnoti -> noti

drdhaval2785 commented 8 years ago

Gana and the terminals:

BvAdi - ati / ate
adAdi - ti / te
juhotyAdi - ti / te (with duplication of verb)
divAdi - yati / yate
svAdi - noti / nute
tudAdi - ati / ate
ruDAdi - ti / te / Di / De (with 'na' added in between)
tanAdi - oti / ute
kryAdi - nAti / nIte / RAti / RIte
curAdi - ayati / Ayati (with sometimes a->A, [iI]->e, [uU]->o conversion in verb)

These are the major chopping rules.
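As a rough illustration of how such terminals could drive the chopping, here is a minimal Python sketch that strips the longest matching terminal from an SLP1 form. The name `chop` and the longest-match strategy are assumptions made for illustration, not something taken from the repository. Reduplication, the 'na' infix and the curAdi vowel changes are not handled, so the output still needs manual approval.

```python
# Terminals copied from the gana list above (SLP1 transliteration).
# Hypothetical sketch: reduplication, the 'na' infix and vowel
# changes (a->A, [iI]->e, [uU]->o) are NOT undone here.
TERMINALS = [
    "ayati", "Ayati",                 # curAdi
    "nAti", "nIte", "RAti", "RIte",   # kryAdi
    "noti", "nute",                   # svAdi
    "yati", "yate",                   # divAdi
    "oti", "ute",                     # tanAdi
    "ati", "ate",                     # BvAdi, tudAdi
    "ti", "te", "Di", "De",           # adAdi, juhotyAdi, ruDAdi
]


def chop(form):
    """Return (stem, terminal) for the longest matching terminal.

    Returns (form, None) when no terminal matches.
    """
    for terminal in sorted(TERMINALS, key=len, reverse=True):
        # Require a non-empty stem so a bare terminal is not "chopped".
        if form.endswith(terminal) and len(form) > len(terminal):
            return form[: -len(terminal)], terminal
    return form, None


print(chop("kilikIlate"))
print(chop("paryavApnoti"))
```

Longest-match is a blunt choice: it cannot tell a verb form from a name that happens to end in "ti", and it may prefer a longer terminal than the one a human would pick, which is why the false positives have to be removed first.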

gasyoun commented 8 years ago

Thanks, @drdhaval2785 - I guess it will eliminate some of the false positives, if we check these terminals before applying the ti / te rule. No?!

funderburkjim commented 8 years ago

@gasyoun False positives (have to be cleaned out manually before rules apply):

In the list I made, these false positives have been pretty thoroughly weeded out.

Is the objective of your 'chopping' to know, for example, that 'anucalati' in BHS would correspond to 'anucal' in MW (if MW had this prefixed form of root cal) ?

If so, this should be doable by a program that (a) removes the prefixes (e.g. removes 'anu' from anucalati) and then (b) looks up 'calati' in a table of conjugations (which we have from various sources: me, Huet, probably Dhaval) to discover that 'calati' is 3s of 'cal'.
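A minimal Python sketch of steps (a) and (b), under stated assumptions: `PREFIXES` and `CONJUGATIONS` are tiny hypothetical stand-ins for a full upasarga list and a real conjugation table, and `analyze` is an illustrative name. Sandhi between prefix and verb is ignored.

```python
# Hypothetical data: a real run would load a complete prefix list and
# a conjugation table from one of the sources mentioned above.
PREFIXES = ["anu", "ati", "pari", "sam", "vi", "pra"]
CONJUGATIONS = {"calati": ("cal", "3s")}  # inflected form -> (root, analysis)


def analyze(headword):
    """Return every (prefixes, root, analysis) split of the headword.

    Step (a): peel off zero or more prefixes from the front.
    Step (b): look up what remains in the conjugation table.
    No sandhi is undone, so only plain concatenations are found.
    """
    def split(rest, found):
        if rest in CONJUGATIONS:
            root, analysis = CONJUGATIONS[rest]
            yield tuple(found), root, analysis
        for prefix in PREFIXES:
            if rest.startswith(prefix) and len(rest) > len(prefix):
                yield from split(rest[len(prefix):], found + [prefix])

    return list(split(headword, []))


print(analyze("anucalati"))
```

Returning all splits rather than the first one keeps ambiguous cases visible for manual review.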

Is this the kind of analysis you are interested in?

gasyoun commented 8 years ago

Is the objective of your 'chopping' to know, for example, that 'anucalati' in BHS would correspond to 'anucal' in MW (if MW had this prefixed form of root cal) ?

As well, right.

Yes, I'm interested in such analysis. For that, cutting off upasargas and upasarga combinations might not even be needed, because MW has all the upasargas in it as part of the word. What would be interesting is to generate the list of PWG verbs with upasargas, because PWG's nest style makes it impossible to know how many forms there are that are actually related to verbs.

Is there a list of your false positives?

funderburkjim commented 8 years ago

Is there a list of your false positives?

The program (verbs1.py) generating the list is fairly simple. The false positives occur in two ways:

The program also excludes any headword that does NOT end in 'ati' or 'ate'. If there are verbs in BHS that end in some other way (which I doubt), then these would be silently excluded.
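For illustration only, the ending filter described here could look like the following Python sketch. This is not the code of verbs1.py; `candidate_verbforms` is a hypothetical name, and the sketch also returns the excluded headwords so that the exclusion can be inspected instead of being silent.

```python
def candidate_verbforms(headwords):
    """Split headwords into those ending in 'ati'/'ate' and the rest.

    Hypothetical sketch of the ending test only; the other checks of
    the real program are not reproduced.
    """
    kept, excluded = [], []
    for headword in headwords:
        # str.endswith accepts a tuple of alternative suffixes.
        if headword.endswith(("ati", "ate")):
            kept.append(headword)
        else:
            excluded.append(headword)
    return kept, excluded


kept, excluded = candidate_verbforms(["anucalati", "kilikIlate", "buddha"])
print(kept)
print(excluded)
```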

gasyoun commented 8 years ago

If there are verbs in BHS that end in some other way (which I doubt), then these would be silently excluded.

None, I guess. Thanks for the comment, detailed as usual.