Open funderburkjim opened 4 years ago
The report is organized according to the CCS entries identified as verbs; each such entry is considered a 'case':
;; Case 0001: L=191, k1=aNkay, k2=aNkay, code=V, #upasargas=0, mw=aNk (diff)
This record provides
mw=?) where no correspondence is currently identified.
When there are upasargas for a CCS entry, these are grouped below the case. Consider the verb 'tar' (über etwas setzen) (to cross over):
;; Case 0306: L=8281, k1=tar, k2=tar, code=V, #upasargas=11 (10/1), mw=tF (diff)
01 ava tar avatar avatF yes ava+tF
02 A tar Atar AtF yes A+tF
03 ud tar uttar uttF yes ud+tF
04 prod tar prottar prottF yes pra+ud+tF
05 samud tar samuttar samuttF yes sam+ud+tF
06 ni tar nitar nitF yes ni+tF
07 nis tar nistar nistF yes nis+tF
08 pra tar pratar pratF yes pra+tF
09 vipra tar vipratar vipratF no
10 vi tar vitar vitF yes vi+tF
11 sam tar saMtar saMtF yes sam+tF
Note that 'tar' in CCS is said to correspond to 'tF' in MW.
There are 11 upasargas found; 10 have been matched to MW prefixed verbs and one (vipra) has not been matched (that is, CCS has 'vipra' as upasarga for 'tF', but MW does not have a prefixed verb for 'tF' with prefix vipra; i.e., vipratF is not a prefixed verb in MW).
The listing for upasargas shows:
Currently, 1986 of the upasargas are identified with MW prefixed verb entries (search ' yes') and 129 are not identified with MW prefixed verb entries (search ' no').
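As a rough illustration of how the report layout above could be read programmatically, here is a minimal Python sketch. The header format and the sample lines are copied from the 'tar' case; the function name `parse_report`, the dictionary keys, and the interpretation of the columns (index, upasarga, CCS root, CCS prefixed form, MW prefixed form, yes/no, MW analysis) are assumptions made for this example, not part of the actual report code.

```python
import re

# Header layout copied from the report; the group names are labels chosen here.
CASE_RE = re.compile(
    r"^;; Case (?P<case>\d+): L=(?P<L>\d+), k1=(?P<k1>[^,\s]+), k2=(?P<k2>[^,\s]+), "
    r"code=(?P<code>\w+), #upasargas=(?P<n>\d+)(?: \((?P<yes>\d+)/(?P<no>\d+)\))?, "
    r"mw=(?P<mw>\S+)"
)

def parse_report(lines):
    """Group upasarga lines under the case header that precedes them."""
    cases = []
    for line in lines:
        line = line.strip()
        m = CASE_RE.match(line)
        if m:
            cases.append({"header": m.groupdict(), "upasargas": []})
        elif line and cases and line[0].isdigit():
            parts = line.split()
            # Assumed columns: index, upasarga, CCS root, CCS form, MW form, yes/no, [analysis]
            cases[-1]["upasargas"].append({
                "upasarga": parts[1],
                "ccs": parts[3],
                "mw": parts[4],
                "matched": parts[5] == "yes",
                "analysis": parts[6] if len(parts) > 6 else None,
            })
    return cases

# Sample lines taken from the 'tar' case shown above.
sample = [
    ";; Case 0306: L=8281, k1=tar, k2=tar, code=V, #upasargas=11 (10/1), mw=tF (diff)",
    "08 pra tar pratar pratF yes pra+tF",
    "09 vipra tar vipratar vipratF no",
    "10 vi tar vitar vitF yes vi+tF",
]
cases = parse_report(sample)
unmatched = [u["upasarga"] for u in cases[0]["upasargas"] if not u["matched"]]
print(cases[0]["header"]["k1"], "->", cases[0]["header"]["mw"], "unmatched:", unmatched)
```

Counting the `matched` flags over the whole report should reproduce the ' yes' and ' no' search counts mentioned above.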
In contrast to CAE, where previous verb identification markup was present,
in CCS verbs must be identified by some other patterns.
The basic pattern used is that, within the Devanagari text of an entry, there should
appear a present tense 3rd person singular verb ending 'ti' or 'te'.
The regex used is u'¦.*t[ie][,) ]*#}'.
The first line of the text should also NOT include a pattern indicating
a noun or adverb. In addition, several false-positive entries, listed
in the ccs_verb_exclude.txt file, are excluded.
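The three filters just described can be sketched roughly as follows. Only the verb regex is taken from the text. The noun/adverb pattern `NOT_VERB_RE` is a hypothetical placeholder (the actual pattern is not given here), the set `excluded` stands in for the contents of ccs_verb_exclude.txt (whose format is assumed), and the sample entry texts are invented for illustration.

```python
import re

# Regex quoted in the text: after the broken bar, a form ending in
# 'ti' or 'te' closes a Devanagari span.
VERB_RE = re.compile(r"¦.*t[ie][,) ]*#}")

# HYPOTHETICAL placeholder for "a pattern indicating a noun or adverb";
# the real pattern used for CCS is not stated in this issue.
NOT_VERB_RE = re.compile(r"<ab>(?:m|f|n|adv)\.</ab>")

def is_verb_entry(key, text, excluded):
    """Return True if the entry passes all three filters."""
    if key in excluded:                      # known false positives
        return False
    first_line = text.splitlines()[0] if text else ""
    if NOT_VERB_RE.search(first_line):       # noun/adverb marker on first line
        return False
    return VERB_RE.search(text) is not None  # 3rd sg. present in 'ti'/'te'

# Invented sample entries, only to exercise the three branches.
entries = {
    "tar": "{#tar#}¦ {#tarati, tirati#} über etwas setzen",
    "gati": "{#gati#}¦ <ab>f.</ab> Gang ({#eti#})",
    "iti": "{#iti#}¦ so ({#iti#})",
}
excluded = {"iti"}
verbs = [k for k, t in entries.items() if is_verb_entry(k, t, excluded)]
print(verbs)
```

The exclusion list is checked first so that a manually rejected key can never be reintroduced by a later change to either regex.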
It is possible that there are some verb entries in CCS that have been missed by the above pattern matching. However, a percentage comparison between CCS and CAE suggests that there are not many, if any, CCS verbs that have been missed by the pattern-matching method.
But still it would be good to do an exclusion analysis for CCS (and CAE also, for that matter) to more directly address the completeness of the verb identification.
There is no clear identification of upasargas within verb entries of CCS. Rather, upasargas only appear as Devanagari text. But there is also much other Devanagari text (such as different verb forms, participles, etc.). In this scan snippet (from CCS verb 'tar'), we see several Devanagari text instances, some being upasargas (or compound upasargas) and some being related non-upasarga Sanskrit words.
So the approach taken to identify upasargas within verb entries makes use of the list of upasargas that appear within the CAE dictionary. This list, in cae_upasargas.txt, contains 142 upasargas (the base upasargas along with various compound upasargas) that were previously identified as occurring within the verb entries of Cappeller's Sanskrit-English dictionary. In addition, this file contains 8 compound upasargas that were noticed to occur within one or another CCS entry.
Then, for a given verb entry of CCS, all the Devanagari words of the entry were examined, and those words appearing in this list of upasargas were considered to be the upasargas for that verb entry of CCS.
Further, this computed list of upasargas for each entry was manually compared with the underlying text of the CCS entry to confirm the list. The resulting list appears in the ccs_preverb0 file; this file is the basis of the upasargas of the ccs_preverb1 report.
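The list-lookup step described above might look roughly like this. It is a sketch under assumptions: Devanagari text is taken to be marked as `{#...#}` spans (suggested by the `#}` in the verb regex, but not confirmed here), the set `known` holds only a few upasargas from the 'tar' listing rather than the full contents of cae_upasargas.txt, and the sample entry text and the name `find_upasargas` are invented for the example.

```python
import re

# ASSUMPTION: Devanagari text is marked up as {#...#} spans.
DEVA_RE = re.compile(r"\{#(.*?)#\}")

def find_upasargas(entry_text, known):
    """Return known upasargas among the Devanagari words, in order of first appearance."""
    found = []
    for span in DEVA_RE.findall(entry_text):
        for word in re.split(r"[\s,;()]+", span):
            if word in known and word not in found:
                found.append(word)
    return found

# A few upasargas from the 'tar' listing; the real list has 142 + 8 items.
known = {"ava", "A", "ud", "prod", "vipra", "vi", "sam"}

# Invented entry text mixing verb forms, a participle and upasargas.
entry = (
    "{#tar#}¦ {#tarati, tirati#} übersetzen; {#tIrRa#} <ab>p.p.</ab> "
    "{#ava#} herabsteigen. {#vipra#} ... {#sam#} ..."
)
print(find_upasargas(entry, known))
```

A pure list lookup cannot tell an upasarga from an unrelated word with the same spelling, which is why the manual comparison against the entry text is still needed.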
There are 11 upasargas found; 10 have been matched to MW prefixed verbs and one (vipra) has not been matched (that is, CCS has 'vipra' as upasarga for 'tF', but MW does not have a prefixed verb for 'tF' with prefix vipra; i.e., vipratF is not a prefixed verb in MW).
Perfect explanation.
But still it would be good to do an exclusion analysis for CCS (and CAE also, for that matter) to more directly address the completeness of the verb identification.
Let it be. We will get there one day.
contains 142 upasargas (the base upasargas along with various compound upasargas)
142+8, interesting. In 2015 @drdhaval2785 wrote I have a readymade list, made out of the upasargArthasiddhAntacandrikA: $upasarga_combinations = array("ati,atinis,atipra,ativi,ativyA,atisam,atyati,atyaBi,atyA,atyud,atyupa,aDi,aDini,aDinis,aDivi,aDyava,aDyA,aDyupa,anu,anuni,anunis,anuparA,anupari,anuparyA,anupra,anuprati,anuvi,anuvyava,anuvyA,anusam,anusampra,anUd,anvapa,anvava,anvA,apa,apani,apanis,apaparA,apaparyA,apapra,apavyA,apA,apAti,api,apipari,apod,apyati,aBi,aBini,aBinis,aBiparA,aBipari,aBiparyA,aBipra,aBivi,aBivyA,aBisamA,aBisam,aByati,aByaDi,aByanu,aByapa,aByava,aByA,aByudA,aByud,aByupa,aByupA,aByupAva,ava,avani,avA,A,utpra,udava,udA,ud,udvi,unni,upa,upani,upanis,upanyA,upapari,upaparyA,upapra,upavi,upavyA,upasaṁni,upasamA,upasam,upA,upAti,upAva,upodA,upod,upopa,duHsam,duranu,durava,durA,durud,durupa,durni,duzpari,duzpra,dus,ni,nipra,nirati,niraDi,niranu,nirapa,niraBi,niraBi,nirava,nirupA,nirvi,nivyA,nizpra,nisu,nis,nyA,parA,pari,parini,parinis,paripra,parivi,parivyA,parisam,paryaDi,paryanu,paryava,paryA,paryud,paryupa,pra,praNi,prati,pratini,pratinis,pratiparA,pratipari,pratipra,prativi,prativyA,pratisam,pratyaDi,pratyanu,pratyapa,pratyapi,pratyaBi,pratyava,pratyA,pratyudA,pratyud,pratyupa,pratyupA,pravi,pravyA,prasam,prA,prADi,prod,vi,vini,vinis,viparA,vipari,viparyA,vipra,viprati,visam,vyati,vyanu,vyanvA,vyapa,vyapA,vyaBi,vyava,vyA,vyud,vyupa,saṁvi,saṁvyava,saṁvyA,sanni,samati,samaDi,samanu,samanuvi,samanvA,samapa,samapi,samaBi,samaBivyA,samaBisam,samaBisampra,samaByava,samaByA,samaByud,samava,samava,samavA,samA,samudA,samud,samupa,samupA,sam,samparA,sampari,sampra,samprati,samprA,samprod,sampvari,su,supari,suvi,susamA,svanu,svaBi,svaByA,");
The verbs01 directory aims
The comments here will focus on the ccs_preverb1 report.
ccs_preverb1_deva is a Devanagari version of the report.
Currently, 1009 of the 29986 entries of CCS are identified as verbs. 484 of these verbs have upasargas, and a total of 2115 upasargas are identified.
All but 9 of the verbs are found to correspond with MW verbs. All but 129 of the upasargas are found to correspond with MW prefixed verbs.
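The counts quoted in this summary can be cross-checked with a few lines of arithmetic. The variable names are chosen here for illustration; the input numbers are the ones stated above.

```python
# Counts as stated in the summary.
total_entries = 29986
verbs = 1009
verbs_unmatched = 9
upasargas_total = 2115
upasargas_unmatched = 129

# Derived figures.
verbs_matched = verbs - verbs_unmatched                     # 1000
upasargas_matched = upasargas_total - upasargas_unmatched   # 1986
verb_share = 100 * verbs / total_entries                    # percent of CCS entries that are verbs
matched_share = 100 * upasargas_matched / upasargas_total   # percent of upasargas matched to MW

print(verbs_matched, upasargas_matched)
print(round(verb_share, 2), round(matched_share, 1))
```

The derived 1986 matched upasargas agrees with the ' yes' count given for the report, and 129 with the ' no' count.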