vakratu? - Githubissues

Shalu411 commented 10 years ago

Namaste. Please see the image. The entry for vakratu in MW (http://www.sanskrit-lexicon.uni-koeln.de/scans/MW72Scan/2014/web/webtc1/index.php) reads: "वक्रतु वक्रतु [L=40835] [p= 0876-b] m., N. of a deity." It should have been "वक्रतुण्ड", with the whole meaning (explanation or article) for the word as shown in the snip. If it were वक्रता, the explanation would be different, and I do not find that in the list at all. [Image: vakratu 2jpg] Please see what the issue is. Thank you.

funderburkjim commented 10 years ago

In MW72 (the older version of Monier-Williams Sanskrit-English dictionary), the 'sub-headwords' are not currently recognized as headwords. vakra-tuRqa is a sub-headword, so it is not included among the headwords. You will find vakra-tuRqa in the entry for vakra.

The word 'vakratu' IS regarded as a headword (it is in the third column), and is the next headword after vakra.

Do you see the typographical difference between headwords and sub-headwords ?

In the case of the later MW (the one of 1899), we went through a process of identifying subheadwords and promoting them to headwords. A similar process could be done for MW72.

gasyoun commented 10 years ago

Should it be? I guess there are higher priorities, such as comparing all Rigveda quotes against a proofread text. Does that sound possible? For example, comparing PWG with the RV: we would find more than 10 mistakes just by comparing, and the RV is not even the most quoted text; there are others.

funderburkjim commented 9 years ago

@Shalu411 @gasyoun I happen to be reviewing this again now, since it is still a PENDING case in the Sanskrit Correction Responses. I agree that promoting the subheadwords in MW72 does not have a high priority.

Thus, I will close this issue now.

gasyoun commented 9 years ago

"In MW72, the 'sub-headwords' are not currently recognized as headwords" is a very interesting detail. There are no other dictionaries with unintegrated subheadwords, right? Is it a one-week task or a month of work?

funderburkjim commented 9 years ago

re 'There are no other dictionaries with unintegrated subheadwords, right?'

No, just the opposite. ONLY for MW have we taken on the task of adding 'subheadwords'.

It's hard to estimate the amount of time it would take to identify the subheadwords for a particular dictionary. Looking in mw72.txt for the subheadwords of vakra (at line 201717 ff):

<>‘having crooked thorns,’ the jujube tree. {%--Vakra-
<>kan2t2aka, as,%} m. Acacia Catechu. {%--Vakra-khad2ga%}
<>or {%vakra-khad2gaka, as,%} m. a bent sword, a cimeter,
<>sabre. {%--Vakra-gati, is,%} f. crooked or winding course,

The pattern appears to be {%--X-Y[z]%} where 'X' is vakra (the 'parent'), 'Y' is the rest of the samAsa, and 'z' is optional 'stuff'. Formally writing down the regexp(s) that identify subheadwords would be the first step.

gasyoun commented 9 years ago

"ONLY for MW have we taken on the task of adding 'subheadwords'." Oh, so there is one more long journey ahead. I was thinking about it after I saw

AcAntodaka:MW,PW
AcAma:BEN,BUR,CAE,CCS,MD,MW,MW72,PW,PWG,SHS,VCP,WIL,YAT
AcAmaH:AP,AP90,SKD
AcAmaka:AP,AP90,MW,MW72,PW,PWG
AcAmanaka:MW,MW72,PW,PWG
AcAmanakaM:AP90
AcAmanakam:AP

I understood there could be potentially many more MW72 cases. And from what you say now, I wonder how many more, because PWK and PWG have a nested structure, totally different from MW. Any idea how the {%--X-Y[z]%} would look like a RegEx?
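
To get a rough count of such cases, a headword list like the one above could be scanned for entries that occur in MW but not in MW72. The sketch below assumes the `headword:DICT1,DICT2,...` format seen in the sample lines; the function names `parse_headword_list` and `in_mw_not_mw72` are hypothetical. A headword missing from MW72 is only a candidate subheadword, since it may simply be absent from the older dictionary.

```python
# Sample lines copied from the headword list quoted in this thread.
SAMPLE = """\
AcAntodaka:MW,PW
AcAma:BEN,BUR,CAE,CCS,MD,MW,MW72,PW,PWG,SHS,VCP,WIL,YAT
AcAmaH:AP,AP90,SKD
AcAmaka:AP,AP90,MW,MW72,PW,PWG
AcAmanaka:MW,MW72,PW,PWG
AcAmanakaM:AP90
AcAmanakam:AP
"""


def parse_headword_list(text):
    """Map each headword to the set of dictionary codes listing it."""
    index = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Assumed format: headword, a colon, then comma-separated codes.
        headword, _, dicts = line.partition(":")
        index[headword] = set(dicts.split(","))
    return index


def in_mw_not_mw72(index):
    """Return headwords present in MW but absent from MW72, sorted."""
    return sorted(
        hw for hw, dicts in index.items()
        if "MW" in dicts and "MW72" not in dicts
    )


if __name__ == "__main__":
    print(in_mw_not_mw72(parse_headword_list(SAMPLE)))
```

On the seven sample lines this reports only AcAntodaka; on the full list the same check would give an upper bound on the MW72 subheadwords still to be promoted.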

funderburkjim commented 9 years ago

re 'Any idea how the {%--X-Y[z]%} would look like a RegEx?'
A first guess would be

{%--([a-zA-Z0-9]*)-([a-zA-Z0-9]*)(.*?)%}
           X              Y        z
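
As a rough illustration, a variant of this first guess can be tried against the four mw72.txt lines quoted earlier in the thread. The Python sketch below assumes that the `<>` at the start of each line is only a line marker and that a subheadword may be broken across lines right after its hyphen; both are guesses from the sample, not documented properties of the file. The function name `find_subheadwords` is hypothetical, and the quantifiers are placed inside the groups so that each group captures the whole word rather than its last character.

```python
import re

# Sample lines copied from the mw72.txt excerpt in this thread.
SAMPLE = """\
<>‘having crooked thorns,’ the jujube tree. {%--Vakra-
<>kan2t2aka, as,%} m. Acacia Catechu. {%--Vakra-khad2ga%}
<>or {%vakra-khad2gaka, as,%} m. a bent sword, a cimeter,
<>sabre. {%--Vakra-gati, is,%} f. crooked or winding course,
"""

# X = parent word, Y = rest of the compound, z = optional trailing material.
SUBHEADWORD = re.compile(r"\{%--([a-zA-Z0-9]+)-([a-zA-Z0-9]+)(.*?)%\}")


def find_subheadwords(text):
    """Return (X, Y, z) tuples for every {%--X-Y[z]%} span in text."""
    # Assumption: a hyphen followed by a line break and '<>' is only layout,
    # so the two halves of the compound are joined before matching.
    joined = re.sub(r"-\n<>", "-", text)
    return [m.groups() for m in SUBHEADWORD.finditer(joined)]


if __name__ == "__main__":
    for parent, rest, extra in find_subheadwords(SAMPLE):
        print(parent, rest, repr(extra))
```

On the sample this finds Vakra-kan2t2aka, Vakra-khad2ga and Vakra-gati. The alternate form vakra-khad2gaka, introduced by "or" without the leading `--`, is not matched, so the pattern would need to be extended for such cases.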

And PWG/PWK would be an entirely different kind of pattern. Certainly for verbs, the structure of PWG is different, since the preverb forms are nested within the root (like gam ... -vi, -upa ...)

But before worrying about sub-headwords, there is the question of Alternate headwords. I've been working through Sampada's changes in VCP today, and there are myriad alternate headword spellings in VCP. The identification of Alternate headwords is probably a bit easier, at least for VCP, where the alternates live within the already identified 'key2'. But, alternates are probably not trivial (Didn't you look into this some with Shalu ?)

sanskrit-lexicon / MW72

vakratu? #2