sanskrit-lexicon / MWS

Monier Monier-Williams, Sir; A Sanskrit-English dictionary. Oxford, 1899

Proper Names in SLP1 ({r}Ama for Rāma) #24

Open gasyoun opened 9 years ago

gasyoun commented 9 years ago

At https://github.com/drdhaval2785/SanskritSorting/issues/27 Jim said it would be possible to adapt the transcoder files to work with {} for proper names. As I want to have the proper names in my reverse dictionary, I humbly ask that the proper-name data be extracted from at least MW. Could you include this in a plan, Jim, please?
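For illustration, here is a minimal Python sketch of what the {} convention from the issue title could mean in practice: a braced SLP1 letter is transliterated and then capitalized, so `{r}Ama` comes out as `Rāma`. The function names are hypothetical and this is not the actual transcoder file format; it only assumes that braces wrap a single SLP1 letter and that accents and other markup are absent.

```python
import re

# Standard SLP1 -> IAST values; every SLP1 letter is a single character.
SLP1_TO_IAST = {
    "a": "a", "A": "ā", "i": "i", "I": "ī", "u": "u", "U": "ū",
    "f": "ṛ", "F": "ṝ", "x": "ḷ", "X": "ḹ",
    "e": "e", "E": "ai", "o": "o", "O": "au", "M": "ṃ", "H": "ḥ",
    "k": "k", "K": "kh", "g": "g", "G": "gh", "N": "ṅ",
    "c": "c", "C": "ch", "j": "j", "J": "jh", "Y": "ñ",
    "w": "ṭ", "W": "ṭh", "q": "ḍ", "Q": "ḍh", "R": "ṇ",
    "t": "t", "T": "th", "d": "d", "D": "dh", "n": "n",
    "p": "p", "P": "ph", "b": "b", "B": "bh", "m": "m",
    "y": "y", "r": "r", "l": "l", "v": "v",
    "S": "ś", "z": "ṣ", "s": "s", "h": "h",
}


def slp1_to_iast(text):
    """Transliterate plain SLP1; unknown characters pass through unchanged."""
    return "".join(SLP1_TO_IAST.get(ch, ch) for ch in text)


def proper_name_to_iast(marked):
    """Convert brace-marked SLP1 such as '{r}Ama' to capitalized IAST.

    Assumption (not confirmed by the thread): '{x}' marks one SLP1
    letter whose IAST equivalent should be capitalized.
    """
    # re.split with a capture group alternates: outside, braced, outside, ...
    parts = re.split(r"\{([^{}])\}", marked)
    out = []
    for index, part in enumerate(parts):
        iast = slp1_to_iast(part)
        if index % 2 == 1:  # odd positions are the braced letters
            iast = iast[:1].upper() + iast[1:]
        out.append(iast)
    return "".join(out)


if __name__ == "__main__":
    for word in ("{r}Ama", "{v}izRu", "rAma"):
        print(word, "->", proper_name_to_iast(word))
```

Because SLP1 already uses upper case for long vowels and aspirates, capitalization has to be carried by separate markup like this rather than by the letters themselves.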

funderburkjim commented 9 years ago

Is what you are looking for a list of MW headwords which are proper names?

gasyoun commented 9 years ago

Indeed, and not only that. I'm thinking about how to extract all of them from all dictionaries. I'm ready to do even additional markup, but first I would like to hear your bright ideas.

funderburkjim commented 9 years ago

For MW, many will be caught by searching for

<ab>N.</ab>

So, I propose generating a list of such headwords.

What kind of output are you looking for?

funderburkjim commented 9 years ago

The N. search will match some 'L-number' records that are not PROPER names, such as

<H2B><h><hc3>110</hc3><key1>kAlaka</key1><hc1>2</hc1><key2>kAlaka</key2><hom>1</hom></h>
<body> <lex type="inh">n.</lex> <c>N._of_a_pot-herb</c> <ls>Bhpr.</ls> </body><tail>
<MW>033095</MW> <mat/> <pc>277,3</pc> <L>49284</L></tail></H2B>

gasyoun commented 9 years ago

https://github.com/sanskrit-lexicon/MWS/issues/12 continued. Sure, there are many false positives like the pot-herb, but a lot of valuable data as well. Can I add the tags additionally to ? I'm ready to mark human/deva proper names. Oh, ok, I wonder if some names are left out of <ab>N.</ab> and whether there is a way other than manual checking to find them. 823 matches seems too small to be true. 47027 (excluding the 823 marked) seems closer to the truth.

<c>N._of_a_man</c>
<c>N._of_an_<as0>A1ditya</as0><as1><s>Aditya</s></as1></c>
<c>N._of_a_woman_;_</c>

How about adding the data to https://github.com/funderburkjim/MWlexnorm/ ? How about comparing the list with the Mahabharata Index and the Puranic Encyclopedia?

funderburkjim commented 9 years ago

Re '47027 ... seems closer to the truth': this sounds right. I did not realize that there were so many 'naked' 'N.' abbreviations. (Here 'naked' means not clothed by an 'ab' tag.)

Since you can generate the list, what help are you looking to me to provide?
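The tagged/naked distinction can be sketched in Python roughly as follows. This is only an illustration under assumptions: one record per string, a `<key1>` element holding the headword (as in the kAlaka sample above), and a 'naked' N. defined as an upper-case `N.` that is not wrapped in `<ab>...</ab>` and not the tail of a longer word. The second and third sample records are made up for the example, and the function names are hypothetical.

```python
import re

TAGGED_N = re.compile(r"<ab>N\.</ab>")
# 'Naked' N.: not preceded by '<ab>' or by a letter, not followed by '</ab>'.
NAKED_N = re.compile(r"(?<!<ab>)(?<![A-Za-z])N\.(?!</ab>)")
KEY1 = re.compile(r"<key1>(.*?)</key1>")


def n_markup(record):
    """Return 'tagged', 'naked' or 'none' for one dictionary record."""
    if TAGGED_N.search(record):
        return "tagged"
    if NAKED_N.search(record):
        return "naked"
    return "none"


def headwords_by_markup(records):
    """Group key1 headwords by the kind of N. markup found in the record."""
    groups = {"tagged": [], "naked": [], "none": []}
    for record in records:
        match = KEY1.search(record)
        headword = match.group(1) if match else "?"
        groups[n_markup(record)].append(headword)
    return groups


SAMPLE_RECORDS = [
    # Taken from the thread (joined onto one line): a naked N., not a proper name.
    '<H2B><h><hc3>110</hc3><key1>kAlaka</key1><hc1>2</hc1><key2>kAlaka</key2>'
    '<hom>1</hom></h><body> <lex type="inh">n.</lex> <c>N._of_a_pot-herb</c> '
    '<ls>Bhpr.</ls> </body><tail><MW>033095</MW> <mat/> <pc>277,3</pc> '
    '<L>49284</L></tail></H2B>',
    # Hypothetical records, invented for this example only.
    '<H1><h><key1>rAma</key1></h><body> <c><ab>N.</ab>_of_a_man</c> </body></H1>',
    '<H1><h><key1>gam</key1></h><body> <c>to_go</c> </body></H1>',
]

if __name__ == "__main__":
    for kind, words in headwords_by_markup(SAMPLE_RECORDS).items():
        print(kind, words)
```

Note that the lower-case `n.` in `<lex type="inh">n.</lex>` is deliberately not matched, and that the pot-herb record still lands in the 'naked' group, so a list built this way needs a later filter for non-proper names.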

gasyoun commented 9 years ago

How should I spread the ab tags? How to widen the usage? Looking through my 30 000 additions manually does not sound like a good idea.

funderburkjim commented 9 years ago

I need to see a sample of the file(s) you are using.
Not sure what 'spread the ab tags' means.
Not sure what 'widen the usage' means.

gasyoun commented 9 years ago

Now only 700 words have it. I propose 30 000 should have them. There are no sample files. I need to understand how I can contribute in this markup expansion project, if you agree. I'm looking for names of living creatures.

gasyoun commented 3 years ago

@Andhrabharati do you understand the issue?

Andhrabharati commented 3 years ago

I do; but I've no intention to work for SLP1 stuff!!

And I guess you would first need to consult Peter Scharf before playing around with SLP1 like this.

funderburkjim commented 3 years ago

@gasyoun Here's something that might be relevant for getting some proper names. Example under L=6, headword 'a':

<ab>N.</ab> of <s1 slp1="vizRu">Viṣṇu</s1>

Similarly, under Thus, we see that 'aH' is a name of Viṣṇu. Thus 'a' can be a proper name, although of course 'a' has other uses.

The search gives 4709 matches in 4707 lines for "<ab>N[.]</ab> of <s1 slp1=". By searching for the headwords in which these matches occur, I think you would get a fairly large list of words that could be proper names.

A slight variant would give some more headwords that can be used as proper nouns: 4233 matches in 4229 lines for "<ab>N[.]</ab> of a <s1 slp1=" such as

<L>54<pc>1,2<k1>aMSu<k2>aMSu<e>1A
¦ <ab>N.</ab> of a <s1 slp1="fzi">Ṛṣi</s1>, <ls>RV. viii, 5, 26</ls><info lex="inh"/>

Another variation: 672 matches for "<ab>N[.]</ab> of an <s1 slp1=", e.g.,

<L>19<pc>1,1<k1>aMSa<k2>a/MSa<e>1A
¦ <ab>N.</ab> of an <s1 slp1="Aditya">Āditya</s1>.

Is this approach in the direction you are going, or maybe you've already exhausted such an approach ?
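A rough Python sketch of how the three searches above could be combined into one pass that reports the headword for each match. It assumes only what the samples show: a metaline starting with `<L>` that carries `<pc>`, `<k1>` and `<k2>` fields, followed by body lines belonging to that headword. The function name is hypothetical.

```python
import re

# '<ab>N.</ab> of', optionally 'a' or 'an', then an <s1 slp1="..."> element.
NAME_OF = re.compile(
    r'<ab>N[.]</ab> of (?:an? )?<s1 slp1="([^"]*)">([^<]*)</s1>'
)
# Metaline as seen in the samples: <L>..<pc>..<k1>..<k2>..
METALINE = re.compile(r"^<L>([^<]*)<pc>([^<]*)<k1>([^<]*)<k2>([^<]*)")


def name_candidates(lines):
    """Yield (L, k1, slp1, display) for each 'N. of <s1 ...>' match."""
    current = None
    for line in lines:
        meta = METALINE.match(line)
        if meta:
            # Remember which record the following body lines belong to.
            current = (meta.group(1), meta.group(3))
            continue
        if current is None:
            continue  # body text before any metaline is ignored
        for match in NAME_OF.finditer(line):
            yield current + (match.group(1), match.group(2))


SAMPLE = """\
<L>54<pc>1,2<k1>aMSu<k2>aMSu<e>1A
¦ <ab>N.</ab> of a <s1 slp1="fzi">Ṛṣi</s1>, <ls>RV. viii, 5, 26</ls><info lex="inh"/>
<L>19<pc>1,1<k1>aMSa<k2>a/MSa<e>1A
¦ <ab>N.</ab> of an <s1 slp1="Aditya">Āditya</s1>.
"""

if __name__ == "__main__":
    for candidate in name_candidates(SAMPLE.splitlines()):
        print(candidate)
```

Collecting the distinct `k1` values from the output would give the candidate headword list; the `slp1` value records what the headword is said to be a name of.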

funderburkjim commented 3 years ago

An entirely different approach might be to use the headwords in INM (Index to names in the Mahabharata).

Presumably nearly every headword there is a proper name.

Similarly, there are many proper names among the headwords of ACC.

And back to the previous comment, another search that would lead to many proper-name headwords is (in MW): 16202 matches in 16190 lines for "<ab>N[.]</ab> of a [a-zA-Z]", such as

<L>59<pc>1,2<k1>aMSuDAna<k2>aMSu—DAna<e>3
<s>aMSu—DAna</s> ¦ <lex>n.</lex> <ab>N.</ab> of a village    <<< PLACE NAME

<L>268<pc>2,2<k1>akAsAra<k2>a-kAsAra<e>1
<s>a-kAsAra</s> ¦ <lex>m.</lex> <ab>N.</ab> of a teacher    <<< PERSON NAME

And still another variant with many matches: 11764 matches for "<ab>N[.]</ab> of <ab>wk". Here the headword is the name of a work. Example:

<L>922<pc>5,1<k1>agnigranTa<k2>agni/—granTa<e>3
<s>agni/—granTa</s> ¦ <lex>m.</lex> <ab>N.</ab> of <ab>wk.</ab>

Seems like there are lots of searches that will get high density of matches to headwords that may appear as proper names of one kind or another.
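As a sketch of how such searches could feed a rough typing of the names (place, person, work), the Python below classifies a body line by what follows "N. of". The noun-to-kind table is a hypothetical starting point containing only the two nouns from the examples above; a real table would have to be built from the actual matches.

```python
import re
from collections import Counter

N_OF_WORK = re.compile(r"<ab>N[.]</ab> of <ab>wk")
# Same shape as the search above: 'N. of a' followed by an English word.
N_OF_A = re.compile(r"<ab>N[.]</ab> of a ([a-zA-Z][a-zA-Z-]*)")

# Hypothetical mapping; only these two nouns come from the thread.
KIND_BY_NOUN = {"village": "place", "teacher": "person"}


def classify(body):
    """Return 'work', a kind from KIND_BY_NOUN, 'unclassified:<noun>' or None."""
    if N_OF_WORK.search(body):
        return "work"
    match = N_OF_A.search(body)
    if match:
        noun = match.group(1)
        return KIND_BY_NOUN.get(noun, "unclassified:" + noun)
    return None


SAMPLES = [
    "<s>aMSu—DAna</s> ¦ <lex>n.</lex> <ab>N.</ab> of a village",
    "<s>a-kAsAra</s> ¦ <lex>m.</lex> <ab>N.</ab> of a teacher",
    "<s>agni/—granTa</s> ¦ <lex>m.</lex> <ab>N.</ab> of <ab>wk.</ab>",
]

if __name__ == "__main__":
    print(Counter(classify(body) for body in SAMPLES))
```

Counting the `unclassified:` results over the whole file would show which nouns are frequent enough to deserve an entry in the table.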

Andhrabharati commented 6 months ago

As the issue is 'continued' in another repo, this issue could be closed now.