sanskrit-lexicon / MWS

Monier Monier-Williams, Sir; A Sanskrit-English dictionary. Oxford, 1899

List of Proper Names (N.) #12

Closed gasyoun closed 6 months ago

gasyoun commented 10 years ago

Continued from https://github.com/sanskrit-lexicon/MWS/issues/5, in the search for names. There are 47851 instances of "N." when counted in the .xml file with Notepad++. A search for "N. of" at http://www.sanskrit-lexicon.uni-koeln.de/scans/MWScan/2014/web/webtc2/index.php shows endless variations of "m. N. of a man or tribe", but the total count is never shown. Entries like vizvAmitra 201542 and turvaza 86102 are names of historic heroes, and that is what I am actually aiming for. Then there is agnitanu 944.1: it is in the first (<ab>) list of ~880, but it is only "of partic. texts", so I would like to add some markup manually to differentiate people (gods, demons, etc.) from non-people, and in this case exclude agnitanu 944.1 as a non-person. Right now part of N. is in <ab>, but much more is inside , which means almost no code at all. Either way, the tags for N. have to change, because as they are now they are non-uniform and of little value. As it stands, the 880 list does not contain even the most popular names, like Agni. This means that either the <ab> approach was wrong or the tag is not used widely enough. Tertium non datur.

funderburkjim commented 10 years ago

So, why don't you modify extract_N.php by changing the regex?

from <ab>N.</ab>
to, for instance,
\bN[.]\b
You might have to write this as
\\bN[.]\\b
since backslash has special meaning in PHP strings.
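
A minimal Python sketch of the same counting idea, for readers who want to try the pattern outside PHP. It assumes the input is an iterable of text lines; `count_n` and the sample strings are made up for illustration and are not part of extract_N.php. A raw string (`r"..."`) avoids the backslash doubling mentioned above. One caveat: in Python's `re`, a `\b` placed directly after the dot only matches when a word character follows, so `\bN[.]\b` does not match "N. of"; the sketch uses a negative lookahead instead. Whether that matters for the PHP script on the real data is not checked here.

```python
import re

# Matches the abbreviation "N." whether or not it is wrapped in <ab>...</ab>.
# The lookahead (?!\w) replaces the trailing \b from the thread (see note above).
N_ABBR = re.compile(r"\bN\.(?!\w)")


def count_n(lines):
    """Return (number of lines containing 'N.', total number of 'N.' hits)."""
    matching_lines = 0
    total_hits = 0
    for line in lines:
        hits = len(N_ABBR.findall(line))
        if hits:
            matching_lines += 1
            total_hits += hits
    return matching_lines, total_hits


# Made-up sample lines, not actual dictionary records.
sample = [
    "<ab>N.</ab> of a man or tribe",  # tagged form
    "m. N. of a man or tribe",        # untagged form
    "of partic. texts",               # no 'N.' at all
    "XN. is not the abbreviation",    # no word boundary before 'N'
]

print(count_n(sample))
```

On these four sample lines the sketch reports two matching lines and two hits: the tagged and the untagged form are both found, and "XN." is skipped because there is no word boundary between X and N.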

gasyoun commented 10 years ago

\bN[.]\b works just fine, thanks. 40828 entries. Compared to the ~3000 heroes in the Mahabharata index, that's quite a lot. The next step would be to add subcategories of N. Do you agree that this might make sense? Let me see if the patterns that occur offer some help.

funderburkjim commented 10 years ago

Yes. Sounds like the right approach.
You'll have to decide if some of these 40828 are irrelevant.

Your mention of 'add extra markup' sounds like a good approach.

You could do this within the full mw.xml records, or, you could add classifications to the output list of headword/Lnumbers.

You might discern textual patterns for exclusion, which would automatically reduce your list of interest.

Use programs as much as possible to whittle your list down.
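
A rough Python sketch of what pattern-based whittling could look like. Everything here is an assumption for illustration: the `(headword, L-number, gloss)` tuple layout, the function names `classify` and `whittle`, and both pattern lists. The only phrases taken from the thread are "N. of a man or tribe" and "of partic. texts"; a real pattern list would have to come from inspecting the 40828 hits.

```python
import re

# Hypothetical exclusion patterns: glosses that name things rather than persons.
EXCLUDE_PATTERNS = [
    re.compile(r"\bof partic\. texts\b"),
]

# Hypothetical pattern for glosses that look like names of persons or peoples.
PERSON_PATTERN = re.compile(
    r"\bN\. of (?:an? )?(?:man|woman|king|god|demon|tribe)\b"
)


def classify(gloss):
    """Label a gloss as 'excluded', 'person', or 'unclassified'."""
    if any(p.search(gloss) for p in EXCLUDE_PATTERNS):
        return "excluded"
    if PERSON_PATTERN.search(gloss):
        return "person"
    return "unclassified"


def whittle(entries):
    """Drop excluded entries; return (headword, lnum, label) for the rest."""
    kept = []
    for headword, lnum, gloss in entries:
        label = classify(gloss)
        if label != "excluded":
            kept.append((headword, lnum, label))
    return kept


# Placeholder entries; headwords and L-numbers are invented.
entries = [
    ("hw1", "1", "m. N. of a man or tribe"),
    ("hw2", "2", "N. of partic. texts"),
    ("hw3", "3", "N. of a plant"),
]

for row in whittle(entries):
    print(row)
```

Entries that stay "unclassified" are the ones that would still need a new pattern or a manual look, so the size of that bucket is a rough measure of how much work remains.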

gasyoun commented 10 years ago

A one-by-one approach will lead nowhere. I need regexes. I'll see what I can think of. Do you agree that there is such a need and that the result might get incorporated into the .xml edition? Walking today, I thought that there might be some 200-400 more markup issues in MW to be found in the next 5-10 years.

funderburkjim commented 10 years ago

Any time I start thinking about a specific question, like you are now doing, that might be solved from MW, I usually think of some markup that would solve my problem. Most of the time, I decide to add this markup to some external file derived from MW.

Another issue is that often the existing markup of MW gets in the way of the question I'm interested in, so I create an extract (of part or all of MW) that has simplified markup that is more amenable to further analysis. If you think about it, that's sort of what's going on in the 'mapnorm' analysis.

So, yes, as long as MW exists, there will be markup issue questions. There probably is no perfect markup for the dictionary as a whole. There is very likely a better, though still imperfect, markup.
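
As a small illustration of the "extract with simplified markup" idea, here is a Python sketch that strips all tags from a record so that tagged and untagged forms of "N." look the same to later analysis. It is not the 'mapnorm' analysis; the function name `simplify` and the sample string are invented for the example, and a real extract would probably keep some of the tags.

```python
import re

TAG = re.compile(r"<[^>]+>")      # any single XML-style tag
WHITESPACE = re.compile(r"\s+")   # runs of whitespace


def simplify(record):
    """Remove all tags from a record and collapse runs of whitespace."""
    without_tags = TAG.sub("", record)
    return WHITESPACE.sub(" ", without_tags).strip()


# Made-up sample string using the <ab> tag mentioned in the thread.
print(simplify("<ab>N.</ab> of a man  or tribe"))
```

The simplified text would go into a separate derived file, leaving the original records untouched.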

gasyoun commented 10 years ago

One more name-hunting example: a search for "name" in https://github.com/sanskrit-lexicon/SCH gives 378 entries. That brings me:

funderburkjim commented 10 years ago

I don't understand the question. Please explain it more fully.

"Or should we be more patient ..." Another possibility is that you learn how to implement an enhancement, and look to me for technical advice on how to do so. As a starter, try to formulate exactly what markup you are wanting to add (and to which dictionary). Then, think about how this markup could be added: for instance, maybe by a regular expression replacement.
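
One possible shape for such a regular expression replacement, sketched in Python. It assumes the goal is to make the existing tagging uniform by wrapping every bare "N." in `<ab>...</ab>`; whether that is the markup actually wanted is an open question in this thread. The function name `tag_bare_n` and the sample strings are hypothetical.

```python
import re

# A bare "N." is one not already preceded by "<ab>".
# (?!\w) keeps the match from running into a following word character.
BARE_N = re.compile(r"(?<!<ab>)\bN\.(?!\w)")


def tag_bare_n(text):
    """Wrap untagged 'N.' in <ab>...</ab>; return (new_text, replacements)."""
    return BARE_N.subn("<ab>N.</ab>", text)


# Made-up sample strings, not actual dictionary records.
print(tag_bare_n("m. N. of a man or tribe"))
print(tag_bare_n("<ab>N.</ab> of a man or tribe"))
```

Because already tagged occurrences are skipped, running the replacement twice gives the same result as running it once, which makes it safer to apply to a whole file.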

funderburkjim commented 10 years ago

At least some proper names in PW are identified by the string 'N.pr.'. So a list of the PW headwords whose entries contain this string might be of interest.
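
A minimal Python sketch of how such a headword list could be produced. It assumes the PW data has already been read into `(headword, entry text)` pairs, which is not how the files are actually laid out; the function name, the placeholder records, and the tolerance for whitespace inside "N. pr." are likewise assumptions.

```python
import re

# 'N.pr.' as quoted above; allowing whitespace after 'N.' is an assumption.
NPR = re.compile(r"N\.\s*pr\.")


def proper_name_headwords(records):
    """Return headwords whose text contains 'N.pr.', in first-seen order."""
    seen = set()
    result = []
    for headword, text in records:
        if NPR.search(text) and headword not in seen:
            seen.add(headword)
            result.append(headword)
    return result


# Placeholder records; headwords and texts are invented.
records = [
    ("hw1", "m. N.pr. (placeholder text)"),
    ("hw2", "m. (placeholder text without the marker)"),
    ("hw3", "f. N. pr. (placeholder text)"),
    ("hw1", "n. N.pr. (second record for the same headword)"),
]

print(proper_name_headwords(records))
```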

Andhrabharati commented 6 months ago

@funderburkjim / @gasyoun,

Is this issue closable now?

funderburkjim commented 6 months ago

As with several of these earlier (2014) issues of @gasyoun, there are some good ideas (e.g. cross-referencing proper names in dictionaries). But it is still unclear how to proceed with such analytical goals.

Since the issue remains visible even when closed, and there is no clear path to resolving it, I say we close it now.