sanskrit-lexicon / MWS

Monier Monier-Williams, Sir; A Sanskrit-English dictionary. Oxford, 1899

the missing (head)word endings before the gender indication, in the data #85

Closed Andhrabharati closed 3 years ago

Andhrabharati commented 3 years ago

I have been wondering for a long time (over 10 years) about one issue I noticed in the MW99 data, and did not know whom to ask about it.

The point is the (head)word endings before the gender indication, which are present in the MW book but missing in the data. Everyone knows that Monier-Williams took every care to see that all essential information was included in his dictionary.

These word endings have an important role in Skt. grammar: they are essential in forming composite words, in the sandhi process, and for joining vibhakti-pratyayas to the word itself. Every authentic Skt. dictionary invariably contains this piece of information.

The image https://www.sanskrit-lexicon.uni-koeln.de/talkMay2008/mon-add2.gif, provided at https://www.sanskrit-lexicon.uni-koeln.de/talkMay2008/markingMonier.html and showing the very first attempt at MW digitisation (print page vs. typed matter), shows that this issue has been in the data from day one [highlighted in red now]. And though in some of the inside entries the entry words seem to contain this word-ending information [highlighted in green now], it is not displayed at present. I did not look at the internal data for this (so I do not know whether it is retained there).

[Image: HW endings in MW99 data]

I wonder why this has not been added in all this time (it is almost 20 years now since the work started), though various efforts are under way to include other information, such as adding Greek words etc. to the data.


BTW, attaching hereunder my discussion with Dhaval on this matter-

On Tue, Dec 1, 2020 at 12:47 PM Dhaval Patel drdhaval2785@gmail.com wrote: Maybe. Different people may have different mindset. Or the same person changed his mindset with time. Difficult to guess. 27 years is a long time.

On Tue, 1 Dec 2020, 12:29 Nagabhushana Rao K, knbrao@gmail.com wrote: But MW72 text has those! Probably a separate team with a different mindset worked for the later additions.

On Tue, 1 Dec 2020, 12:18 Dhaval Patel, drdhaval2785@gmail.com wrote: I have no idea about it. I think the lexicographers found it superfluous to write as, A, am at the end and also write m, f, n for gender.

On Tue, 1 Dec 2020, 12:12 Nagabhushana Rao K, knbrao@gmail.com wrote: No, I am not talking about the composite words. I meant the masculine and neuter gender indicators ([as] [am] etc) at the end of the words.

On Tue, Dec 1, 2020 at 11:54 AM Dhaval Patel drdhaval2785@gmail.com wrote: Namaste Rao ji, Let me see if I understand your requirement correctly. Earlier, if I pressed search for 'kamala', I would get 'कमलकीट', 'कमलखण्ड' etc. Nowadays, I get only कमल. I would love to have those words starting with kamala too in the dictionary entry. Is that what you want to say?

On Tue, 1 Dec 2020 at 11:43, Nagabhushana Rao K knbrao@gmail.com wrote: ... ... ... BTW, just seen that you're now in the maintenance team of Koeln Dictionary site. I was wondering why the MW99 left out the word endings for the HWs, after putting that much effort for so many years. However they are retained in the later addition of MW72. Can you find out the background for this? Also seen some corrections still in MW99, and I was pondering if I should do them!!


Incidentally, I noticed that Dhaval has raised an issue on GitHub (MW compounds below parent headword #315 at sanskrit-lexicon/COLOGNE) based on my mail above, starting from his initial misunderstanding, which has resulted in some fruitful discussion about listing the compound words under the parent HW. BTW, at our http://andhrabharati.com/dictionary/sanskrit, we have been giving this listing of compound words from the beginning.

Finally, I strongly suggest adding this piece of "left" information in this work, to make it a "complete" counterpart to the original book.

Andhrabharati commented 3 years ago

Also, for some entries two endings are given in the book; these denote the different variants [forms] possible for the word.

Andhrabharati commented 3 years ago

Was going through mwmeta2.txt file given for downloads in the evening, and found a supporting statement in it, in favour of my argument on the topic.

C indicates an inflected form of the main entry; for instance  'agram'
has a code of 1C; it is an accusative used as indeclineable.

The material provided in parenthesis in the book seems to have been used at discretion, during the Koeln digitisation, but not in toto.

image

image

image

As I understood from going through the book, these parenthesised texts are to be applied appropriately either to the entry words (as mentioned in the meta file, to inflect the entry word and tag it with this C) or to the material under them, as the case may be.

And the word-ending text without parenthesis, in my view, denotes an alternative form of the entry word.

It is really sad that plenty of such indicative information has not been considered, or deliberately left out.

I am sorry for using strong words, but that's how I feel & I do not hide my feelings.

Can someone tell where the letter 'a' in red colour came from, after the entry word 'agre', the last indeclinable under this HW?

If it is a typing/proofing error, it supports my point that one full reading (proofing) of the whole text is needed [again, Jim would say that this matter should be discussed in a different thread/post!!].

Andhrabharati commented 3 years ago

Can someone tell where the letter 'a' in red colour came from, after the entry word 'agre', the last indeclinable under this HW?

Downloaded and looked at the mw.xml file.

This seems to have been used for identifying the cross-linked entries. [It is not a typo but was introduced deliberately.] The intent is to say, at the entry agre 'b': "look at agre 'a' under agra".

I am not sure if this kind of 'a' 'b' tagging for this purpose is used anywhere else.

AFAIK, the usual practice is simply to say "look at agre (having the meaning[s]) under agra" at the entry agre (without the meaning[s], which have already been given somewhere supra), without using any <hom> kind of extra tagging at either place.

Andhrabharati commented 3 years ago

Any normal reader is used to seeing such untagged entries in print books (or anywhere else) and understands them properly, without any prior knowledge/training.

This Koeln representation demands that the user first learn its tagging, an understanding needed only for this Koeln data.

drdhaval2785 commented 3 years ago
<L>51484<pc>288,1<k1>kuYjara<k2>kuYjara<e>1
<s>kuYjara</s> ¦ <lex>m.</lex> (<ab>ifc.</ab> <lex type="hwifc">f(<s>A</s>). </lex>, <ls>MBh.</ls>; <ls>R.</ls>) an elephant, <ls>Mn. iii, 274</ls>; <ls>MBh.</ls> &c.<info lex="m:f#A"/>
<LEND>

This is the current underlying digitization of the headword kuYjara. And looking at the picture in the first post, it seems that this data was deliberately left out. It may be added. But someone would have to do the hard work of typing it.
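As a rough illustration of what the record above already holds, here is a minimal Python sketch that pulls out the header fields and the gender information. The function name `parse_record` is hypothetical, and reading `m:f#A` as "masculine; feminine with ending A" is an assumption based only on this one record, not on any documented specification of the `<info lex>` attribute.

```python
import re

# Record copied verbatim from the comment above.
RECORD = '''<L>51484<pc>288,1<k1>kuYjara<k2>kuYjara<e>1
<s>kuYjara</s> ¦ <lex>m.</lex> (<ab>ifc.</ab> <lex type="hwifc">f(<s>A</s>). </lex>, <ls>MBh.</ls>; <ls>R.</ls>) an elephant, <ls>Mn. iii, 274</ls>; <ls>MBh.</ls> &c.<info lex="m:f#A"/>
<LEND>'''

HEADER_RE = re.compile(
    r"<L>(?P<L>[^<]+)<pc>(?P<pc>[^<]+)<k1>(?P<k1>[^<]+)<k2>(?P<k2>[^<]+)"
)
INFO_LEX_RE = re.compile(r'<info lex="([^"]*)"/>')


def parse_record(record):
    """Hypothetical helper: extract header fields and gender codes."""
    header, body = record.split("\n", 1)
    meta = HEADER_RE.match(header)
    if meta is None:
        raise ValueError("unrecognised header line")
    genders = []
    info = INFO_LEX_RE.search(body)
    if info:
        # Assumed layout: genders separated by ':', optional ending after '#'.
        for part in info.group(1).split(":"):
            gender, _, ending = part.partition("#")
            genders.append((gender, ending or None))
    result = meta.groupdict()
    result["genders"] = genders
    return result


entry = parse_record(RECORD)
print(entry["k1"], entry["genders"])
```

The sketch shows that the record carries a feminine ending but no ending for the masculine headword itself, which is the gap discussed in this issue.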

Also note that not only the word-ending information but also the IAST version is missing: kuñjara does not appear in the record.
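An IAST form need not be typed by hand if the key can be transliterated. The following is a minimal sketch, assuming the `<k1>` key is in SLP1 (the record itself does not say so); the table `SLP1_TO_IAST` and the function `slp1_to_iast` are hypothetical names, and accents and other special marks are not handled.

```python
# Only SLP1 letters whose IAST form differs are listed; every other
# character (a, i, u, e, o, k, g, c, j, t, d, n, p, b, m, y, r, l, v, s, h)
# is passed through unchanged.
SLP1_TO_IAST = str.maketrans({
    "A": "ā", "I": "ī", "U": "ū",
    "f": "ṛ", "F": "ṝ", "x": "ḷ", "X": "ḹ",
    "E": "ai", "O": "au",
    "M": "ṃ", "H": "ḥ",
    "K": "kh", "G": "gh", "N": "ṅ",
    "C": "ch", "J": "jh", "Y": "ñ",
    "w": "ṭ", "W": "ṭh", "q": "ḍ", "Q": "ḍh", "R": "ṇ",
    "T": "th", "D": "dh",
    "P": "ph", "B": "bh",
    "S": "ś", "z": "ṣ",
})


def slp1_to_iast(text):
    """Transliterate an SLP1 string to IAST, character by character."""
    return text.translate(SLP1_TO_IAST)


print(slp1_to_iast("kuYjara"))
```

Because SLP1 uses exactly one character per sound, a plain character table is enough here; a scheme with digraphs would need a longest-match scan instead.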

Many other dictionaries in Cologne do have Devanagari and IAST version encoded. The picture shows that there were two versions .kuJjara1(kuJjara) in initial text encoding.

@funderburkjim may like to tell us whether there was ever any difference between the Devanagari and IAST prints (both encoded in HK as per the picture) in monier.xml or any of its predecessors. This would help us identify some spelling errors.

Andhrabharati commented 3 years ago

With this extract from MW99 itself, which mentions that the word endings are provided, I will stop talking on the matter.

image

Andhrabharati commented 3 years ago

We can close this issue now, as having been covered as one of "the points identified to be considered yet" that are put in a TODO list.