sanskrit-lexicon / MWS

Monier Monier-Williams, Sir; A Sanskrit-English dictionary. Oxford, 1899

Grouped entries like GRA #163

Open funderburkjim opened 1 month ago

funderburkjim commented 1 month ago

As @Andhrabharati recently mentioned Ref:, there are good reasons to change the coding of 'grouped' entries in MW to the model of the Grassmann dictionary.

One aspect of this is that the <h> element (h=homonym) of MW metalines would be removed. However, this <h> element plays a role in the list displays for MW. And we must determine a way to preserve this feature of the list display for MW. Currently I don't know how to do this.

The next comment describes this clever display feature, the idea for which originated with Peter Scharf.

funderburkjim commented 1 month ago

Note the 3 places in this search for saMvarta

image

Now if we click on the 'yellow arrow', we get the 'list' centered at the 2nd spot:

image

gasyoun commented 1 month ago

@funderburkjim so you want to keep it accordion-like, the old way?

funderburkjim commented 1 month ago

@gasyoun Yes, I want to preserve the functionality as described above.

Andhrabharati commented 1 month ago

Though the h tag is discarded in the txt file, the info is very much preserved as the number (in both the metaline and the header part). It could be used for the display purpose as is, or the h tag could be (re)introduced in the xml file (if needed).

Does this make you feel relieved, @funderburkjim ?

funderburkjim commented 1 month ago

You are suggesting that the point of attack is make_xml.py for MW. This may be the only spot needing attention. Other stress points might be the AND, OR, groups and possibly the 'parenthetical headwords' markup in MW. For example, consider L=535, 536, the first 'and' group.

<L>535<pc>3,2<k1>akza<k2>akza<h>4a<e>2
<s>akza</s> <hom>4a</hom>, <s>akza-caraRa</s>, ¦ &c. See <ab>col.</ab> 3.<info and="535,akza;536,akzacaraRa"/>
<LEND>
<L>536<pc>3,2<k1>akzacaraRa<k2>akza-caraRa<h>a<e>2
<s>akza</s>, <s>akza-caraRa</s>, ¦ &c. See <ab>col.</ab> 3.<info and="535,akza;536,akzacaraRa"/>
<LEND>

What would the new 'grouped' markup be for mw.txt? How would this example be modified using the new markup?
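For reference while discussing the markup, here is a minimal sketch of how the current metaline and the `<info and=.../>` element of L=535 could be read. The function names `parse_metaline` and `parse_and_group` are hypothetical and are not taken from make_xml.py; the sketch assumes that every metaline field has the form `<tag>value` and that values contain no `<`.

```python
import re

# One "<tag>value" pair of a metaline; the value runs up to the next "<".
FIELD_RE = re.compile(r"<(\w+)>([^<]*)")


def parse_metaline(line):
    """Return the metaline fields as a dict (hypothetical helper)."""
    return dict(FIELD_RE.findall(line))


def parse_and_group(body):
    """Return [(L, k1), ...] from an <info and="..."/> element, or []."""
    m = re.search(r'<info and="([^"]*)"/>', body)
    if not m:
        return []
    # Items are separated by ";" and each item is "L,k1".
    return [tuple(item.split(",", 1)) for item in m.group(1).split(";")]


meta = parse_metaline("<L>535<pc>3,2<k1>akza<k2>akza<h>4a<e>2")
group = parse_and_group(
    '&c. See <ab>col.</ab> 3.<info and="535,akza;536,akzacaraRa"/>'
)
print(meta["h"], group)
```

If the `<h>` field is dropped from mw.txt, a reader like this would simply find no `h` key, so any display code relying on it would need another source for the homonym.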

Andhrabharati commented 1 month ago

This is how my revised mw file has this entry (in IAST):

<L>535<pc>3,2<k1>akṣa<k2>⒋ akṣa, akṣa-caraṇa<e>2
⒋ <s>akṣa</s>, <s>akṣa-caraṇa</s>, &c.  ∆   ◊¦ See <col>col. 3</col>.
<LEND>
ⓓ
ⓓ
ⓓ

I have discarded the [a-z] tags in the homonyms, as they just denote the cross-referenced entries.
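As a rough illustration of how the homonym number could still be recovered from the revised metaline for display purposes, the sketch below splits a leading number off the `<k2>` value. It assumes that the marker is one of the Unicode "digit full stop" characters (U+2488 to U+249B, covering 1 to 20); the function name `split_hom` is hypothetical.

```python
def split_hom(k2):
    """Split a leading 'digit full stop' character off a k2 value.

    Returns (hom_number, rest); hom_number is None when k2 has no marker.
    Assumes the marker is in the range U+2488 (1.) .. U+249B (20.).
    """
    if k2 and 0x2488 <= ord(k2[0]) <= 0x249B:
        return ord(k2[0]) - 0x2487, k2[1:].strip()
    return None, k2


# "\u248b" is the single character "4." used in the revised entry above.
print(split_hom("\u248b akṣa, akṣa-caraṇa"))
```

A value recovered this way could be written back as an `<h>` element when the xml file is generated, if the list display needs it.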

funderburkjim commented 1 month ago

The 'homonym' should not be '4.' in L=535, since there is a 'real' homonym 4 at L=660. So I think the L=535 homonym should be '4a.' in your revised form, and the a-z tags should be retained to distinguish between real homonyms and cross-reference homonyms.

Andhrabharati commented 1 month ago

As I see it, there is neither a clear logic nor a complete implementation of the hom. letters (a & b) in mw.txt; there are many inconsistencies in it, and these hom. letters are not used at all in most of the places that have cross-references. [I prefer not to elaborate further on this.]

I thought that instead of correcting all these & adding the letters at the "missed places", it is good enough (and simpler) to get rid of these altogether. [The context itself will make one know as to which one is the "actual" entry, and which one is the "cross-referencing" entry.]

funderburkjim commented 1 month ago

get rid of these altogether

What about cases where there is just an artificial hom (e.g. <h>[a-z], of which there are 10,000+), and also the associated <hom>[a-z]</hom>?

Andhrabharati commented 1 month ago

All those also had the same "fate"!!
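To gauge how many entries each choice would affect, the three kinds of `<h>` value mentioned in this thread could be tallied from the current mw.txt metalines. This is a sketch only: the category labels and the function names `classify_hom` and `count_hom_kinds` are hypothetical, and the patterns assume that a real homonym is all digits (e.g. `4`), a cross-reference homonym is digits plus one letter (e.g. `4a`), and an artificial one is a single letter (e.g. `a`).

```python
import re
from collections import Counter


def classify_hom(h):
    """Classify an <h> value using the distinctions drawn in this thread."""
    if re.fullmatch(r"\d+", h):
        return "real"
    if re.fullmatch(r"\d+[a-z]", h):
        return "cross-reference"
    if re.fullmatch(r"[a-z]", h):
        return "artificial"
    return "other"


def count_hom_kinds(metalines):
    """Count <h> values by kind over an iterable of metaline strings."""
    counts = Counter()
    for line in metalines:
        m = re.search(r"<h>([^<]*)", line)
        if m:
            counts[classify_hom(m.group(1))] += 1
    return counts


sample = [
    "<L>535<pc>3,2<k1>akza<k2>akza<h>4a<e>2",
    "<L>536<pc>3,2<k1>akzacaraRa<k2>akza-caraRa<h>a<e>2",
]
print(count_hom_kinds(sample))
```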