Closed funderburkjim closed 3 years ago
<div> in wil.xml
In his digitization of Wilson, Thomas has identified in a fairly regular form much of the section markup of the printed text. For the sake of uniformity with the system used in other dictionaries, this markup was changed to use the <div> tag. This change was made in wil.xml (not in wil.txt).
Here's an example:
WIL.TXT
<L>6<pc>001<k1>aMSa<k2>aMSa
{#aMSa#}¦ m. ({#-SaH#})
.²1 A share or portion.
.²2 A part.
.²3 A shoulder, the shoulder blade.
.²4 (In arithmetic) a fraction.
.²5 The numerator of a fraction.
.²6 A degree of latitude or longitude, &c. See {#aMsa.#}
.E. {#aMSa#} to divide, {#ac#} affix.
WIL.XML
<H1><h><key1>aMSa</key1><key2>aMSa</key2></h><body>
<s>aMSa</s> m. (<s>-SaH</s>)
<div n="1">1 A share or portion. </div>
<div n="1">2 A part. </div>
<div n="1">3 A shoulder, the shoulder blade. </div>
<div n="1">4 (In arithmetic) a fraction. </div>
<div n="1">5 The numerator of a fraction. </div>
<div n="1">6 A degree of latitude or longitude, &c. See <s>aMsa.</s> </div>
<div n="E">E. <s>aMSa</s> to divide, <s>ac</s> affix. </div>
</body><tail><L>6</L><pc>001</pc></tail></H1>
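The mapping between the two listings above can be sketched in Python. This is only an illustration, not the script actually used for the conversion: the marker shapes (`.²N` for a numbered sense, `.E.` for the etymology, `{#...#}` for Sanskrit text, `¦` after the headword) are inferred from this single aMSa entry, and `convert_line` is a hypothetical helper name.

```python
import re

# Marker shapes assumed from the aMSa example only:
#   ".²N text"  -> numbered sense
#   ".E. text"  -> etymology section
#   "{#...#}"   -> Sanskrit text, written <s>...</s> in wil.xml
SENSE = re.compile(r'^\.²(\d+) (.*)$')
ETYM = re.compile(r'^\.E\. (.*)$')
SANSKRIT = re.compile(r'\{#(.*?)#\}')

def convert_line(line):
    """Convert one wil.txt body line to its wil.xml form (hypothetical helper)."""
    # Drop the headword separator and wrap Sanskrit text in <s> tags.
    line = SANSKRIT.sub(r'<s>\1</s>', line.replace('¦', ''))
    m = SENSE.match(line)
    if m:
        # Numbered senses become <div n="1">, keeping the printed number.
        return '<div n="1">%s %s </div>' % (m.group(1), m.group(2))
    m = ETYM.match(line)
    if m:
        # The etymology becomes <div n="E">.
        return '<div n="E">E. %s </div>' % m.group(1)
    return line  # any other line is passed through unchanged
```

For instance, `convert_line('.²2 A part.')` gives `<div n="1">2 A part. </div>`, matching the corresponding line in the WIL.XML listing.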
In this well-chosen example, the main advantages of the div markup are:
<div> markup
I stumbled upon one other related markup, ^a, in the root RI (mw = 'nI'). Such markup was coded as <div n="2">X</div> (note the 2).
Part of the html display shows the advantage this can provide:
Contrast this with the confusing display of the prefixed forms under headword 'hf', where there are no sub-divisions:
However, as mentioned, this distinction of sub-sub-sections occurs only sporadically in wil.txt. But the digitization could be enhanced by applying it more widely. For instance, in the same RI entry:
The '**' below marks lines where sub-divisions would be useful but are not currently coded.
Note the difficulty in interpreting the sub-divisions at the line marked <<<.
णी (ञ) णीञ् r. 1st cl. (नयति-ते)
1 To conduct, to drive or guide, to cause progressive conveyance.
2 To obtain, to get. The root is inflected as the deponent verb, implying.
** 1 Instruction, as नयते शास्त्रे he instructs in the Śāstra;
** 2 Worshipping, विष्णुंनयते he worships VIṢṆU; also with prepositions in the sense of; <<<
** 1 Paying, भृत्यमुपनयते he pays the hire;
** 2 Paying as a debt, ऋणम्विनयते he discharges the debt;
3 Casting or lifting up, दण्डमुन्नयते he lifts up the stick;
4 Giving, द्रव्यम्बिनयते he gives the things; and
A different type of markup enhancement would improve the reading of entries of substantives, where there are meanings associated with different genders, such as in aMSaka:
CURRENT - no division markup for m. and n.
<H1><h><key1>aMSaka</key1><key2>aMSaka</key2></h><body>
<s>aMSaka</s> m. (<s>-kaH</s>) A kinsman, a relation, a coheir. n. (<s>-kaM</s>) a day.
<div n="E">E. <s>aMSa</s> to separate or divide, and <s>vun</s> affix. </div>
</body><tail><L>7</L><pc>001</pc></tail></H1>
BETTER - division markup for m. and n.
<H1><h><key1>aMSaka</key1><key2>aMSaka</key2></h><body>
<s>aMSaka</s>
<div n="1">m. (<s>-kaH</s>) A kinsman, a relation, a coheir. </div>
<div n="2">n. (<s>-kaM</s>) a day. </div>
<div n="E">E. <s>aMSa</s> to separate or divide, and <s>vun</s> affix. </div>
</body><tail><L>7</L><pc>001</pc></tail></H1>
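A rough Python sketch of how the gender split in the BETTER listing might be automated. Everything here is an assumption made for illustration: the list of gender abbreviations (m, f, n, mfn), the rule that a gender section begins with an abbreviation followed by a parenthesised `<s>` form, and the helper name `split_genders`. Real entries would likely need more cases than this.

```python
import re

# Assumed start of a gender section: an abbreviation followed by a
# parenthesised inflected form, e.g. "m. (<s>-kaH</s>)".
GENDER = re.compile(r'\b(?:mfn|m|f|n)\. \(<s>')

def split_genders(headword, body):
    """Return the body line split into one <div> per gender (hypothetical helper)."""
    prefix = '<s>%s</s> ' % headword
    if not body.startswith(prefix):
        return [body]
    rest = body[len(prefix):]
    starts = [m.start() for m in GENDER.finditer(rest)]
    # Leave the line alone unless it opens with a gender section
    # and contains at least two of them.
    if len(starts) < 2 or starts[0] != 0:
        return [body]
    ends = starts[1:] + [len(rest)]
    divs = ['<div n="%d">%s </div>' % (i, rest[a:b].strip())
            for i, (a, b) in enumerate(zip(starts, ends), 1)]
    return ['<s>%s</s>' % headword] + divs
```

Applied to the CURRENT body line of aMSaka, this yields the three lines of the BETTER listing; an entry with a single gender, such as aMSa, is returned unchanged.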
Contrast this with the confusing display of the prefixed forms under headword 'hf', where there are no sub-divisions:
No need to contrast, it's a must-have. The only caveat is that WIL is not among the ten most widely used dictionaries, so spending months on it deserves some thought before diving in. Otherwise it is very user-friendly. PWK's upasargas would certainly welcome it as well.
The conversion of wil.txt to our new standard meta-line form has now been completed. Once the IAST conversion was done, this meta-line conversion was fairly straightforward.
One small change was to replace the coding of Greek and Arabic with the generic <lang> tag.