Closed Andhrabharati closed 1 year ago
The <hom> letter tags 'a', 'b' etc. are introduced additionally in the text, which are NOT in the print, for some reason.
At a minimum, the 'a' and 'b' counts should be equal, i.e. every 'a' should have an associated 'b' somewhere in the text; an isolated 'a' has no meaning. (Having fewer of the later letters is understandable.)
But the counts are different-
Resolving this so that there are no count differences would be a good exercise indeed.
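A minimal sketch of how such counts could be gathered, assuming the letter homonyms appear in the text exactly as <hom>a</hom>, <hom>b</hom>, etc. The function names and the sample string are hypothetical, not taken from the mw.txt tooling:

```python
import re
from collections import Counter

# Matches any <hom>...</hom> element and captures its value.
HOM_RE = re.compile(r"<hom>([^<]*)</hom>")

def hom_counts(text):
    """Count each distinct <hom> value found in the text."""
    return Counter(m.group(1) for m in HOM_RE.finditer(text))

def letter_hom_counts(text):
    """Keep only the letter homonyms (with or without a trailing dot)."""
    return {value: n for value, n in hom_counts(text).items()
            if re.fullmatch(r"[a-z]\.?", value)}

# Hypothetical sample: two 'a' tags but only one 'b' tag.
sample = ("<s>X</s> <hom>a</hom> ¦ ... <s>X</s> <hom>b</hom> ¦ ... "
          "<s>Y</s> <hom>a</hom> ¦ ... <hom>1.</hom> <s>Z</s> ¦")
counts = letter_hom_counts(sample)
balanced = counts.get("a", 0) == counts.get("b", 0)
```

Run over the whole file, a mismatch between the 'a' and 'b' totals only signals that something is unpaired; finding which headword is affected would need a per-headword grouping.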
--------------------------
And the same condition (equal numbers) applies to <hom>1 (total counts: and <hom>2 as well, the counts being as below-
Some of the inconsistencies pointed out above have been addressed.
(158) <hom>n</hom> -> <hom>n.</hom> (n a digit, 1 to 6)
(123) <hom>n\.</hom> -> <hom>n</hom> (n a lower-case letter, [a-c])
( 1) </hom><s> -> </hom> <s>
( 31) in the 'headline' (line after metaline), numeric hom precedes headword
<s>X</s> <hom>n.</hom> ¦ -> <hom>n.</hom> <s>X</s> ¦
( 15) numeric hom to precede Sanskrit after =
= <s>X</s> <hom>n.</hom> -> = <hom>n.</hom> <s>X</s>
NOTE: There remain 47 cases of form '<s>X</s> <hom>n.</hom>', but these
require no changes since the form is always
'<s>X</s> <hom>n.</hom> <s>Y</s>'
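The changes listed above can be sketched as regular-expression substitutions. This is only an illustration under stated assumptions, not the script actually used: the function names are hypothetical, and the lookahead that skips the '<s>X</s> <hom>n.</hom> <s>Y</s>' form is my reading of the NOTE.

```python
import re

def normalize_hom(line):
    """Apply the dot, spacing and '=' ordering conventions to one line."""
    # Numeric homs get a trailing dot: <hom>1</hom> -> <hom>1.</hom>
    line = re.sub(r"<hom>([1-6])</hom>", r"<hom>\1.</hom>", line)
    # Letter homs lose the dot: <hom>a.</hom> -> <hom>a</hom>
    line = re.sub(r"<hom>([a-c])\.</hom>", r"<hom>\1</hom>", line)
    # Ensure a space between a closing </hom> and a following <s>
    line = line.replace("</hom><s>", "</hom> <s>")
    # After '=', the numeric hom precedes the Sanskrit word.
    # Assumption: leave it alone when another <s> follows the hom.
    line = re.sub(r"= (<s>[^<]*</s>) (<hom>[1-6]\.</hom>)(?! <s>)",
                  r"= \2 \1", line)
    return line

def normalize_headline(line):
    """For the line after the metaline only: hom precedes the headword."""
    return re.sub(r"(<s>[^<]*</s>) (<hom>[1-6]\.</hom>) ¦", r"\2 \1 ¦", line)
```

For example, normalize_headline applied to '<s>X</s> <hom>1.</hom> ¦' gives '<hom>1.</hom> <s>X</s> ¦', while a line of the form '= <s>X</s> <hom>1.</hom> <s>Y</s>' passes through normalize_hom unchanged.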
There were several (about 140) cases of either <h>[0-9][a-z] or
The dictionary identifies homonyms for about 5700 entries (according to the mw.txt digitization). Sometimes the homonym variants appear 'next to' or 'near' each other in the dictionary ordering. But often they appear some distance apart, and it is sometimes useful to navigate among the homonyms in the list display to see the dictionary context of the variants. The 'arrows' in the list display provide this functionality. For instance, clicking on the 'yellow' arrow will change the left-hand list pane to be centered at that second homonym.
In MW, a given headword can appear at different dictionary locations, yet these different locations are not identified by the author as homonym variants.
Letter homonyms were 'invented' by Peter Scharf to permit the list display navigation to these otherwise unmarked entries. Since MW's printed homonym codes are numeric, there is no confusion between MW's printed homonyms and the synthetic letter homonyms.
The list display homonym navigation feature uses only the metaline <h>X values.
The <hom>X</hom> code is unused by the navigation, but is used to style the homonym value X (red color).
Here is an example:
The <h>[a-z] codes were originally assigned by a Java program developed by Pawan Goyal, working with Peter. I have adapted this to the current format of mw.txt.
This display feature makes use of the <h> element in the metaline of entries.
10645 matches for "<h>[a-z]" in buffer: temp_mw_1.txt.
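A count like the one above can also be reproduced outside the editor. The sketch below assumes that metalines start with <L> and carry the homonym as an <h> field; the sample metalines and the function name are hypothetical.

```python
import re
from collections import Counter

# Captures letter h-values ("a") and letter-number ones ("1a").
H_RE = re.compile(r"<h>([0-9]*[a-z])")

def letter_h_values(lines):
    """Tally the letter-bearing <h> values found in metalines."""
    counts = Counter()
    for line in lines:
        if not line.startswith("<L>"):  # assumption: metalines begin with <L>
            continue
        m = H_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts

# Hypothetical metalines for illustration only.
sample_lines = [
    "<L>1<pc>1,1<k1>x<k2>x<h>a<e>1",
    "<s>X</s><hom>a</hom> ¦ body text",
    "<L>2<pc>1,1<k1>x<k2>x<h>b<e>1",
    "<L>3<pc>1,2<k1>y<k2>y<h>1a<e>2",
    "<L>4<pc>1,2<k1>z<k2>z<h>1<e>1",
]
h_counts = letter_h_values(sample_lines)
```

Purely numeric values such as <h>1 are ignored here, since those correspond to MW's printed homonym numbers.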
I don't recall whether Pawan's original work introduced the few (100+) 'letter-number' h-values, or whether that was done by me at some time.
As I recall, it was my choice to put the <hom>X</hom> (X a letter) into entry text.
And I must have for some reason chosen to put the letter homs AFTER the headword, such as
<s>X</s><hom>Y</hom> ¦
My current view is that I should have <hom n="a"/> instead of <hom>a</hom>.
<hom>a</hom> implies, by usual xml markup conventions, that the 'a' IS part of the text being marked. <hom n="a"/> is interpreted as being purely markup. <hom n="a"/> can still be rendered to display the 'a' as it is currently shown.
I have not made the changes just suggested in the 'I should have...' items.
I request comments from others as to whether this should be done.
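If the change were adopted, the conversion itself would be mechanical. A minimal sketch, assuming only single lower-case letter homs are converted and numeric homs are left as they are; the function name is hypothetical:

```python
import re

# Only letter homonyms are synthetic; numeric ones are in the print.
LETTER_HOM = re.compile(r"<hom>([a-z])</hom>")

def to_empty_tag(text):
    """Rewrite <hom>a</hom> as the empty element <hom n="a"/>."""
    return LETTER_HOM.sub(r'<hom n="\1"/>', text)

converted = to_empty_tag("<s>X</s><hom>a</hom> ¦ <hom>1.</hom> <s>Y</s>")
```

Any display code that styles <hom>X</hom> would then need to read the n attribute to show the letter.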
Even if the suggested change is made, there are still cases where it is difficult to know how to add markup. For example, in 'or' groups, where the second word has a homonym designation.
I am sure there is further work to be done regarding homonyms in mw, but will put this aside for now as the situation is still unclear to me in several aspects.
By contrast, the empty tag <hom n="a"/> is interpreted as being purely markup.
Makes sense.
I have "handled" the matter in my current full review, and thus this issue can be closed now.
<hom> number (if not with a letter) is followed by a dot and the <hom> letter is without a dot. But the odd-mans are--
<hom> tag placements wrt the Skt. word are--
Should these not be made consistent throughout? [The <hom> number to precede the <s> word (with a space) & the <hom> letter to follow the <s> word (with a space).]
---------------
[Side-note: There are 11 "<s>" and 1 "</s>" occurrences.]