Closed Andhrabharati closed 1 year ago
ai (which is marked otherwise as ē) and au (which is marked otherwise as ō) do not occur in Grassmann's 'theme'; they should either be separated by a hyphen or the second vowel should be with Umlaut (ï or ü resp.).
At the end, we've
4 words with (a-i)-- a-i; áśva-iṣita; {@páśva-iṣṭi,@}; {@vásya-iṣṭi,@}
4 words with (aï)-- {@daïṣṇá,@}; -aïṣṇám; -aïṣṇaís; {@yaïṣṭha,@}
6 words with (a-u)-- a-u; {@ácha-ukti,@}; ca-utá; {@náma-ukti,@}; {@úpa-upa párā@}; {@úpa-upa párā@}
2 words with (aü)-- {@títaü,@}; {@híraṇya-praüga,@}
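A minimal sketch of how such a grouping could be reproduced mechanically, not the procedure actually used for the counts above. The names `strip_vowel_accents` and `classify` are hypothetical, and the sketch assumes that accents sit only on vowels as combining acute or grave marks, so they can be dropped before looking for the four patterns.

```python
import unicodedata

ACCENTS = {"\u0301", "\u0300"}   # combining acute and grave
VOWELS = set("aeiouAEIOU")
# The four hiatus patterns discussed above; ï and ü given as escapes.
PATTERNS = ("a-i", "a\u00ef", "a-u", "a\u00fc")

def strip_vowel_accents(word):
    """Drop acute/grave accents that sit on a vowel; keep every other mark
    (macron, diaeresis, dot below, the acute that is part of ś)."""
    out = []
    base = ""
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch):
            if ch in ACCENTS and base in VOWELS:
                continue          # accent on a vowel: skip it
        else:
            base = ch             # remember the last base letter
        out.append(ch)
    return unicodedata.normalize("NFC", "".join(out))

def classify(word):
    """Return the hiatus patterns found in the word, in PATTERNS order."""
    plain = strip_vowel_accents(word)
    return [p for p in PATTERNS if p in plain]
```

Stripping the accent first means that a form such as a-í would be grouped with a-i, while the diaeresis on ï and ü is kept because it is what distinguishes the second and fourth groups.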
17 cases of "enthalten in: %%…%%"
4 cases of "enthalten in %%…%%"
4 cases of "davon %%…%%"
1 case of "Substantiv in: %%…%%"
Probably, @maltenth might need to justify this!
One place where %%<lang>lat.</lang>%% is padded, corroborated by PWG.
Two places have these %%…%% as comments reg. the entries--
at ++<L>5395.1 as %%This entry was at L-5397 instead.%%, and
at <L>8303
as %%isn't it more appropriate to split these into two sep. entries, as— <hom>1.</hom> {@vītá,@} <ab>Part.</ab> II. von vī.
& <hom>2.</hom> {@vītá,@} <ab>Part.</ab> II. von vyā.
%%
A few others that could be in this set were identified by a rule I framed myself (encompassing the non-dhātu entries), and I marked them with the !√.
Any line starts only with one of the 5 types-- <L>, Header, <div, <F> and <LEND>
No blank lines are present within the entry portion; and just a single blank line is present when a new entry starts.
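These layout rules could be checked with something like the sketch below. It is only an illustration under stated assumptions: `check_lines` is a hypothetical helper, and since the post does not say how a Header line begins, the sketch assumes the Header is simply the line directly after an <L> line.

```python
# Line starts named in the post; "Header" has no fixed prefix here.
ALLOWED_PREFIXES = ("<L>", "<div", "<F>", "<LEND>")

def check_lines(lines):
    """Return (line_number, message) pairs for lines that break the rules."""
    problems = []
    inside_entry = False
    prev = None
    for no, line in enumerate(lines, start=1):
        if line == "":
            if inside_entry:
                problems.append((no, "blank line inside entry"))
            elif prev == "":
                problems.append((no, "more than one blank line between entries"))
        elif line.startswith("<L>"):
            inside_entry = True
        elif line.startswith("<LEND>"):
            inside_entry = False
        elif prev is not None and prev.startswith("<L>"):
            pass  # ASSUMPTION: the line right after <L> is the Header line
        elif not line.startswith(ALLOWED_PREFIXES):
            problems.append((no, "unexpected line start"))
        prev = line
    return problems
```

Run over the lines of the file (with trailing newlines removed), an empty result would mean that every line starts with one of the five types and that blank lines occur only singly between entries.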
Within the entry, there are tags like
<F>…</F>; for Footnote type (only one occurrence)
<ab>…</ab>; regular or global abbr.
<ab n="xxx">…</ab>; local or variant abbr.
<gk>…</gk>; for strings in greek script
<heb>…</heb>; for strings in hebrew script (only one occurrence)
<hom>…</hom>; for homonym numbers
<lang>…</lang>; for various languages (mostly) in abbr. form
<ls>…</ls>; regular ls type
<ls n="xxx">…</ls>; 'padded' ls type

The <div tagging is changed to various "meaningful" tags, like
<div n="H" for Header type.
<div n="Pf" for Prefix (upasarga) type; this has identified a new upasarga ácha, which is not in the std. list of 22!!
<div n="TS" for Termination/Suffix type; this has identified many places that do not have a preceding hyphen.
<div n="W" for Whole Word type; this occurs mostly in pronoun category.
<div n="P" for a simple new Paragraph.

The diacritic marks (accent etc.) are mostly to be applied to Sanskrit words only, but not to the European languages. This exercise has changed quite a few non-ascii letters.
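A rough way to take stock of the inline tags and the div types listed above is sketched here. The function name `inventory` is hypothetical, and the regular expressions assume the tags appear exactly in the forms shown in this post (optionally with an n="…" attribute); anything written differently would be missed.

```python
import re
from collections import Counter

# Inline tags listed above, optionally carrying an n="..." attribute.
TAG_RE = re.compile(r'<(F|ab|gk|heb|hom|lang|ls)(\s+n="[^"]*")?>')
# The value of n in a <div n="..." opener.
DIV_RE = re.compile(r'<div n="([^"]+)"')

def inventory(text):
    """Count opening inline tags and <div n=...> types in the given text."""
    tags = Counter()
    for m in TAG_RE.finditer(text):
        name = m.group(1)
        if m.group(2):
            name += " n"      # keep <ab n=...> / <ls n=...> apart from the plain form
        tags[name] += 1
    divs = Counter(DIV_RE.findall(text))
    return tags, divs
```

Counts such as a single `F` or a single `heb` would match the "only one occurrence" remarks, and any `n` value outside H, Pf, TS, W and P in the second counter would point to a div that was not retagged.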