GRA corrections/changes while (AB) re-working

Andhrabharati commented 1 year ago

Every line starts with one of only 5 types: <L>, Header, <div, <F> and <LEND>.
No blank lines are present within the entry portion; just a single blank line is present where a new entry starts.
Within the entry, there are tags like
<F>…</F> ; for Footnote type (only one occurrence)
<ab>…</ab> ; regular or global abbr.
<ab n="xxx">…</ab> ; local or variant abbr.
<gk>…</gk> ; for strings in greek script
<heb>…</heb> ; for strings in hebrew script (only one occurrence)
<hom>…</hom> ; for homonym numbers
<lang>…</lang> ; for various languages (mostly) in abbr. form
<ls>…</ls> ; regular ls type
<ls n="xxx">…</ls> ; 'padded' ls type
The <div tagging is changed to various "meaningful" tags, like
<div n="H" for Header type.
<div n="Pf" for Prefix (upasarga) type; this has identified a new upasarga ácha, which is not in the std. list of 22!!
<div n="TS" for Termination/Suffix type; this has identified many places that do not have a preceding hyphen.
<div n="W" for Whole Word type; this occurs mostly in pronoun category.
<div n="P" for a simple new Paragraph.
The diacritic marks (accent etc.) are mostly to be applied to Sanskrit words only, but not to the European languages. This exercise has changed quite a few non-ascii letters.
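
The line-start and blank-line rules stated at the top of this comment could be checked mechanically. Below is a minimal Python sketch, not the script actually used. It assumes that the "Header" line is simply whichever line directly follows an `<L>` line (the comment does not say how a Header line begins), and the function name `check_structure` is made up for illustration.

```python
def check_structure(lines):
    """Return (line_number, message) pairs for lines that break the layout rules.

    Assumption (not stated in the issue): the Header line is the line that
    immediately follows an <L> line, whatever it starts with.
    """
    problems = []
    prev = None  # kind of the previous line
    for no, line in enumerate(lines, start=1):
        if line.strip() == "":
            kind = "blank"
            if prev == "blank":
                problems.append((no, "more than one blank line"))
            elif prev not in (None, "LEND"):
                problems.append((no, "blank line inside an entry"))
        elif line.startswith("<LEND>"):
            kind = "LEND"
        elif line.startswith("<L>"):
            kind = "L"
            if prev not in (None, "blank"):
                problems.append((no, "entry not preceded by a blank line"))
        elif prev == "L":
            kind = "Header"
        elif line.startswith("<div") or line.startswith("<F>"):
            kind = "body"
        else:
            kind = "other"
            problems.append((no, "unexpected line start"))
        prev = kind
    return problems
```

Running it over the whole file as a list of lines (without trailing newlines) would give an empty list if the rules hold everywhere.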

Andhrabharati commented 1 year ago

In the Sanskrit words, ai (which is marked otherwise as ē) and au (which is marked otherwise as ō) do not occur in Grassmann's 'theme'; they should either be separated by a hyphen or the second vowel should be with Umlaut (ï or ü resp.).

At the end, we've
4 words with (a-i)-- a-i; áśva-iṣita; {@páśva-iṣṭi,@}; {@vásya-iṣṭi,@}
4 words with (aï)-- {@daïṣṇá,@}; -aïṣṇám; -aïṣṇaís; {@yaïṣṭha,@}

6 words with (a-u)-- a-u; {@ácha-ukti,@}; ca-utá; {@náma-ukti,@}; {@úpa-upa párā@}; {@úpa-upa párā@}
2 words with (aü)-- {@títaü,@}; {@híraṇya-praüga,@}
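
A rough way to re-find such words is sketched below in Python. It is only an illustration of the a+i / a+u rule described above, not the procedure actually followed; the function name `classify_pairs` is hypothetical. It strips acute and grave accents before matching, keeps the Umlaut (diaeresis), and reports any bare "ai" or "au" so that it can be reviewed.

```python
import re
import unicodedata

DIAERESIS = "\u0308"
ACCENTS = {"\u0301", "\u0300"}  # combining acute and grave

# 'a', an optional hyphen, then i or u, then an optional diaeresis
PAIR = re.compile("a(-?)([iu])(" + DIAERESIS + "?)")

def classify_pairs(word):
    """List the a+i / a+u sequences in a word as 'a-i', 'aï', 'ai', etc."""
    decomposed = unicodedata.normalize("NFD", word)
    # Drop accent marks only; macrons, underdots and the diaeresis are kept,
    # so that long ā is not confused with short a.
    plain = "".join(ch for ch in decomposed if ch not in ACCENTS)
    found = []
    for m in PAIR.finditer(plain):
        label = "a" + m.group(1) + m.group(2) + m.group(3)
        found.append(unicodedata.normalize("NFC", label))
    return found

def has_bare_diphthong(word):
    """True if the word contains 'ai' or 'au' with neither hyphen nor Umlaut."""
    return any(p in ("ai", "au") for p in classify_pairs(word))
```

For example, `classify_pairs("ca-utá")` gives `["a-u"]`, and `has_bare_diphthong` is false for every word listed above.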

Andhrabharati commented 1 year ago

To match the opening and closing brackets '(' and ')', all the open-ended list-type brackets like 1), 1a) etc. are changed to right-angle bracket 〉; and this facilitated identifying the mis-matched bracket pairs very easily. [This right angle bracket can now be reverted back to ')'.]
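
Once the list markers are out of the way, the bracket check itself is a simple stack scan. The following Python sketch shows the idea; it is not the tool actually used, and the function names are made up. It assumes the list-type brackets have already been changed to 〉 as described.

```python
def unmatched_parens(text):
    """Return the positions of '(' and ')' that have no partner."""
    open_positions = []  # stack of '(' not yet closed
    unmatched = []
    for pos, ch in enumerate(text):
        if ch == "(":
            open_positions.append(pos)
        elif ch == ")":
            if open_positions:
                open_positions.pop()
            else:
                unmatched.append(pos)  # ')' with nothing to close
    return sorted(unmatched + open_positions)

def restore_list_markers(text):
    """Revert the temporary right-angle bracket back to ')'."""
    return text.replace("〉", ")")
```

With the markers converted, `unmatched_parens("1〉 a (b) 2〉 c")` returns an empty list, whereas the unconverted `"1) a (b)"` reports position 1.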

Andhrabharati commented 1 year ago

Matching of square brackets was already done earlier, which was taken as the basis of GRA_6 to GRA_9 (by Jim).

Andhrabharati commented 1 year ago

There are 30 places where new text has been 'padded' [as %%…%%] onto the existing text. Many (26) of them are at the end of a line, where the next entry word is supposed to be read in continuation (which is possible in the running print matter, but not in separated entries in digital search results!). [I recall seeing such places earlier in one of the CDSL works, probably in BEN.]

17 cases of "enthalten in: %%…%%"
4 cases of "enthalten in %%…%%"
4 cases of "davon %%…%%"
1 case of "Substantiv in: %%…%%"

Probably, @maltenth might need to justify this!

One place where %%<lang>lat.</lang>%% is padded, corroborated by PWG.

Two places have these %%…%% as comments reg. the entries-- at ++<L>5395.1 as %%This entry was at L-5397 instead.%%. and at <L>8303 as %%isn't it more appropriate to split these into two sep. entries, as— <hom>1.</hom> {@vītá,@} <ab>Part.</ab> II. von vī. & <hom>2.</hom> {@vītá,@} <ab>Part.</ab> II. von vyā.%%
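
The tallies of padded %%…%% text given above could be reproduced with a short script. This is a hedged Python sketch, not the method actually used: the lead-in phrases are taken from the counts in this comment, while the function name `tally_padding` and the "other" bucket are my own additions.

```python
import re
from collections import Counter

PADDED = re.compile(r"%%.*?%%")

# Phrases reported in this comment as preceding the padded text.
LEAD_INS = ("enthalten in:", "enthalten in", "davon", "Substantiv in:")

def tally_padding(lines):
    """Count %%…%% spans by the phrase that immediately precedes them."""
    counts = Counter()
    for line in lines:
        for m in PADDED.finditer(line):
            before = line[:m.start()].rstrip()
            label = next((p for p in LEAD_INS if before.endswith(p)), "other")
            counts[label] += 1
    return counts
```

Anything landing in "other" (such as the padded `<lang>` tag or the two editorial comments) would need to be looked at by hand.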

Andhrabharati commented 1 year ago

I had added a √ mark before the verbal (dhātu etc.) entries, which are presented in the print in extra-heavy typeface.

A few others that could belong to this set (encompassing the non-dhātu entries) were picked out by 'a rule' I framed myself, and I marked them with !√.
