sanskrit-lexicon / mw-dev

Development version of MW dictionary, to collaborate with Andhrabharati

MW full-review-005b: Cognate words and the related tagging #21

Open Andhrabharati opened 1 year ago

Andhrabharati commented 1 year ago

All the language names (foreign or Indian) are now brought under the <lang>...</lang> tagging, whether they previously occurred under an <ab> tag, an <s1> tag, or untagged. In a very few places, where the word denotes a country or place rather than the language proper, it is not marked with this tag.

This has made it easier to identify (visually) the <s> tags occurring after these that should properly have been tagged as <etym> [the earlier cdsl tag, now changed to the better form <cog>, denoting the "cognate form" of the Skt. word under discussion in the respective language], as well as the reverse case [<etym> that should have been <s>].
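The visual check described above could also be approximated mechanically. The following is only a rough sketch, assuming that a candidate is an <s>...</s> element directly following a <lang>...</lang> element on the same line; the function name and the sample lines are hypothetical, not taken from mw_AB.txt.

```python
import re

# Assumption: a <lang>...</lang> element followed (after optional whitespace)
# by an <s>...</s> element is a candidate for review as a possible <cog>.
LANG_THEN_S = re.compile(r"<lang>([^<]*)</lang>\s*<s>([^<]*)</s>")


def find_cog_candidates(lines):
    """Yield (line_number, language, word) for each <lang> + <s> pair."""
    for lineno, line in enumerate(lines, start=1):
        for match in LANG_THEN_S.finditer(line):
            yield lineno, match.group(1), match.group(2)


# Hypothetical sample lines, for illustration only.
sample = [
    "cf. <lang>Zd.</lang> <s>pitar</s>; also <s>pitf</s>",
    "no language tag here <s>agni</s>",
]
candidates = list(find_cog_candidates(sample))
```

Each hit still needs a human decision, since an <s> after a language name is not always a mis-tagged cognate.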

The example words accompanying those cognate words are changed to the <i> tag, from the erstwhile <ety> tag, wherever such examples are seen (as they are not strictly "cognates").

Next, the implied <lang n="..."> tags for Greek and Arabic are changed to the plain <gk> and <ar> tags, as they indicate not the language proper but just the script used: Greek script is used under the languages Gk. (Greek), Aeol. (Aeolic), Dor. (Doric) and Ion. (Ionic), and Arabic script under the languages Hind. (Hindūstānī), Pers. (Persian) and Arab. (Arabic). As these are mostly preceded by the respective lang tag, there is no special need to mark the "lang & script" notation as was done earlier. [My recent work with Mayrhofer's Etym. Dictionary has played a big role in broadening my perspective in this respect.]
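As an illustration of this change, here is a minimal sketch of the tag rewrite. It assumes the old notation used the attribute values n="greek" and n="arabic" (the post itself only shows n="..."), and the function name and sample line are hypothetical.

```python
import re

# Assumption: the elided attribute values are "greek" and "arabic".
SCRIPT_TAGS = {"greek": "gk", "arabic": "ar"}
LANG_N = re.compile(r'<lang n="(greek|arabic)">(.*?)</lang>')


def to_script_tags(line):
    """Rewrite <lang n="greek">x</lang> as <gk>x</gk>, likewise for Arabic."""
    def repl(match):
        tag = SCRIPT_TAGS[match.group(1)]
        return f"<{tag}>{match.group(2)}</{tag}>"
    return LANG_N.sub(repl, line)


# Plain <lang>...</lang> language names are left untouched.
converted = to_script_tags('<lang>Gk.</lang> <lang n="greek">πατήρ</lang>')
```

The preceding <lang>Gk.</lang> still carries the language, so the script tag alone is enough for the word itself.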

Finally, though MW had intended to mark just the European languages as cognates (as in the title of his work!), I took the liberty of extending the notation to cover Indic languages like Pāli, Prākṛit, Bengālī and Marāṭhī wherever such cognate forms are seen.

Here is the list of cognate lang. tags: cognate lang tags.txt

Andhrabharati commented 1 year ago

These lines are marked with a at the beginning (for quick extraction to send to Jim for incorporating in the cdsl file); now this mark could be removed in the mw_AB.txt
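A rough sketch of that extraction and clean-up step follows. The actual mark is not shown in this comment, so it is passed in as a parameter; the function name and the "#" used in the example are purely hypothetical.

```python
def split_marked(lines, marker):
    """Return (extracted, cleaned).

    extracted: the marked lines with the leading marker removed.
    cleaned:   every line, with the leading marker removed where present.
    """
    extracted, cleaned = [], []
    for line in lines:
        if line.startswith(marker):
            bare = line[len(marker):]
            extracted.append(bare)
            cleaned.append(bare)
        else:
            cleaned.append(line)
    return extracted, cleaned


# "#" is a placeholder, not the mark actually used in mw_AB.txt.
extracted, cleaned = split_marked(["#changed line", "unchanged line"], "#")
```

Here `extracted` would be the set of lines to send on, and `cleaned` the file content with the mark removed.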