sanskrit-lexicon / COLOGNE

Development of http://www.sanskrit-lexicon.uni-koeln.de/

INM meta/iast conversion #206

Closed funderburkjim closed 6 years ago

funderburkjim commented 6 years ago

This issue is for comments regarding the conversion of the Cologne digitization inm.txt of the work Index to the Names in the Mahabharata.

funderburkjim commented 6 years ago

IAST issues

The text always uses the Latin alphabet with diacritics for Sanskrit words. Generally, the conventions of the text agree with modern IAST, with the following differences:

The conversion of 'sh' to ṣ is incomplete. This conversion must be restricted to Sanskrit words, to avoid undesired conversions in English words such as 'should'. For the 'sh' conversion, the following assumptions were used:

I'm sure some 'sh' conversions in Sanskrit words were missed (such as words or abbreviations which are not in italics and don't have a diacritic).
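
A minimal sketch of how such a restricted 'sh' conversion could work. This is not the program actually used for inm.txt. It assumes two rules suggested by the note on missed cases: text inside italic markup (taken here to be {%...%}) is always converted, and outside italics a word is converted only when it contains a non-ASCII letter, that is, a diacritic. The function names are made up for illustration.

```python
import re

# Assumptions (not taken from the actual conversion program):
#   * italic text is delimited by {%...%}, as in the entry excerpts in this thread;
#   * outside italics, a word counts as Sanskrit only if it has a non-ASCII letter.
ITALIC_SPLIT = re.compile(r"(\{%.*?%\})")
WORD = re.compile(r"[^\W\d_]+")  # runs of Unicode letters


def convert_sh(text):
    """Replace 'sh' with s-underdot (U+1E63), and 'Sh' with U+1E62."""
    return text.replace("Sh", "\u1e62").replace("sh", "\u1e63")


def has_diacritic(word):
    """True if the word contains any non-ASCII character."""
    return any(ord(ch) > 127 for ch in word)


def convert_line(line):
    out = []
    # Splitting on a capturing group keeps the italic spans in the result.
    for part in ITALIC_SPLIT.split(line):
        if part.startswith("{%"):
            # Italic span: convert everything inside it.
            out.append(convert_sh(part))
        else:
            # Plain text: convert only words that carry a diacritic.
            out.append(WORD.sub(
                lambda m: convert_sh(m.group(0))
                if has_diacritic(m.group(0)) else m.group(0),
                part))
    return "".join(out)
```

With these rules, 'should' is left alone, but so is an unitalicized 'Bhishma', which is exactly the kind of miss mentioned above.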

There are a few (40) cases where a vowel (with or without macron) also has a breve diacritic.
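
For locating these cases, something like the following could be used. It is only a sketch, and it assumes the breve is encoded in Unicode, either as the combining breve U+0306 or as a precomposed letter such as ă. How inm.txt actually encodes the breve is not stated here.

```python
import unicodedata

BREVE = "\u0306"  # COMBINING BREVE


def vowels_with_breve(text):
    """Return every vowel cluster (base letter plus marks) carrying a breve."""
    # NFD splits precomposed letters into base letter + combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    found = []
    i = 0
    while i < len(decomposed):
        j = i + 1
        # Gather the combining marks that follow the base character.
        while j < len(decomposed) and unicodedata.combining(decomposed[j]):
            j += 1
        marks = decomposed[i + 1:j]
        if decomposed[i].lower() in "aeiou" and BREVE in marks:
            found.append(unicodedata.normalize("NFC", decomposed[i:j]))
        i = j
    return found
```

A vowel with both macron and breve is reported as the macron letter followed by a combining breve, since Unicode has no single precomposed character for that combination.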

funderburkjim commented 6 years ago

Sections of the text

The digitization includes not only the main section of entries, but apparently all of the text. There are the following sections:

; TITLE
; FOREWARD
; PREFACE
; ABBREVIATIONS
; CONCORDANCE (33 pages)
; ENTRIES  about 13000 headwords
; ADDITIONS AND CORRECTIONS  (18 pages)
; POSTSCRIPT

Since all of the non-entry sections are digitized (part of inm.txt), it would be feasible to include them in the Front matter section.

funderburkjim commented 6 years ago

Suggested Enhancement: abbreviations

There are digitized sections on abbreviations in the preface. These could provide the basis for <ab> markup that would facilitate tooltips for users.
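
A rough sketch of what the <ab> markup step might look like. The abbreviation table here is hypothetical and holds only 'q.v.', which occurs in the DanadA entry; a real table would be built from the digitized abbreviation sections. The function name is also made up.

```python
import re

# Hypothetical table; a real one would be extracted from the abbreviation
# sections of inm.txt. The expansions could feed the tooltips.
ABBREVIATIONS = {"q.v.": "quod vide (which see)"}


def mark_abbreviations(text, abbrevs=ABBREVIATIONS):
    """Wrap each known abbreviation in <ab>...</ab>."""
    # Longest first, so a longer abbreviation wins over one that is its prefix.
    keys = sorted(abbrevs, key=len, reverse=True)
    # The lookbehind skips matches inside a word or directly after a tag,
    # so text that is already marked up is not wrapped a second time.
    pattern = re.compile(
        r"(?<![\w>])(" + "|".join(re.escape(k) for k in keys) + r")")
    return pattern.sub(r"<ab>\1</ab>", text)
```

For example, `mark_abbreviations("= Kubera, q.v.")` gives `= Kubera, <ab>q.v.</ab>`.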

funderburkjim commented 6 years ago

Possible additional headwords

There are at least two possible sources of additional headwords.

<div n="HI">

This markup appears 22 times within entries. For instance, under headword DanadA:

<L>3353<pc>240-1<k1>DanadA<k2>DanadA
{@Dhanadā,@}¦ a mātṛ. § 615{%u%} (Skanda): IX, {@46<lang n="greek"></lang>,@} 2631.
<div n="HI">{@Dhanadeśvara, Dhanādhigoptṛ, Dhanādhipa,@}
<div n="lb">{@Dhanādhipati@}¦ = Kubera, q.v.
<LEND>


It appears that this is a typographically abbreviated form of four headwords. If these were recoded somehow as separate entries, then about 80-100 additional headwords would be added.
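
One possible way to pull the extra names out of an entry, as a sketch only. It assumes that the names are the bold spans ({@...@}) on the <div n="HI"> line and on any <div n="lb"> continuation lines, and that the broken bar (¦) ends the list, as in the DanadA example. The names come out in IAST; turning them into SLP1 headwords (k1/k2) is not attempted here.

```python
import re

BOLD = re.compile(r"\{@(.*?)@\}")
BROKEN_BAR = "\u00a6"  # the character that follows the headword(s)


def hi_headwords(entry_lines):
    """Collect the names listed in a <div n="HI"> block of one entry."""
    names = []
    collecting = False
    for line in entry_lines:
        if line.startswith('<div n="HI">'):
            collecting = True
        elif not line.startswith('<div n="lb">'):
            # Any line other than an "lb" continuation ends the block.
            collecting = False
        if not collecting:
            continue
        # Only the text before the broken bar belongs to the name list.
        head, bar, _ = line.partition(BROKEN_BAR)
        for span in BOLD.findall(head):
            names.extend(n.strip() for n in span.split(",") if n.strip())
        if bar:
            collecting = False
    return names
```

Applied to the lines of the DanadA entry, this yields the four names Dhanadeśvara, Dhanādhigoptṛ, Dhanādhipa and Dhanādhipati.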

Additions and corrections to Index

In the additions and corrections section, the first, shorter part pertains to the Concordance, and the second, longer part pertains to the index (i.e. to what we have coded as headwords). The formatting of this second part would make it possible to add as new headwords all the entries, whether additions or corrections. There are about 950 such entry-like sections.

Example of correction to Index

aBiBU original entry: image

aBiBU entry correction: image

Example of addition to index

aBiprAya -- does not appear as a headword in the main index, but does appear in the additions and corrections: image

funderburkjim commented 6 years ago

Markup peculiarities

<div n="X">

This markup can have X as

<F>

Indicates footnotes; about 30 instances. Recoded in the style adopted with KRM (#200).

<sup>

This is used for superscript text. General functions are:

<lang n="greek"></lang>

Many (9600) instances. However, at least some of these are one or two letters, used for some kind of indexing, rather than Greek words; here are two examples from the first page.

image

image
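
To get a feel for how many of these are short index letters rather than words, the instances could be tallied by content length. A small sketch, assuming the tag is always written exactly as <lang n="greek">...</lang> on one line; the function name is made up.

```python
import re
from collections import Counter

GREEK = re.compile(r'<lang n="greek">(.*?)</lang>')


def greek_length_counts(lines):
    """Count <lang n="greek"> elements by the length of their content."""
    counts = Counter()
    for line in lines:
        for content in GREEK.findall(line):
            # Length 0 means the element is empty, as in the DanadA entry.
            counts[len(content.strip())] += 1
    return counts
```

Lengths 0 to 2 would then point to the indexing use, and longer contents to actual Greek words.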

<C n="N">

This markup occurs in 200 lines, and indicates columns in a complex tabular arrangement of text, such as genealogical relationships. For instance: image

We currently have only a crude representation of this: image

funderburkjim commented 6 years ago

The converted form has now been installed at Cologne.