Closed funderburkjim closed 6 years ago
The conversions have now been completed.
6 missed headwords were discovered and properly recoded with decimal L-numbers.
Generally Sanskrit words are presented in Devanagari within the text; this includes headwords.
However, some words appearing in the Latin alphabet have letters with diacritics.
Some of these words are related to Sanskrit words, such as Vêdorum.
No attempt has been made to impose a 'modern IAST' spelling on words such as Vêdorum; for example, we
leave the circumflex diacritic as printed.
There are also numerous words in other languages in what appear to be etymological comments. For instance, there are 135 instances of 'russ.' indicating presumably related or cognate Russian words.
The spelling of these words has also been coded with Unicode characters which aim to approximate the diacritics of the text.
The digitization recognizes the line breaks of the text. New lines of text are generally marked as <div n="lb">.
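As a small illustration of how this markup can be used, the printed lines of an entry can be recovered by splitting at the marker. This is only a sketch: the sample string is made up, and the function name `printed_lines` is hypothetical, not part of any existing Cologne script.

```python
LB = '<div n="lb">'

def printed_lines(entry: str) -> list[str]:
    """Split the text of one entry into its printed lines at each line-break marker."""
    return [part.strip() for part in entry.split(LB)]

# Made-up sample text, not taken from bop.txt.
sample = 'first line of the entry <div n="lb">second line <div n="lb">third line'
for line in printed_lines(sample):
    print(line)
```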
The original digitization also identified the prefixes occurring within roots, and these lines have been
marked as <div n="pfx">. For example, under root gam:
There are about 1400 instances of Greek text; the Greek is uncoded and is marked as <lang n="greek"></lang>.
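If someone later wants to fill in the Greek, the empty placeholders can be located with a regular expression. The following is a rough sketch; the function names are hypothetical and the sample string is invented, so the counts say nothing about bop.txt itself.

```python
import re

# Matches one Greek element and captures whatever is between the tags.
GREEK = re.compile(r'<lang n="greek">(.*?)</lang>', re.S)

def greek_spans(text: str) -> list[str]:
    """Return the content of every <lang n="greek"> element, empty or not."""
    return [m.group(1) for m in GREEK.finditer(text)]

def count_uncoded(text: str) -> int:
    """Count the Greek elements that are still empty placeholders."""
    return sum(1 for span in greek_spans(text) if not span.strip())

# Invented sample with two empty placeholders.
sample = 'cf. gr. <lang n="greek"></lang> et <lang n="greek"></lang>'
print(count_uncoded(sample))
```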
There are about 50 footnotes in the entries. The original digitization has been rearranged using the same strategy as used for footnotes in krm.
Here is the display for the footnote under headword akza:
As with other dictionaries coded line-by-line, the resulting digitization might be more useful if the
hyphenated words were presented in unhyphenated form. This could be done without information loss by using the <lbinfo n="N"/> markup idea used in Burnouf and other dictionaries.
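To make the idea concrete, here is a minimal sketch of such a dehyphenation, with its inverse to show that nothing is lost. It rests on assumptions that should be checked against Burnouf before any real use: that the hyphen stands immediately before the `<div n="lb">` marker, and that N in `<lbinfo n="N"/>` counts the characters pulled up from the following line. The function names and the sample are hypothetical. A hyphen that genuinely belongs to the word cannot be told apart from a line-end hyphen by this code.

```python
import re

LB = '<div n="lb">'

def dehyphenate(entry: str) -> str:
    """Join a word split as 'xxx-' + LB + 'yyy', recording len('yyy') in lbinfo."""
    lines = entry.split(LB)
    out = [lines[0]]
    for line in lines[1:]:
        prev = out[-1]
        m = re.match(r'\w+', line)
        if prev.endswith('-') and m:
            tail = m.group(0)
            # ASSUMPTION: N = number of characters moved up from the next line.
            out[-1] = f'{prev[:-1]}{tail}<lbinfo n="{len(tail)}"/>'
            line = line[m.end():]
        out.append(line)
    return LB.join(out)

def rehyphenate(entry: str) -> str:
    """Inverse of dehyphenate: restore the hyphen and the original line break."""
    pattern = re.compile(r'(\w+)<lbinfo n="(\d+)"/>' + re.escape(LB))
    def restore(m: re.Match) -> str:
        word, n = m.group(1), int(m.group(2))
        return f'{word[:-n]}-{LB}{word[-n:]}'
    return pattern.sub(restore, entry)

# Made-up sample, not taken from bop.txt.
sample = 'radix conjun-<div n="lb">ctionis est'
joined = dehyphenate(sample)
print(joined)
print(rehyphenate(joined) == sample)
```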
The printed text preface contains two pages of abbreviations. These pages are also part of
the bop.txt digitization. This list could be used as a guide to applying the <ls>X</ls> and <ab>X</ab> markup (for literary sources and general abbreviations).
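A sketch of how such a list might drive the markup: a table maps each abbreviation to `ls` or `ab`, and every occurrence is wrapped accordingly. Only 'russ.' comes from the text; 'lat.' and 'SRC.' are placeholders, and the real table would have to be built from the two pages of the preface. The function name is hypothetical, and the code makes no attempt to skip text that is already tagged.

```python
import re

# Illustrative table; only 'russ.' is attested above, the rest are placeholders.
ABBREVIATIONS = {
    "russ.": "ab",
    "lat.": "ab",   # placeholder general abbreviation
    "SRC.": "ls",   # placeholder literary source
}

def tag_abbreviations(text: str, table: dict[str, str]) -> str:
    """Wrap each known abbreviation in the tag named by the table."""
    # Longest first, so a longer abbreviation wins over one that is its prefix.
    keys = sorted(table, key=len, reverse=True)
    # (?<!\w) keeps us from matching the tail of a longer word.
    pattern = re.compile(r'(?<!\w)(' + '|'.join(re.escape(k) for k in keys) + r')')
    def wrap(m: re.Match) -> str:
        tag = table[m.group(1)]
        return f'<{tag}>{m.group(1)}</{tag}>'
    return pattern.sub(wrap, text)

print(tag_abbreviations('SRC. 1. 2. cf. russ. et lat.', ABBREVIATIONS))
```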
Additional headwords could be generated for the prefixes associated with root entries. The regularity of the coding following the already present <div n="pfx"> markup would solve the primary problem of identification. Of course, the problem of sandhi between the prefix and root also needs to be solved.
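To give a feel for the sandhi step, here is a deliberately incomplete sketch. It assumes that prefix and root are available as SLP1 strings (how they would be extracted from the `<div n="pfx">` lines is not addressed), and it applies only the simplest vowel rules: similar vowels merge to the long vowel, a/A before i, u, f gives e, o, ar, and i or u before a dissimilar vowel becomes y or v. Consonant sandhi and the known exceptions are not handled; such cases are simply concatenated. The function name `join_prefix` is hypothetical.

```python
VOWELS = set("aAiIuUfFxXeEoO")               # SLP1 vowels
SHORT = {"A": "a", "I": "i", "U": "u", "F": "f"}
SIMILAR = {"a": "A", "i": "I", "u": "U"}     # similar vowels merge to the long vowel
GUNA = {"i": "e", "u": "o", "f": "ar"}       # a/A + i, u, f
SEMIVOWEL = {"i": "y", "u": "v"}             # i/u before a dissimilar vowel

def join_prefix(prefix: str, root: str) -> str:
    """Join a prefix and a root with a few basic vowel sandhi rules (SLP1)."""
    last, first = prefix[-1], root[0]
    if last not in VOWELS or first not in VOWELS:
        return prefix + root                 # consonant sandhi not attempted
    l, f = SHORT.get(last, last), SHORT.get(first, first)
    stem = prefix[:-1]
    if l == f and l in SIMILAR:
        return stem + SIMILAR[l] + root[1:]
    if l == "a" and f in GUNA:
        return stem + GUNA[f] + root[1:]
    if l in SEMIVOWEL:
        return stem + SEMIVOWEL[l] + root
    return prefix + root                     # anything else: plain concatenation

for prefix, root in [("ava", "gam"), ("upa", "i"), ("prati", "i"), ("anu", "i")]:
    print(prefix, root, join_prefix(prefix, root))
```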
There are six pages of ADDENDA ET EMENDANDA.
The bop.txt digitization also contains these additions and corrections. They could be applied to the digitization entries. We have not thus far developed a satisfactory markup scheme for such a task; the approach used in a similar task for MW should be examined as a guideline whenever this task for BOP is undertaken.
These are all the comments that come to mind regarding the BOP conversion.
Give Greek to the Greeks between us, Jim!
This issue is devoted to the conversion of bop.txt, the Cologne digitization of Bopp's Glossarium Sanscritum, a Sanskrit-Latin dictionary.