Closed funderburkjim closed 6 years ago
@drdhaval2785 is there an automated way of checking the correctness of Kṛdantarūpamālā's forms based on your verb analysis?
No. There is not. Currently only tiNanta forms are generated. Not kfdanta.
Not kfdanta
Bad luck.
The conversion is almost complete. The most important change is regarding Footnotes.
The text is organized as a sequence of entries, numbered 1 to 2039 (with a few extra ones labeled e.g. (41-A)); headwords are roots in dhātu-pāṭha form (i.e., with anubandhas). Each entry consists primarily of a list or table of krdanta forms derived from the root. There are copious footnotes.
To understand the original digitization conventions regarding coding of footnotes and the changes introduced in the new meta-line coding, you need to look at pages 3 and 4 of the printed text. The first page has the first part of the entry for 'aka', then a section of footnotes for the page. The second page has the remaining part of the aka entry (with two more footnotes) and the beginning of the second entry, 'aki', which also has some footnote indicators; the bottom of the page then has the footnotes for the page. The next two comments show scans of these two pages.
It's difficult to know how to code the footnotes in such a way that the footnotes associated with a particular entry are within the scope of coding of the entry itself. A naive coding would just code the data line-by-line. But then there would be the problem of associating the first two footnotes of page 4 with aka entry.
So instead, Thomas decided to shoe-horn each entire footnote in at the location of its mention. Here is how the beginning of 'aka' looks, up through the second line of the table (IAST coding). This is excerpted from the Basic Display before the current conversion:
(1) “aka kuṭilāyāṃ gatau” (ī-bhvādiḥ-792 sakarmakaḥ-seṭ-parasmaipadī) ghaṭādiḥ mit .
‘iditastvaṅkate tatra kuṭilāyāṃ gatāvaket .’ (ślo 41) iti devaḥ .
ṇic- san-
ṇvul ākakaḥ— kikā,
[Footnote: 1. ‘mitāṃ hrasvaḥ’ (6-4-92)
iti ṇau upadhāyā hrasvaḥ .]
akakaḥ— kikā, acikiṣa
[Footnote: 1A ‘ajāderdvitīyasya’ (6-1-2) iti dvitīya-
syaikācaḥ dvitvam . ‘kuhoścuḥ’
(7-4-62) ityabhyāsasya cutvam .]
kaḥ— ṣikā;
tṛc (tṛn) akitā-trī, akayitā-trī, acikiṣitā-trī;
While the problem of footnote attachment is clearly solved by this coding, the resulting display grossly distorts the reading of the table of krdantas.
The main idea of the current strategy of coding is to place a footnote marker within the table, and then to collect the corresponding footnotes for the entry at the bottom of the entry. The next comment shows how the total entry for aka looks (snapshot from mobile1 display).
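The marker-plus-collected-footnotes idea can be sketched roughly as follows. This is only an illustration, not the conversion program actually used: the function name `collect_footnotes`, the `<sup>…</sup>` marker shape, and the ASCII sample text are assumptions, and the sketch assumes that no footnote body contains a `]`.

```python
import re

# Matches an inline "[Footnote: LABEL text ...]" block, possibly spanning
# several lines, together with the whitespace around it.
FOOTNOTE = re.compile(r"\s*\[Footnote:\s*(\S+?)\.?\s+(.*?)\]\s*", re.DOTALL)


def collect_footnotes(entry_text):
    """Replace inline footnote blocks with markers; return (body, notes)."""
    notes = []

    def to_marker(match):
        label = match.group(1)
        # Collapse the line breaks inside the footnote into single spaces.
        text = " ".join(match.group(2).split())
        notes.append((label, text))
        return f"<sup>{label}</sup> "

    body = FOOTNOTE.sub(to_marker, entry_text).rstrip()
    return body, notes


# Hypothetical sample, loosely shaped like the old 'aka' coding.
sample = (
    "nvul Akaka,\n"
    "[Footnote: 1. 'mitAM hrasvaH' (6-4-92)\n"
    "iti Nau.]\n"
    "akaka, acikiza\n"
    "[Footnote: 1A second\n"
    "note.]\n"
    "kaH;\n"
    "tfc akitA;"
)
body, notes = collect_footnotes(sample)
print(body)
for label, text in notes:
    print(label, text)
```

With this approach the table row reads as one line again, and the list in `notes` can be printed at the bottom of the entry.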
There are a few more comments that need to be made. I'll get to them tomorrow.
problem of footnote attachment is clearly solved by this coding, the resulting display grossly distorts the reading of the table of krdantas
Exactly. I'll be off till 24th February, do not lose me, heading to Poona.
Although the changes in footnote coding definitely improve the display of the tabular data within this work, there remain several weaknesses; here are a couple that catch my eye.
The last entry in the aka table illustrates this phenomenon:
The underlying digitization uses a tag <note n="
"/> to identify this as a problem area. This is quite common, occurring 700+ times.
In the case of aka, the table has both column labels (ṇic- san- ) and row labels (ṇvul, tṛc, etc.). Additional markup is required to distinguish these grammatical labels from the kridanta entries. Such markup would make it possible to develop a search facility whereby a user could determine that, for instance, AkaH is a kridanta of 'aka'.
The aki entry does not similarly show such labels; perhaps the labels are implicit, or perhaps there is some other organizing principle -- situation is unclear to me.
Line breaks are significant in many parts of the text (such as to indicate table rows in aka, aki).
In cases where a footnote is the first element in a line, the original footnote coding obscures the fact that a line-break precedes the footnote marker. This happens, for instance, at footnote marker '9' in the third entry, akṣū. This error can be corrected by inserting a <div n="lb"> tag prior to the footnote marker <sup>9</sup>.
The footnote marker occasionally occurs within a kridanta. For instance under 'aka'
This positioning, although consistent with the printed text, obscures the full spelling of the kridanta.
My inclination would be to move such footnote markers to the end or beginning of words.
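A rough sketch of what moving a marker to the end of its word could look like is given below. The function name `move_markers_to_word_end`, the `<sup>…</sup>` marker shape, and the sample line are assumptions for illustration; a "word" here is taken to be any run of characters other than whitespace, angle brackets, commas and semicolons, which may need adjusting for the real data.

```python
import re

# A footnote marker with word characters on both sides of it.
MARKER_IN_WORD = re.compile(r"([^\s<>,;]+)(<sup>[^<]+</sup>)([^\s<>,;]+)")


def move_markers_to_word_end(line):
    """Rejoin a word split by a footnote marker and put the marker after it."""
    return MARKER_IN_WORD.sub(r"\1\3\2", line)


# Hypothetical sample: the first marker sits inside a word, the second
# is already at the end of its word and is left alone.
sample = "A<sup>1</sup>kakaH, akakaH<sup>2</sup>;"
print(move_markers_to_word_end(sample))
```

Markers that already stand at the beginning or end of a word do not match the pattern, so the function leaves them unchanged.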
There is a wealth of information in this text; to expose this information to programmatic manipulation will require the efforts of some team with (a) sufficient technical knowledge of Sanskrit grammar to know how to interpret the details of the text (b) sufficient technical knowledge of markup principles to be able to devise a markup scheme that captures the grammatical information.
These brief observations may provide some hints when further work on this 'dictionary' is undertaken.
There are 2061 entries in krm after this work. About 20 of these were previously missed as separate entries due to a variation in the coding.
There are two correction sections in the full krm.txt digitization; these are separate from the entries exposed by the Cologne displays. They are identified by the text '; BEGIN CORRECTIONS 1' and '; BEGIN CORRECTIONS 2'. These sections occur at pages 1143 and 1427.
Here is the beginning of the second correction section:
<H><s>SoDanikA</s>
<NI><s>puwam paNktiH aSudDam SudDam</s>
<>501 17 <s>cAyakA cAyakaH</s> <<< first example
It would be a fairly straightforward task to implement these corrections. There are approximately 80 corrections in each of the two sections, or 160 corrections in all. Maybe someone can volunteer to do this.
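As a starting point for a volunteer, here is a minimal sketch of reading one correction line into its parts. It assumes that every correction line has the shape of the first example above (page number, line number, then the incorrect and the corrected form as two single tokens inside `<s>…</s>`); the names `Correction` and `parse_correction` are hypothetical. Locating the page and line within krm.txt and applying the change is not covered here.

```python
import re
from typing import NamedTuple, Optional


class Correction(NamedTuple):
    page: int   # page of the printed text
    line: int   # line on that page
    wrong: str  # incorrect form as printed (SLP1)
    right: str  # corrected form (SLP1)


# "<>PAGE LINE <s>WRONG RIGHT</s>"; anything after </s> is ignored.
CORRECTION = re.compile(r"^<>(\d+)\s+(\d+)\s+<s>(\S+)\s+(\S+)</s>")


def parse_correction(raw: str) -> Optional[Correction]:
    """Return the parsed correction, or None for headers and other lines."""
    match = CORRECTION.match(raw)
    if match is None:
        return None
    page, line, wrong, right = match.groups()
    return Correction(int(page), int(line), wrong, right)


print(parse_correction("<>501 17 <s>cAyakA cAyakaH</s>"))
print(parse_correction("<H><s>SoDanikA</s>"))
```

Lines that do not fit the assumed shape come back as `None`, so they can be listed and handled by hand.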
off till 24th February, do not lose me, heading to Poona.
Will miss your comments.
If you talk to the PD team at Poona, maybe you can ask if they'll give permission for Cologne to display our digitization of their dictionary. This would be a way for there to be a much wider audience for their monumental work.
The main disp.php program used in the Cologne displays for krm was adapted from the pwg version. A few alterations were required for:
<div n="F"> tag -- identifies the beginning of a Footnote text. Inserts 'Footnote' for readability.
<Poem> tag -- occurs twice. Functionally, it is just treated as a line break for the first line of the poem. Occurs under headwords 'kadi' and ziY (slp1 spelling).
This was quite simple for krm. In fact, the only IAST text appears in the appendices. The body of the text is all Devanagari and English.
The krm conversion task is now completed and the results installed.
Such markup would make it possible to develop a search facility whereby a user could determine that, for instance, AkaH is a kridanta of 'aka'.
Yeah, without it the scan is rather useless.
The aki entry does not similarly show such labels; perhaps the labels are implicit, or perhaps there is some other organizing principle -- situation is unclear to me.
Let's call for @Shalu411 .
My inclination would be to move such footnote markers to the end or beginning of words.
Makes sense. But isn't it too big a task for the result it would give?
Maybe someone can volunteer to do this.
If only @SergeA is around.
maybe you can ask if they'll give permission for Cologne to display our digitization of their dictionary.
Let me try.
Currently only tiNanta forms are generated. Not kfdanta.
https://gitlab.inria.fr/huet/Heritage_Resources/ subdirectory XML as well?
This issue is for the meta-line conversion of KRM (Kṛdantarūpamālā).