Open vvasuki opened 2 years ago
You want to go away from the original dictionary format?
Where it makes sense - yes! One has to use "common sense" and see from the perspective of dict users. Not so hard. The constraints of printing on 2-column paper 100+ years ago don't apply to computer screens. And users have come to adopt new equivalent notations and routinely use more punctuation.
Also, in today's scenario, where users easily and routinely refer to dozens of dicts side by side, consistency in notation becomes a matter of concern (e.g. https://github.com/sanskrit-lexicon/csl-ldev/issues/7 ). That too motivates harmless deviations from the original.
Everyone read this please (via @drdhaval2785 at https://github.com/sanskrit-lexicon/csl-ldev/issues/7#issuecomment-1044433948 ):
The creation of a TEI version of the Cologne Sanskrit Lexicon is part of the Lazarus Project and aims for long-time preservation of the data. It is based on the original digitisations and mark-up versions of the CSL and uses the TEI Guidelines, especially the dictionary module. The objective of the TEI Cologne Sanskrit Lexicon is to preserve all information contained in the original prints, as far as it was preserved in the digitisation process (Kapp and Malten, 1997, as described in), while using a well documented and standardised XML. The second objective is to display the information as consistent and faithfully as possible to the original prints, while allowing the user to choose the writing system in which the Sanskrit words are displayed.
So, no one needs to obsess over "keeping it close to original" here. Others have that aspect well under control. This project can move along to the objective of best serving today's users.
Case in point - https://github.com/indic-dict/stardict-sanskrit/issues/139
The same dissatisfaction bothers me. Do I feel like reading the mess below?
It could be presented so much better. I hope this changes either here or in some project which will render all this obsolete.
Link for TEI Sanskrit Lexicon: http://c-salt.uni-koeln.de/
There is no ongoing collaboration between the 'Github/sanskrit-lexicon' (CDSL) project at Cologne and the 'C-SALT' project at Cologne.
Maybe @fxru could provide a description of the relation between CDSL and C-SALT.
... this mess could be so much better
Would you provide a mock-up of a better presentation? This would help others understand what is in your mind.
There is no ongoing collaboration between the 'Github/sanskrit-lexicon' (CDSL) project at Cologne and the 'C-SALT' project at Cologne.
I didn't say there was; and that's a good thing too! That leaves both projects free to pursue their distinct goals without compromise. The goal of CDSL should be to present what the dict maker intended in the best possible way given the current non-paper media and tech.
... this mess could be so much better
Would you provide a mock-up of a better presentation? This would help others understand what is in your mind.
विकल्पः, पुं, (विरुद्धं कल्पनमिति । वि + कृप + घञ् ।)
भ्रान्तिः ।
(यथा, देवीभाग-वते । १ । १९ । ३२ ।
“विकल्पोपहतस्त्वं वै दूरदेशमुपागतः ।
न मे विकल्पसन्देहो निर्व्विकल्पोऽस्मि सर्व्वथा ॥”)
कल्पनम् । इति मेदिनी । पे, ॥
(यथा, भागवते । ५ । १६ । २ ।
“तत्रापि प्रितव्रतरथचरणपरिखातैः सप्तभिः सप्त सिन्धवः उपकॢप्ताः ।
यत एतस्याः सप्तद्वीपविशेषविकल्पस्त्वया भगवन् खलु सूचितः ॥”)
संशयः । यथा, रघुः । १७ । ४९ ।
(“रात्रिन्दिवविभागेषु यथादिष्टं महीक्षिताम् ।
तत्सिषेवे नियोगेन स विकल्पपराङ्मुखः ॥”)
नानाविधः । यथा, मनुः । ९ । २२८ ।
(“प्रच्छन्नं वा प्रकाशं वा तन्निषेवेत यो नरः ।
तस्य दण्डविकल्पः स्याद्तथेष्टं नृपतेस्तथा ॥”)
विविधकल्पः । स च द्विविधः । व्यवस्थितः । एच्छिकश्च । सोऽप्याकाङ्क्षाविरहे युक्तः ।
तथा च भविष्ये -
See how much more pleasant and readable that is?
Certainly the format you show is pleasant.
From my naive perspective, I do not see how it derives from the vacaspatyam text -- there is almost no overlap between the two texts.
What am I missing?
That was kalpadruma. Compare with:
Also, please refer to https://github.com/sanskrit-lexicon/csl-ldev/pull/3#issuecomment-1043240375 linked in the first post above - there was even an objection to the addition of quotation marks around quotes because "Not traceable in the printed text"! Such robotic fidelity should be dropped.
Markup can generate the nicer format.
Here is the bit of the vikalpa digitization corresponding to sample display:
OLD
<L>32332<pc>4-371-b<k1>vikalpaH<k2>vikalpaH
vikalpaH¦, puM, (virudDaM kalpanamiti . vi +
kfpa + GaY .) BrAntiH . (yaTA, devIBAga-
vate . 1 . 19 . 32 .
“vikalpopahatastvaM vE dUradeSamupAgataH .
na me vikalpasandeho nirvvikalpo'smi sarvvaTA ..”)
kalpanam . iti medinI . pe, .. (yaTA, BAga-
vate . 5 . 16 . 2 .
“tatrApi pritavrataraTacaraRapariKAtEH saptaBiH
sapta sinDavaH upakxptAH . yata etasyAH sapta-
dvIpaviSezavikalpastvayA Bagavan Kalu sUcitaH ..”
And the changes which generate the above:
NEW
vikalpaH¦, puM, (virudDaM kalpanamiti . vi +
kfpa + GaY .) <lb/><lb/>BrAntiH . <lb/>(yaTA, devIBAgavate <lbinfo n="devIBAga+vate"/>
. 1 . 19 . 32 .
<lb/>“vikalpopahatastvaM vE dUradeSamupAgataH .
<lb/>na me vikalpasandeho nirvvikalpo'smi sarvvaTA ..”)
<lb/><lb/>kalpanam . iti medinI . pe, .. <lb/>(yaTA, BAgavate <lbinfo n="BAga+vate"/>
. 5 . 16 . 2 .
<lb/>“tatrApi pritavrataraTacaraRapariKAtEH saptaBiH
sapta sinDavaH upakxptAH . <lb/>yata etasyAH saptadvIpaviSezavikalpastvayA <lbinfo n="sapta+dvIpaviSezavikalpastvayA"/>
Bagavan Kalu sUcitaH ..”
As you see, there are only two pieces of markup:
<lb/> to generate a line break
<lbinfo n="X+Y"/> to resolve text with extra '-' at line breaks.
The lbinfo is awkward to write, but could be simplified, such as
kfpa + GaY .) <lb/><lb/>BrAntiH . <lb/>(yaTA, devIBAgavate <lbinfo n="devIBAga+vate"/>
SIMPLER, using a special character (such as '@')
kfpa + GaY .) <lb/><lb/>BrAntiH . <lb/>(yaTA, devIBAga@vate
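To illustrate the '@' idea, here is a minimal sketch of how a preprocessing step might expand the shorthand back into the longer lbinfo form. The function name `expand_shorthand` and the choice of '@' are assumptions for illustration only, not part of any existing CDSL script.

```python
import re

# Hypothetical shorthand: '@' marks the spot where the printed line broke inside a word.
# The character class excludes whitespace and tag delimiters, so a word that directly
# follows a tag such as <lb/> is still matched on its own.
SHORTHAND = re.compile(r"([^\s<>@]+)@([^\s<>@]+)")

def expand_shorthand(text: str) -> str:
    """Rewrite 'X@Y' as 'XY <lbinfo n="X+Y"/>'."""
    def repl(m: re.Match) -> str:
        left, right = m.group(1), m.group(2)
        return f'{left}{right} <lbinfo n="{left}+{right}"/>'
    return SHORTHAND.sub(repl, text)

line = "kfpa + GaY .) <lb/><lb/>BrAntiH . <lb/>(yaTA, devIBAga@vate"
expanded = expand_shorthand(line)
print(expanded)
```

This only works if '@' never occurs in the digitization for any other purpose, which would need to be checked per dictionary before adopting it.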
For comparison to the skd-dev example above, here is the current display of vikalpaH in skd:
Thus, at least for skd, the digitization could be changed so that the 'sanctity' of the original digitization is maintained.
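As a rough check of that claim, the sketch below derives two views from the same marked-up text: a reader-friendly display, and the printed line layout recovered from the lbinfo hints. It assumes the two tags are used exactly as in the vikalpaH sample above; the function names `display_view` and `printed_view` are hypothetical.

```python
import re

LBINFO_ANY = re.compile(r'\s*<lbinfo n="[^"]*"/>')
# A word, its lbinfo hint, and the physical line end that follows the hint.
LBINFO_AT_EOL = re.compile(r'(\S+) <lbinfo n="([^"+]+)\+([^"]+)"/>\n')

def display_view(marked: str) -> str:
    """Reader-friendly text: <lb/> becomes a newline, <lbinfo/> is dropped."""
    text = LBINFO_ANY.sub("", marked)
    text = " ".join(text.split())        # physical line ends turn into spaces
    text = text.replace("<lb/>", "\n")
    return "\n".join(part.strip() for part in text.split("\n"))

def printed_view(marked: str) -> str:
    """Recover the hyphenated line breaks of the print; drop <lb/>."""
    def restore(m: re.Match) -> str:
        word, left, right = m.group(1), m.group(2), m.group(3)
        joined = left + right
        if not word.endswith(joined):
            return m.group(0)            # hint does not fit the word: leave untouched
        prefix = word[: len(word) - len(joined)]
        return f"{prefix}{left}-\n{right} "
    return LBINFO_AT_EOL.sub(restore, marked).replace("<lb/>", "")

new = (
    "vikalpaH¦, puM, (virudDaM kalpanamiti . vi +\n"
    'kfpa + GaY .) <lb/><lb/>BrAntiH . <lb/>(yaTA, devIBAgavate <lbinfo n="devIBAga+vate"/>\n'
    ". 1 . 19 . 32 .\n"
    "<lb/>“vikalpopahatastvaM vE dUradeSamupAgataH .\n"
    "<lb/>na me vikalpasandeho nirvvikalpo'smi sarvvaTA ..”)\n"
)
old = (
    "vikalpaH¦, puM, (virudDaM kalpanamiti . vi +\n"
    "kfpa + GaY .) BrAntiH . (yaTA, devIBAga-\n"
    "vate . 1 . 19 . 32 .\n"
    "“vikalpopahatastvaM vE dUradeSamupAgataH .\n"
    "na me vikalpasandeho nirvvikalpo'smi sarvvaTA ..”)\n"
)
print(display_view(new))
print(printed_view(new) == old)
```

On this fragment the printed view reproduces the OLD text character for character, while the display view gives the line-broken layout. Whether that holds across all of skd would depend on the markup being applied as consistently as in this sample.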
Why put that burden on yourself? As mentioned there is a separate project focused on "sanctity"-preservation.
Sure - I suppose that @drdhaval2785 's scripts can insert such extra new-lines or quotes using your markup based on what users update (at csl-dev?) - it's just more (unnecessary) trouble; and is furthermore a cause for delay.
it's just more (unnecessary) trouble; and is furthermore a cause for delay.
What is your proposed remedy? What is your proposed path ending in a better display of skd?
Why put that burden on yourself? As mentioned there is a separate project focused on "sanctity"-preservation.
[Here is link to 'lazarus project' : https://cceh.uni-koeln.de/portfolio/lazarus/]
I think this ('sanctity ...') remains a responsibility of CDSL. We can't just say 'Oh, someone else is taking care of this aspect.'
However, we are not restricted to only this task.
We are free to create better displays, for instance better displays for skd.
@vvasuki Are you interested in leading an effort for a better skd?
We are free to create better displays, for instance better displays for skd. @vvasuki Are you interested in leading an effort for a better skd?
No. All I want is for users (myself included) to be able to add superior presentation markup wherever they care to while referring to the dict, and for maintainers not to reject such improvements out of hand. So, it should be written down in some contribution policy somewhere.
And, @drdhaval2785 - please clear backlog at https://github.com/sanskrit-lexicon/csl-ldev/pulls - I recently thought of editing some typo, but gave up upon seeing it.
I think this ('sanctity ...') remains a responsibility of CDSL. We can't just say 'Oh, someone else is taking care of this aspect.'
CDSL is free to burden itself of course, but I am curious why you think you can't just say 'Oh, someone else is taking care of this aspect.'
Will clear backlog soon.
Related, but insufficient - https://github.com/sanskrit-lexicon/COLOGNE/issues/419
I observed in a few threads some insistence on sticking to "what's in the printed text" - even with regard to punctuation and formatting!
Opening this thread so that it may be considered more fully. Some pertinent notes:
Given that git + dict system allows:
why not let manual formatting improvements come through at whatever rate they do - as long as they don't affect future programmatic corrections?