sanskrit-lexicon / csl-orig

Data for all dictionaries of Cologne. Now all corrections are made in this git-based workflow.

Fix data format for Andhrabharati and other willing individuals #604

Open drdhaval2785 opened 3 years ago

drdhaval2785 commented 3 years ago

Reference - https://github.com/sanskrit-lexicon/CORRECTIONS/issues/414

There have been suggestions to make corrections to the Cologne data more user-friendly. SLP1 is well suited to programmatic manipulation, but learning to read it takes considerable effort. This issue is dedicated to reaching a consensus on the format in which Cologne files may be given (reversibly) to someone who wants to improve the files substantially, like @Andhrabharati.

drdhaval2785 commented 3 years ago

Example

rAma in PWG https://www.sanskrit-lexicon.uni-koeln.de/scans/csl-apidev/servepdf.php?dict=PWG&page=6-0342

The corresponding entry in pwg.txt is as follows

<L>84468<pc>6-0342<k1>rAma<k2>rAma/<h>1
1. {#rAma/#}¦ (wohl desselben Ursprungs wie {#rAtrI#}) 
<div n="1"> 1) <lex>adj.</lex> (f. {#A#}) {%dunkelfarbig, schwarz%} 
<ls>NIR. 12, 13.</ls> 
<ls>AK. 3, 4, 23, 143.</ls> 
<ls>H. 1397.</ls> 
<ls>an. 2, 334.</ls> 
<ls>MED. m. 26. fg.</ls> <ls>HALĀY. 4, 49.</ls> {#rAme\ kfzRe\ asi^kni ca#} 
<ls>AV. 1, 23, 1.</ls> Schaf <ls>12, 2, 19.</ls> {#nAsya rAma (= ramaRIyaH putraH#} Comm.) {#ucCizwaM pibet#} 
<ls>TAITT. ĀR. 5, 8, 13.</ls> {#rAmA#} {%eine Dunkle%} d. i. {%ein Weib gemeiner Herkunft%}: {#nAgniM ci\tvA rA\mAmupe^yAt#} 
<ls>TS. 5, 6, 8, 3.</ls> 
<ls>TAITT. ĀR. 5, 8, 13.</ls> <ls>Schol.</ls> zu <ls>KĀTY. ŚR. 18, 6, 27.</ls> Auch die Bedeutung 2. {#rAma#} 2) {%c)%}
<lang n="greek">(α)</lang> wäre indessen hier möglich. Nach 
<ls>AK.</ls> <ls>H. an.</ls> und <ls>MED.</ls> auch {%weiss.%} 
<div n="1">— 2) <lex>m.</lex> 
<div n="2"> a) {%eine Hirschart%} 
<ls>AK. 2, 5, 11.</ls> 
<ls>TRIK. 3, 3, 302.</ls> 
<ls>H. an.</ls> 
<ls>MED.</ls> 
<div n="2">— b) {%Pferd%} 
<ls>MED.</ls> 
<div n="2">— c) N. pr. eines Mannes 
<ls>ṚV. 10, 93, 14.</ls> mit dem patron. <is>Mārgaveya</is> 
<ls>AIT. BR. 7, 27. 34.</ls> <is>Aupatasvini</is> 
<ls>ŚAT. BR. 4, 6, 1, 7.</ls> <is>Jāmadagnya</is>, Verfassers von 
<ls>ṚV. 10, 110.</ls> Im Epos und später erscheinen {%drei%} <is>Rāma</is> (daher {#rAma#} als Bez. {%der Zahl drei%} 
<ls>VARĀH. BṚH. S. 8, 20</ls>), von denen die beiden ersten für Incarnationen <is>Viṣṇu's</is> gelten: 
<div n="3"> α) mit dem patron. <is>Jāmadagnya</is> oder <is>Bhārgava</is>, ein Sohn der <is>Reṇukā</is>, auch {#paraSurAma#} genannt, 
<ls>TRIK. 3, 3, 302.</ls> 
<ls>H. 848.</ls> 
<ls>H. an.</ls> 
<ls>MED.</ls> (wo {#rERukeye#} st. {#vERukeye#} zu lesen ist). {#rAmaH SastraBftAmaham#} (vgl. 
<ls>HARIV. 5869</ls>) sagt <is>Kṛṣṇa</is> 
<ls>BHAG. 10, 31.</ls> 
<ls>MBH. 1, 272. 2612. 3, 8658. 8, 1584. 12, 1715. fgg. 12948.</ls> 
<ls>HARIV. 2313. fgg. 5869. fg.</ls> {#rAmarAmavivAda#} 
<ls>R. 1, 3, 11 (5 GORR.). 74, 22. fg. 76, 1.</ls> <ls>R. GORR. 1, 77, 23. 37.</ls> <ls>RAGH. 11, 68.</ls> 
<div n="3">— β) mit dem patron. <is>Rāghava</is> oder <is>Dāśarathi</is> 
<ls>TRIK. 2, 8, 3. 3, 3, 302.</ls> 
<ls>H. 703.</ls> 
<ls>H. an.</ls> 
<ls>MED.</ls> 
<ls>MBH. 3, 11197. 15933. 12, 12949.</ls> 
<ls>HARIV. 822. 2324. fgg. 3065. fgg. 5871. 7373.</ls> 
<ls>R. 1, 1, 10. 17. 20.</ls> {#rAmarAmavivAda#} 
<ls>3, 11 (5</ls> <ls>GORR.).</ls> {#ramayatyeva sa guRErudArEstErimAH prajAH . yasmAdato rAma iti nAmEtattasya viSrutam ..#} 
<ls>R. GORR. 1, 1, 22. 6, 102.</ls> <ls>RAGH. 11, 68.</ls> <ls>VARĀH. BṚH. S. 58, 30.</ls> <ls>VP. 384.</ls> <ls>BHĀG. P. 9, 10. fgg.</ls> <ls>Spr. 2630.</ls> {#rAmo hemamfgaM ma rvetti#} 
<ls>2631.</ls> {#ramante yogino 'nante satyAnande cidAtmani . #}
[Page6-0343]
{# iti rAmapadenAsO paraM brahmABiDIyate ..#} 
<ls>WEBER, RĀMAT. UP. 286.</ls> 
<div n="3">— γ) = <is>Balarāma</is>, <is>Halāyudha</is>, ein älterer Bruder <is>Kṛṣṇa's</is> 
<ls>AK. 1, 1, 1, 18. 3, 4, 23, 143.</ls> 
<ls>H. 224.</ls> 
<ls>H. an.</ls> 
<ls>MED.</ls> 
<ls>HALĀY. 1, 29.</ls> 
<ls>BHĀG. P. 1, 11, 17. 10, 1, 8.</ls> <ls>WEBER, KṚṢṆAJ. 268. 281. 284. 289.</ls> erscheint bei den <is>Jaina</is> unter den 
<ls>9</ls> {%weissen%} (s. oben u. 
<div n="1"> 1) <is>Bala's</is> 
<ls>H. 698.</ls> — <is>Rāma</is> unter den sieben Weisen eines <is>Manu</is> 
<ls>HARIV. 453.</ls> 
<ls>MĀRK. P. 80, 4.</ls> <is>Rāma</is> ist ein auch später häufig vorkommender Name: so heisst z. B. ein Sohn <is>Tārāvaloka's</is> und einer <is>Mādrī</is> und Zwillingsbruder <is>Lakṣmaṇa's</is> 
<ls>KATHĀS. 113, 32.</ls> verschiedene Lehrer, Autoren 
u.s.w. <ls>BURN. Intr. 567</ls> (neben {#Badanta°#}). 
<ls>COLEBR. Misc. Ess.</ls> <ls>?II,49. Verz. d. B. H. No. 109. 833. Ind. St.8,389. HALL 84. 119. Verz. d. Oxf. H. 126,b, No. 220. 129,b, No. 234. 148,a,9. 151,b, No. 321. fgg. 335,b, No. 788. 341,b, N. 358,a, No. 853. 386,a, No. 505.</ls> ein Fürst von <is>Mallapura</is> 
<ls>148,b,15. 18.</ls> von <is>Śṛṅgavera</is> 
<ls>165</ls>, {%a%}, 
<ls>7. 178</ls>, {%a%}, 
<ls>?16. - RĀJA-TAR. 8, 785. KṢITĪŚ. 10, 7. fgg.</ls> 
<div n="2">— d) Bein. <is>Varuṇa's</is> 
<ls>MED.</ls> 
<div n="2">— e) pl. N. pr. eines Volkes 
<ls>VP. 177.</ls> 
<div n="1">— 3) <lex>f.</lex> {#A#} 
<div n="2"> a) {%ein Weib niedriger Herkunft%}; s. u. 1). 
<div n="2">— b) = {#hiNgu#} {%Asa foetida%} 
<ls>H. an.</ls> <ls>MED.</ls> = {#hiNgula#} {%Mennig%} 
<ls>ŚABDAR.</ls> im <ls>ŚKDR.</ls> 
<div n="1">— 4) <lex>f.</lex> {#I#} {%Dunkel, Nacht%}: {#u\zA na rA\mIra^ru\RErapo^rRute#} 
<ls>ṚV. 2, 34, 11.</ls> 
<div n="1">— 5) <lex>n.</lex> 
<div n="2"> a) {%Dunkel%}: {#a\gnI ruSa^dBi\rvarRE^ra\Bi rA\mama^sTAt#} 
<ls>ṚV. 10, 3, 3.</ls> 
<div n="2">— b) = {#vAstuka#} ({%Chenopodium album%}) und {#kuzWa#} (in welcher Bed.?) 
<ls>H. an.</ls> <ls>MED.</ls> = {#tamAlapattra#} 
<ls>RĀJAN.</ls> im <ls>ŚKDR.</ls> 
<div n="v">— Vgl. {#aDo°, paraSu°, bala°, Badanta°, maRi°, manasA°#} .
<LEND>
drdhaval2785 commented 3 years ago

{#rAma/#} - Sanskrit text is enclosed between {# and #} tags. I think the requirement would be to convert it to Devanagari. This can be handled.

<ls>HALĀY. 4, 49.</ls> - The reference in the printed book uses an older Anglicized transliteration of Sanskrit, which has been converted to IAST here. I guess this would not require any change.
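
A minimal sketch of how the {# ... #} conversion could be done, assuming the text inside the tags is plain SLP1. The names `slp1_to_devanagari` and `convert_line` are hypothetical and not part of any Cologne tool. Accent marks (/, \, ^), punctuation and anything outside the {# #} tags, such as <ls> references and the German text, are passed through unchanged. This is an illustration only, not the conversion actually used for the Cologne data.

```python
import re

VIRAMA = "\u094d"  # sign that suppresses the inherent "a" of a consonant

CONSONANTS = {
    "k": "क", "K": "ख", "g": "ग", "G": "घ", "N": "ङ",
    "c": "च", "C": "छ", "j": "ज", "J": "झ", "Y": "ञ",
    "w": "ट", "W": "ठ", "q": "ड", "Q": "ढ", "R": "ण",
    "t": "त", "T": "थ", "d": "द", "D": "ध", "n": "न",
    "p": "प", "P": "फ", "b": "ब", "B": "भ", "m": "म",
    "y": "य", "r": "र", "l": "ल", "v": "व",
    "S": "श", "z": "ष", "s": "स", "h": "ह", "L": "ळ",
}

# SLP1 vowel -> (independent letter, dependent sign used after a consonant)
VOWELS = {
    "a": ("अ", ""), "A": ("आ", "ा"), "i": ("इ", "ि"), "I": ("ई", "ी"),
    "u": ("उ", "ु"), "U": ("ऊ", "ू"), "f": ("ऋ", "ृ"), "F": ("ॠ", "ॄ"),
    "x": ("ऌ", "ॢ"), "X": ("ॡ", "ॣ"), "e": ("ए", "े"), "E": ("ऐ", "ै"),
    "o": ("ओ", "ो"), "O": ("औ", "ौ"),
}

# Anusvara, visarga, candrabindu, avagraha
OTHER = {"M": "ं", "H": "ः", "~": "ँ", "'": "ऽ"}

# Non-greedy match of one {# ... #} span (assumed not to nest)
SANSKRIT_SPAN = re.compile(r"\{#(.*?)#\}")


def slp1_to_devanagari(text):
    """Convert an SLP1 string to Devanagari, passing unknown characters through."""
    out = []
    after_consonant = False  # True while the last consonant still awaits a vowel
    for ch in text:
        if ch in CONSONANTS:
            if after_consonant:
                out.append(VIRAMA)  # consonant cluster: close the previous consonant
            out.append(CONSONANTS[ch])
            after_consonant = True
        elif ch in VOWELS:
            independent, sign = VOWELS[ch]
            out.append(sign if after_consonant else independent)
            after_consonant = False
        else:
            if after_consonant:
                out.append(VIRAMA)  # consonant not followed by a vowel
            out.append(OTHER.get(ch, ch))
            after_consonant = False
    if after_consonant:
        out.append(VIRAMA)  # word-final consonant
    return "".join(out)


def convert_line(line):
    """Convert only the text inside {# ... #} tags; leave all other markup as is."""
    return SANSKRIT_SPAN.sub(
        lambda m: "{#" + slp1_to_devanagari(m.group(1)) + "#}", line
    )


print(convert_line("1. {#rAma/#}¦ (wohl desselben Ursprungs wie {#rAtrI#})"))
# 1. {#राम/#}¦ (wohl desselben Ursprungs wie {#रात्री#})
```

Because the tags themselves are kept, a reverse pass could locate the same spans and map them back to SLP1, which is what the reversibility requirement would need.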

Is this what you have in mind @Andhrabharati ?

Andhrabharati commented 3 years ago

I have already finished doing whatever is needed in PWG, so I am no longer interested in this particular lexicon. https://github.com/sanskrit-lexicon/PWG/issues/39#issuecomment-888507043

Yes, no encodings in whatever manner, for any language; the tagging/marking could remain as is.

I would try "masking" them from my eyes; but this has the side effect of overlooking cases of wrong tagging/marking, which I had pointed out during my earlier MW work!!

drdhaval2785 commented 3 years ago

https://github.com/sanskrit-lexicon/csl-devanagari/ This repository has 36 Cologne dictionaries in a Devanagari-friendly format. If any corrections are needed, they may be tracked separately in that repository.