sanskrit-lexicon / CORRECTIONS

Correction history for Cologne Sanskrit Lexicon

skd.xml issues #349

Open drdhaval2785 opened 7 years ago

drdhaval2785 commented 7 years ago
<H1><h><key1>kuberaH</key1><key2>kube(ve)raH</key2></h><body><HI/><s>kube(ve)raH, puM, (kumbatIti . kuba i ki AcCAdane</s><lb/><s>“kumbernalopaSca” . uRAM 1 . 60 . iti erak .</s><lb/><s>nalopaSca . yadvA kutsitaM veraM SarIraM yasya . piNgala</s><lb/><s>netratvAttaTAtvam .) yakzarAjaH . iti sidDAnta-</s><lb/><s>kOmudyAmuRAdivfttiH .. (sa ca viSravasa fze</s><lb/><s>rilavilAyAM jAtaH . sa tu tripAt azwadantaH</s><lb/><s>kekarAkzaSca . yaTA, vAyupurARe .</s><lb/><s>“kutsAyAM kvitiSabdo'yaM SarIraM veramucyate .</s><lb/><s>kuveraH kuSarIratvAt nAmnA tenEva so'NkitaH” ..</s><lb/><s>taTA kASIKaRqe devIdattaSApoktO ca .</s><lb/><s>“kuvero Bava nAmnA tvaM mama rUperzyayA suta !” ..)</s></body><tail><L>8094</L><pc>2-144</pc></tail></H1>
<H1><h n="alt"><key1>kuveraH</key1><key2>kube(ve)raH</key2></h><body ref="8094"></body><tail><L>8094.01</L><pc>2-144</pc></tail></H1>

The second record is so unlike the rest of the entries that my parser gave abnormal results during babylon generation.

Could I request that the data be duplicated, please?

<H1><h><key1>kuberaH</key1><key2>kube(ve)raH</key2></h><body><HI/><s>kube(ve)raH, puM, (kumbatIti . kuba i ki AcCAdane</s><lb/><s>“kumbernalopaSca” . uRAM 1 . 60 . iti erak .</s><lb/><s>nalopaSca . yadvA kutsitaM veraM SarIraM yasya . piNgala</s><lb/><s>netratvAttaTAtvam .) yakzarAjaH . iti sidDAnta-</s><lb/><s>kOmudyAmuRAdivfttiH .. (sa ca viSravasa fze</s><lb/><s>rilavilAyAM jAtaH . sa tu tripAt azwadantaH</s><lb/><s>kekarAkzaSca . yaTA, vAyupurARe .</s><lb/><s>“kutsAyAM kvitiSabdo'yaM SarIraM veramucyate .</s><lb/><s>kuveraH kuSarIratvAt nAmnA tenEva so'NkitaH” ..</s><lb/><s>taTA kASIKaRqe devIdattaSApoktO ca .</s><lb/><s>“kuvero Bava nAmnA tvaM mama rUperzyayA suta !” ..)</s></body><tail><L>8094</L><pc>2-144</pc></tail></H1>
<H1><h n="alt"><key1>kuveraH</key1><key2>kube(ve)raH</key2></h><body><HI/><s>kube(ve)raH, puM, (kumbatIti . kuba i ki AcCAdane</s><lb/><s>“kumbernalopaSca” . uRAM 1 . 60 . iti erak .</s><lb/><s>nalopaSca . yadvA kutsitaM veraM SarIraM yasya . piNgala</s><lb/><s>netratvAttaTAtvam .) yakzarAjaH . iti sidDAnta-</s><lb/><s>kOmudyAmuRAdivfttiH .. (sa ca viSravasa fze</s><lb/><s>rilavilAyAM jAtaH . sa tu tripAt azwadantaH</s><lb/><s>kekarAkzaSca . yaTA, vAyupurARe .</s><lb/><s>“kutsAyAM kvitiSabdo'yaM SarIraM veramucyate .</s><lb/><s>kuveraH kuSarIratvAt nAmnA tenEva so'NkitaH” ..</s><lb/><s>taTA kASIKaRqe devIdattaSApoktO ca .</s><lb/><s>“kuvero Bava nAmnA tvaM mama rUperzyayA suta !” ..)</s></body><tail><L>8094</L><pc>2-144</pc></tail></H1>

This will keep things uniform for other users of XML.
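
For parsers that have to cope with the current format in the meantime, the empty `<body ref="8094">` can be resolved on the reading side. The following is only a minimal sketch, assuming each record sits on one line and that `ref` always names the `<L>` of another record in the same file; the sample bodies are shortened and `expand_alt_bodies` is a hypothetical helper, not part of any existing script.

```python
import copy
import xml.etree.ElementTree as ET

# Shortened sample records; the real skd.xml bodies are much longer.
RECORDS = [
    '<H1><h><key1>kuberaH</key1><key2>kube(ve)raH</key2></h>'
    '<body><HI/><s>kube(ve)raH, puM</s><lb/><s>yakzarAjaH</s></body>'
    '<tail><L>8094</L><pc>2-144</pc></tail></H1>',
    '<H1><h n="alt"><key1>kuveraH</key1><key2>kube(ve)raH</key2></h>'
    '<body ref="8094"></body>'
    '<tail><L>8094.01</L><pc>2-144</pc></tail></H1>',
]


def expand_alt_bodies(lines):
    """Parse records and fill every <body ref="L"> from the record with that L."""
    records = [ET.fromstring(line) for line in lines]
    by_L = {rec.findtext('tail/L'): rec for rec in records}
    for rec in records:
        body = rec.find('body')
        ref = body.get('ref')
        if ref is None:
            continue  # ordinary record, body is already explicit
        target = by_L[ref].find('body')  # KeyError if the referent is missing
        body.text = target.text
        # Copy the children so the two records do not share element objects.
        body.extend([copy.deepcopy(child) for child in target])
    return records


for rec in expand_alt_bodies(RECORDS):
    print(ET.tostring(rec, encoding='unicode'))
```

The `ref` attribute is left in place here, so the alternate-headword information is not lost after expansion.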

drdhaval2785 commented 7 years ago

@funderburkjim Easily doable I guess.

gasyoun commented 7 years ago

This will keep things uniform for other users of XML.

And won't that drastically increase the file size?

funderburkjim commented 7 years ago

Hmm.

I guess what you want is for the 2nd element to carry the same 'body' explicitly, rather than a body implied by the ref="8094" attribute.

If this were done, we would still need to keep the information of that 'ref' attribute somewhere in the 2nd xml record. I suppose we could put this in the <tail> element, maybe as

<ref type="alt" n="8094"/>.

A corresponding change to disp.php might also be desirable, so that the alternate headword nature of 'kuveraH' would be displayed.

It is actually a complication of the display (disp.php) to generate the phrase

(kuveraH is alternate of kuberaH)

because the program currently has to make a separate call to the database to find the key1 implied by L=8094.

This cross-referencing would not be needed by disp.php if we made a more robust <ref> tag, which also included the key1 of the target L=8094:

<ref type="alt" n="8094" key1="kuberaH"/>.
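
To illustrate why the extra `key1` attribute saves the lookup, here is a rough sketch in Python (disp.php itself is PHP, and its actual code is not shown in this thread). `alt_note` and `fake_db_lookup` are hypothetical names; the fallback callable merely stands in for the separate database call.

```python
import xml.etree.ElementTree as ET


def alt_note(rec, lookup_key1):
    """Return the '(X is alternate of Y)' phrase for an alt record, else None.

    lookup_key1 is a hypothetical callable mapping an L value to a key1.
    It is consulted only when the <ref> tag carries no key1 attribute.
    """
    ref = rec.find('tail/ref')
    if ref is None or ref.get('type') != 'alt':
        return None
    target = ref.get('key1') or lookup_key1(ref.get('n'))
    return '({} is alternate of {})'.format(rec.findtext('h/key1'), target)


WITH_KEY1 = ('<H1><h n="alt"><key1>kuveraH</key1><key2>kube(ve)raH</key2></h>'
             '<body></body><tail><L>8094.01</L><pc>2-144</pc>'
             '<ref type="alt" n="8094" key1="kuberaH"/></tail></H1>')
# Same record, but with the shorter <ref type="alt" n="8094"/> form.
WITHOUT_KEY1 = WITH_KEY1.replace(' key1="kuberaH"', '')

lookups = []  # records every simulated database call


def fake_db_lookup(L):
    lookups.append(L)
    return {'8094': 'kuberaH'}[L]


print(alt_note(ET.fromstring(WITH_KEY1), fake_db_lookup))     # no lookup needed
print(alt_note(ET.fromstring(WITHOUT_KEY1), fake_db_lookup))  # one lookup
print(lookups)
```

Both calls produce the same phrase; only the second one has to go through the lookup.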

So, after thinking out loud on this, one solution would be to change make_xml.py so that the record generated for kuveraH is:


<H1><h n="alt"><key1>kuveraH</key1><key2>kube(ve)raH</key2></h>
<body><HI/><s>kube(ve)raH, puM, (kumbatIti . kuba i ki AcCAdane</s><lb/><s>“kumbernalopaSca” . uRAM 1 . 60 . iti erak .</s><lb/><s>nalopaSca . yadvA kutsitaM veraM SarIraM yasya . piNgala</s><lb/><s>netratvAttaTAtvam .) yakzarAjaH . iti sidDAnta-</s><lb/><s>kOmudyAmuRAdivfttiH .. (sa ca viSravasa fze</s><lb/><s>rilavilAyAM jAtaH . sa tu tripAt azwadantaH</s><lb/><s>kekarAkzaSca . yaTA, vAyupurARe .</s><lb/><s>“kutsAyAM kvitiSabdo'yaM SarIraM veramucyate .</s><lb/><s>kuveraH kuSarIratvAt nAmnA tenEva so'NkitaH” ..</s><lb/><s>taTA kASIKaRqe devIdattaSApoktO ca .</s><lb/><s>“kuvero Bava nAmnA tvaM mama rUperzyayA suta !” ..)</s>
</body>
<tail><L>8094.01</L><pc>2-144</pc><ref type="alt" n="8094" key1="kuberaH"/></tail></H1>
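
A record of this shape could be derived from the main record roughly as follows. This is only a sketch under assumptions: `make_alt_record` is a hypothetical helper, not the actual code of make_xml.py, and the sample body is shortened.

```python
import copy
import xml.etree.ElementTree as ET

# Shortened main record; the real body is much longer.
MAIN = ('<H1><h><key1>kuberaH</key1><key2>kube(ve)raH</key2></h>'
        '<body><HI/><s>kube(ve)raH, puM</s><lb/><s>yakzarAjaH</s></body>'
        '<tail><L>8094</L><pc>2-144</pc></tail></H1>')


def make_alt_record(main, alt_key1, alt_L):
    """Build an alternate-headword record that duplicates the body of `main`."""
    alt = copy.deepcopy(main)          # leaves the main record untouched
    alt.find('h').set('n', 'alt')
    alt.find('h/key1').text = alt_key1
    tail = alt.find('tail')
    tail.find('L').text = alt_L
    # Keep the cross-reference: L and key1 of the record being duplicated.
    ET.SubElement(tail, 'ref', {
        'type': 'alt',
        'n': main.findtext('tail/L'),
        'key1': main.findtext('h/key1'),
    })
    return alt


main = ET.fromstring(MAIN)
alt = make_alt_record(main, 'kuveraH', '8094.01')
print(ET.tostring(alt, encoding='unicode'))
```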

funderburkjim commented 7 years ago

I doubt the file size increase of skd.xml would be an issue. There are only about 330 alternate headwords out of a total of 42000 headwords for skd.

funderburkjim commented 7 years ago

tail wagging dog?

The motivation for this issue was to make the stardict parsing process easier.

But alleviating that problem should probably not be the primary consideration in deciding how skd.xml represents alternate headwords.

Another solution might be to have the skd.xml record for kuveraH as follows:

<H1><h n="alt"><key1>kuveraH</key1><key2>kube(ve)raH</key2></h>
<body>See <s>kuberaH</s>.</body>
<tail><L>8094.01</L><pc>2-144</pc><ref type="alt" n="8094" key1="kuberaH"/></tail></H1>

Here, the body is perfectly regular, and probably conveys all that needs to be conveyed for 'kuveraH', namely 'See kuberaH'.

The fuller information is present in the <ref> tag. A downstream program (disp.php or the stardict parser, for instance) could either (a) use it to insert the full body of the referent (kuberaH), or (b) simply ignore it.
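
Options (a) and (b) might look like this in a downstream program. Again this is just a sketch with shortened sample records; `index_by_L` and `effective_body` are hypothetical names, and the `expand` flag is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

LINES = [
    '<H1><h><key1>kuberaH</key1><key2>kube(ve)raH</key2></h>'
    '<body><HI/><s>kube(ve)raH, puM</s><lb/><s>yakzarAjaH</s></body>'
    '<tail><L>8094</L><pc>2-144</pc></tail></H1>',
    '<H1><h n="alt"><key1>kuveraH</key1><key2>kube(ve)raH</key2></h>'
    '<body>See <s>kuberaH</s>.</body>'
    '<tail><L>8094.01</L><pc>2-144</pc>'
    '<ref type="alt" n="8094" key1="kuberaH"/></tail></H1>',
]


def index_by_L(lines):
    """Parse one-record-per-line XML and index the records by their <L> value."""
    records = [ET.fromstring(line) for line in lines]
    return {rec.findtext('tail/L'): rec for rec in records}


def effective_body(rec, by_L, expand=True):
    """Return the <body> a consumer should show for `rec`.

    expand=True  -> option (a): follow <ref type="alt"> to the referent's body.
    expand=False -> option (b): ignore <ref> and keep the 'See ...' body.
    """
    ref = rec.find('tail/ref')
    if expand and ref is not None and ref.get('type') == 'alt':
        target = by_L.get(ref.get('n'))
        if target is not None:
            return target.find('body')
    return rec.find('body')


by_L = index_by_L(LINES)
alt = by_L['8094.01']
print(ET.tostring(effective_body(alt, by_L, expand=True), encoding='unicode'))
print(ET.tostring(effective_body(alt, by_L, expand=False), encoding='unicode'))
```

Either way the record itself stays regular, so a parser that knows nothing about `<ref>` still gets a usable body.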