sanskrit-lexicon / csl-orig

Data for all dictionaries of Cologne. Now all corrections are made in this git-based workflow.

MW '-' v/s '—' #1059

Open drdhaval2785 opened 1 year ago

drdhaval2785 commented 1 year ago

Data

<L>27<pc>1,1<k1>aMSaBUta<k2>aMSa—BUta<e>3
<s>aMSa—BUta</s> ¦ <lex>mfn.</lex> forming part of.<info lex="m:f:n"/>
<LEND>
<L>27.1<pc>1308,1<k1>aMSarUpiRI<k2>aMSa-rUpiRI<e>3
<s>aMSa-rUpiRI</s> ¦ <lex>f.</lex> (with <s>Sakti</s>) a female personification of the divine energy, <ls>RTL. 187</ls>.<info n="sup"/><info lex="f"/>
<LEND>
<L>28<pc>1,1<k1>aMSavat<k2>aMSa—vat<e>3
<s>aMSa—vat</s> ¦ (for <s>aMSumat</s>?) <lex>m.</lex> a species of <s1 slp1="soma">Soma</s1> plant, <ls>Suśr.</ls><info lex="m"/>
<LEND>

Problem

See '—' in aMSa—BUta and '-' in aMSa-rUpiRI.

It should be consistent. @funderburkjim, is there any programmatic reason why this may have happened?
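To gauge how widespread the inconsistency is, the `<k2>` field of each metaline could be tallied by hyphen style. The following is only a minimal sketch, not part of the csl-orig tooling: the names `classify_k2` and `count_hyphen_styles` are hypothetical, and it assumes that every record starts with a metaline beginning with `<L>` and that `<k2>` runs up to the next tag, as in the three records quoted above.

```python
import re
from collections import Counter

EM_DASH = "\u2014"  # the long hyphen seen in the k2 of aMSaBUta

# Assumption: the k2 value runs from "<k2>" up to the next "<".
K2_RE = re.compile(r"<k2>([^<]*)")


def classify_k2(metaline: str) -> str:
    """Label a metaline by the kind of hyphen found in its <k2> field."""
    match = K2_RE.search(metaline)
    if match is None:
        return "no-k2"
    k2 = match.group(1)
    has_long = EM_DASH in k2
    has_short = "-" in k2
    if has_long and has_short:
        return "mixed"
    if has_long:
        return "long"
    if has_short:
        return "short"
    return "none"


def count_hyphen_styles(lines) -> Counter:
    """Count hyphen styles over metalines only (lines starting with <L>)."""
    return Counter(
        classify_k2(line) for line in lines if line.startswith("<L>")
    )


# The three metalines from the Data section above.
sample = [
    "<L>27<pc>1,1<k1>aMSaBUta<k2>aMSa\u2014BUta<e>3",
    "<LEND>",
    "<L>27.1<pc>1308,1<k1>aMSarUpiRI<k2>aMSa-rUpiRI<e>3",
    "<LEND>",
    "<L>28<pc>1,1<k1>aMSavat<k2>aMSa\u2014vat<e>3",
    "<LEND>",
]

if __name__ == "__main__":
    print(count_hyphen_styles(sample))
```

On the three quoted records this reports two "long" entries and one "short" entry. Running it over a full dictionary file would show whether "mixed" entries exist at all.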

funderburkjim commented 1 year ago

At some point in the past, I introduced the two types of hyphen in compounds.
My current opinion is that either

In the particular case mentioned, aMSa-BUta is from the main text, and aMSa-rUpiRI from the supplement.

I would currently opt for changing the long-hyphen to a normal hyphen.

When interest is sufficient, there could be an attempt to devise a scheme for representing the parsed form of compounds.
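The change to a normal hyphen suggested above could be sketched as follows. This is an illustration under stated assumptions, not the script actually used in the repository: `normalize_hyphens` is a hypothetical name, and the sketch assumes that only the `<k2>` field of the metaline and the headword `<s>…</s>` before the `¦` separator should be touched, so that any long dash in the English definition text is left alone.

```python
import re

EM_DASH = "\u2014"  # long hyphen currently used in some compounds

# Assumption: k2 runs from "<k2>" to the next tag on the metaline.
K2_RE = re.compile(r"(<k2>)([^<]*)")
# Assumption: headword spellings are wrapped in plain <s>...</s>.
S_RE = re.compile(r"(<s>)([^<]*)(</s>)")


def _fix_k2(match: re.Match) -> str:
    return match.group(1) + match.group(2).replace(EM_DASH, "-")


def _fix_s(match: re.Match) -> str:
    return (
        match.group(1)
        + match.group(2).replace(EM_DASH, "-")
        + match.group(3)
    )


def normalize_hyphens(line: str) -> str:
    """Replace the long hyphen with '-' in k2 and in the headword <s>."""
    if line.startswith("<L>"):
        # Metaline: only the k2 field is rewritten.
        return K2_RE.sub(_fix_k2, line)
    head, sep, rest = line.partition("\u00a6")  # "¦" splits head from body
    if not sep:
        # No headword separator: leave the line unchanged.
        return line
    return S_RE.sub(_fix_s, head) + sep + rest


if __name__ == "__main__":
    print(normalize_hyphens("<L>28<pc>1,1<k1>aMSavat<k2>aMSa\u2014vat<e>3"))
    print(normalize_hyphens("<s>aMSa\u2014vat</s> \u00a6 <lex>m.</lex> a plant"))
```

Lines that already use the normal hyphen pass through unchanged, so the function can be applied to a whole file repeatedly without further effect.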

Andhrabharati commented 1 year ago

In fact, my next issue is about this!!

The implementation is flawed in multiple ways and SHOULD be changed to a normal hyphen, when the full forms are 'made' in the digitised text.

Andhrabharati commented 1 year ago

I could make up some time today to post the intended issue at https://github.com/sanskrit-lexicon/mw-dev/issues/23#issue-1588019160