sanskrit-lexicon / csl-orig

Data for all dictionaries of Cologne. Now all corrections are made in this git-based workflow.


MW '-' v/s '—' #1059

Open drdhaval2785 opened 1 year ago

drdhaval2785 commented 1 year ago

Data

<L>27<pc>1,1<k1>aMSaBUta<k2>aMSa—BUta<e>3
<s>aMSa—BUta</s> ¦ <lex>mfn.</lex> forming part of.<info lex="m:f:n"/>
<LEND>
<L>27.1<pc>1308,1<k1>aMSarUpiRI<k2>aMSa-rUpiRI<e>3
<s>aMSa-rUpiRI</s> ¦ <lex>f.</lex> (with <s>Sakti</s>) a female personification of the divine energy, <ls>RTL. 187</ls>.<info n="sup"/><info lex="f"/>
<LEND>
<L>28<pc>1,1<k1>aMSavat<k2>aMSa—vat<e>3
<s>aMSa—vat</s> ¦ (for <s>aMSumat</s>?) <lex>m.</lex> a species of <s1 slp1="soma">Soma</s1> plant, <ls>Suśr.</ls><info lex="m"/>
<LEND>

Problem

See — in aMSa—BUta and - in aMSa-rUpiRI

It should be consistent. @funderburkjim, is there any programmatic reason why this may have happened?
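To see how widespread the inconsistency is, one could tally the hyphen style of each `<k2>` headword, split by main text and supplement. The following is only a rough sketch: the function name `classify_entries` is hypothetical, and it assumes that every entry runs from an `<L>` metaline to `<LEND>` and that supplement entries carry `<info n="sup"/>`, as in the sample above.

```python
import re
from collections import Counter

LONG_HYPHEN = "\u2014"  # U+2014, the long hyphen seen in some <k2> fields

# Assumption: the metaline always has the shape <L>..<pc>..<k1>..<k2>..<e>..
K2_RE = re.compile(r"<k2>(.*?)<e>")


def classify_entries(lines):
    """Count entries by (source, hyphen style) of their <k2> headword.

    source is "main" or "supplement"; style is "long", "normal" or "none".
    A headword containing both kinds of hyphen is counted as "long".
    """
    counts = Counter()
    k2 = None          # <k2> value of the entry being read, if any
    is_sup = False     # True once <info n="sup"/> is seen in the entry
    for line in lines:
        if line.startswith("<L>"):
            m = K2_RE.search(line)
            k2 = m.group(1) if m else ""
            is_sup = False
        elif line.startswith("<LEND>"):
            if k2 is not None:
                if LONG_HYPHEN in k2:
                    style = "long"
                elif "-" in k2:
                    style = "normal"
                else:
                    style = "none"
                source = "supplement" if is_sup else "main"
                counts[(source, style)] += 1
            k2 = None
        elif k2 is not None and '<info n="sup"/>' in line:
            is_sup = True
    return counts
```

Feeding it the lines of the dictionary file (for example via `open(path, encoding="utf-8")`, with `path` pointing at wherever the MW text lives locally) would return a `Counter` keyed by pairs such as `("main", "long")`.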

funderburkjim commented 1 year ago

At some point in the past, I introduced the two types of hyphen in compounds.
My current opinion is that either

there is no useful distinction to be made by using two types of hyphen, or
there is a useful distinction that could be made, but the implementation of this distinction is currently flawed.

In the particular case mentioned, aMSa—BUta is from the main text, and aMSa-rUpiRI is from the supplement.

I would currently opt for changing the long-hyphen to a normal hyphen.

When interest is sufficient, there could be an attempt to devise a scheme for representing the parsed form of compounds.
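The change from long hyphen to normal hyphen could be sketched roughly as below. This is not the script actually used in the repository: the names `normalize_line`, `K2_FIELD` and `S_ELEMENT` are made up for illustration, and the sketch assumes the long hyphen (U+2014) should only be touched inside the `<k2>` field of a metaline and inside `<s>...</s>` elements, so that any dash in the English text is left alone.

```python
import re

LONG_HYPHEN = "\u2014"  # U+2014, the long hyphen to be replaced

# Assumption: only <k2> on metalines and <s>...</s> in entry bodies
# hold Sanskrit compounds whose hyphen should be normalized.
K2_FIELD = re.compile(r"(<k2>)(.*?)(<e>)")
S_ELEMENT = re.compile(r"(<s>)(.*?)(</s>)")


def _to_normal_hyphen(match):
    """Rebuild the matched span with long hyphens replaced by '-'."""
    opening, content, closing = match.groups()
    return opening + content.replace(LONG_HYPHEN, "-") + closing


def normalize_line(line):
    """Return the line with long hyphens normalized in <k2> or <s> spans."""
    if line.startswith("<L>"):
        return K2_FIELD.sub(_to_normal_hyphen, line)
    return S_ELEMENT.sub(_to_normal_hyphen, line)
```

Restricting the replacement to these two spans is a deliberately cautious choice; if the long hyphen turns out to occur only in compounds, a plain global replace would do the same job.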

Andhrabharati commented 1 year ago

In fact, my next issue is about this!!

The implementation is flawed in multiple ways, and the long hyphen SHOULD be changed to a normal hyphen when the full forms are 'made' in the digitised text.

Andhrabharati commented 1 year ago

I managed to find some time today to post the intended issue at https://github.com/sanskrit-lexicon/mw-dev/issues/23#issue-1588019160