sanskrit-lexicon / csl-orig

Data for all dictionaries of Cologne. Now all corrections are made in this git-based workflow.

Capitalization in Proper Names (earlier was mw:179760) #1537

Open drdhaval2785 opened 9 months ago

drdhaval2785 commented 9 months ago

date: 12/22/2023 00:33:17
dict: mw
Lnum: 179760
hw: rocana
old: (ruci-ruce r°)
new: (Ruci-ruce r°)
comm: Typo

drdhaval2785 commented 9 months ago

Requires examination and comparison with other such quotes.

Andhrabharati commented 9 months ago

This belongs to the category of initial-CAP letter of IAST [denoting proper nouns, as per English grammar] getting converted to small letter in slp1; there are hundreds (if not thousands) of such places across many cdsl texts.

It calls for a strategic decision: either adopt something like what @gasyoun proposed long ago [i.e., use the slp1 notation {X} for such initial-CAP letters in IAST, which would also help the 'invertibility' property that Jim and Dhaval mention time and again] so as to match the printed text, or continue to ignore those initial-CAPs, as has been done till now.
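
To make the invertibility point concrete, here is a minimal Python sketch. It is not the cdsl transcoder, and the exact {X} specification is not given in this thread; the sketch assumes that braces wrap the slp1 letter whose IAST equivalent is capitalized. The function names, the `mark_caps` parameter and the partial mapping tables are all hypothetical.

```python
import unicodedata

def _nfc(s):
    return unicodedata.normalize("NFC", s)

# Partial IAST -> SLP1 tables (illustrative); digraphs are tried before single letters.
DIGRAPHS = {_nfc(k): v for k, v in {
    "ai": "E", "au": "O", "kh": "K", "gh": "G", "ch": "C", "jh": "J",
    "ṭh": "W", "ḍh": "Q", "th": "T", "dh": "D", "ph": "P", "bh": "B"}.items()}
SINGLES = {_nfc(k): v for k, v in {
    "ā": "A", "ī": "I", "ū": "U", "ṛ": "f", "ṝ": "F", "ḷ": "x", "ḹ": "X",
    "ṃ": "M", "ḥ": "H", "ṅ": "N", "ñ": "Y", "ṭ": "w", "ḍ": "q",
    "ṇ": "R", "ś": "S", "ṣ": "z"}.items()}
SLP1_TO_IAST = {v: k for k, v in {**DIGRAPHS, **SINGLES}.items()}

def iast_to_slp1(text, mark_caps=False):
    """Transcode IAST to slp1. Capitals are dropped unless mark_caps is set,
    in which case the slp1 letter is wrapped in braces (assumed {X} scheme)."""
    text = _nfc(text)
    out, i = [], 0
    while i < len(text):
        first = text[i].lower()
        pair = first + text[i + 1:i + 2]
        if pair in DIGRAPHS:
            slp, step = DIGRAPHS[pair], 2
        else:
            slp, step = SINGLES.get(first, first), 1
        if mark_caps and text[i].isupper():
            slp = "{" + slp + "}"
        out.append(slp)
        i += step
    return "".join(out)

def slp1_to_iast(text):
    """Transcode slp1 to IAST, capitalizing any letter written as {x}."""
    out, i = [], 0
    while i < len(text):
        if text[i] == "{" and "}" in text[i:]:
            j = text.index("}", i)
            inner = "".join(SLP1_TO_IAST.get(c, c) for c in text[i + 1:j])
            out.append(inner[:1].upper() + inner[1:])
            i = j + 1
        else:
            out.append(SLP1_TO_IAST.get(text[i], text[i]))
            i += 1
    return "".join(out)

print(iast_to_slp1("Śālmali"))                       # SAlmali (capital lost)
print(slp1_to_iast(iast_to_slp1("Śālmali")))         # śālmali
print(iast_to_slp1("Ruci-ruce r°", mark_caps=True))  # {r}uci-ruce r°
```

Without the marker, the round trip IAST → slp1 → IAST returns a lowercase initial; with it, the printed capital survives.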

funderburkjim commented 9 months ago

Current coding in 179760 rocana
<s>ruci-ruce r°</s>    In cdsl displays, this is rendered according to the user's output preference.

Compare 179772 rocanA
<s1 slp1="SAlmali">Śālmali</s1>   Always rendered as IAST Śālmali in current cdsl displays.

The cdsl transcoding routines do not implement the {} feature of slp1.

<s1 slp1="ruci-ruce r°">ruci-ruce r°</s1>
(It so happens in ruci-ruce  that this is same in both slp1 and iast.)

And in print (mw-iast), there is no capitalization.  

If we decide that a print change should be made, the coding could be
<s1 slp1="ruci-ruce r°">Ruci-ruce r°</s1>
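
The difference between the two codings can be sketched in Python as follows. This is only an illustration, not the actual cdsl display code: `render` and the stand-in `transcode` callback are hypothetical names, and the regular expressions assume the simple tag shapes quoted above.

```python
import re

S_TAG = re.compile(r"<s>(.*?)</s>")
S1_TAG = re.compile(r'<s1 slp1="([^"]*)">(.*?)</s1>')

def render(body, transcode):
    """Render a fragment of an entry body.

    <s1 slp1="...">text</s1> : the element text (IAST as printed) is kept as is.
    <s>text</s>              : slp1 text is passed to the user's chosen transcoder.
    """
    body = S1_TAG.sub(lambda m: m.group(2), body)
    body = S_TAG.sub(lambda m: transcode(m.group(1)), body)
    return body

# Stand-in for a user's output preference (hypothetical).
to_user_pref = lambda slp1: "[user-pref:" + slp1 + "]"

print(render('<s>ruci-ruce r°</s>', to_user_pref))
print(render('<s1 slp1="ruci-ruce r°">Ruci-ruce r°</s1>', to_user_pref))
```

Under this sketch, the `<s1>` coding keeps the printed capital regardless of the output preference, while the `<s>` coding can only show what the slp1 text itself records.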

gasyoun commented 9 months ago

@Andhrabharati eagle-eyed you remain. I was not aware that there are so many such places (hundreds, if not thousands) across many cdsl texts.

Andhrabharati commented 9 months ago
ruci-ruce r°

(It so happens in ruci-ruce that this is same in both slp1 and iast.)

And in print (mw-iast), there is no capitalization.

Wonder how this initial-CAP letter skipped Jim's eye--

image

It is not a print-change here!!

And I am referring to the cases like image image where proper nouns (names) are the 'entries'.

Compare 179772 rocanA

<s1 slp1="SAlmali">Śālmali</s1>   Always rendered as IAST Śālmali in current cdsl displays.

image

I am aware of the <s1 slp1= notation of CDSL for the words 'inside' the body portion, but that is altogether a different matter.

Finally, it may be recalled that MW print has all the <H2> entry words rendered with initial-CAPs, but they need not be considered as initial-CAPs, unless they are denoting proper nouns.

Andhrabharati commented 9 months ago

@gasyoun

You had started the topic when you were 10 years younger.

If you make a step forward (by showing the result of what you said "you're ready to do"), probably Jim might not mind 'adapting' the transcoder files as he mentioned those days.

funderburkjim commented 9 months ago

Wonder how this initial-CAP letter skipped Jim's eye-

Jim must not have looked at the scans!

Andhrabharati commented 9 months ago

Wonder how this initial-CAP letter skipped Jim's eye-

Jim must not have looked at the scans!

Without looking at the print matter, how could you say thus--

And in print (mw-iast), there is no capitalization.

funderburkjim commented 9 months ago

'adapting' the transcoder files

Someone (maybe @artanat ?) needs to fill the role of transcoding expert at cdsl.

Peter's site https://sanskritlibrary.org/transcodeText.html provides an implementation of the {x} feature. See the screenshot in the next comment. I don't know where or if Peter has documented the {x} feature specification.

Peter's site is based on Ralph Bunker's Java code, and on xml transcoding file formats devised by Malcolm, Peter, and Ralph.

The cdsl PHP and Python implementations (including the format of the input transcoding xml files) were made by me based on Bunker's early work. Ralph and Peter later revised their system, so the {x} feature (and probably other features) is not present in the cdsl system.

funderburkjim commented 9 months ago

Example of {x} feature at sanskritlibrary

image