sanskrit-lexicon / CORRECTIONS

Correction history for Cologne Sanskrit Lexicon

STC Ç->Ś and ç->ś #430

Closed drdhaval2785 closed 5 years ago

drdhaval2785 commented 5 years ago

Triggered by a suggestion Caujolle submitted through the correction form. It looks like a major change, affecting 1146 lines in stc5.txt.

Case 24367: 02/20/2019 dict=STC, L= ID=19980, hw=zaka, user=Caujolle
old = ère Çaka (78 de l'ère).
new = ère Śaka (78 de l'ère).
comment = for coherence, the Ç should here be replaced by Ś

gasyoun commented 5 years ago

for coherence, the Ç should here be replaced by Ś

As per book?

funderburkjim commented 5 years ago

Stchoupak used c-cedilla for the palatal sibilant in the dictionary's version of IAST. But since c-cedilla is also used in French words, changing ç->ś in Sanskrit words without introducing misspellings of French words requires extra work.

In other words, what is needed is a list of all c-cedilla words in Stc, then a separation of these into Sanskrit and French words, and finally a set of corrections ç->ś for the Sanskrit words.

The separation into Sanskrit and French words is the labor-intensive step. I've marked this as a bug so we can revisit. Sampada may be able to help, since she knows both French and Sanskrit.
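The first step described above, listing every word that contains a c-cedilla, can be sketched in a few lines of Python. This is only an illustration, not the code used at Cologne: the function name `cedilla_words`, the word pattern, and the sample lines (copied from this thread) are assumptions, and the text is assumed to be normalizable to NFC so that ç is a single character.

```python
import re
import unicodedata

# A "word" here is a run of Unicode letters (no digits, no underscore).
WORD_RE = re.compile(r"[^\W\d_]+")

def cedilla_words(lines):
    """Yield (line_number, word) for every word containing ç or Ç."""
    for lnum, line in enumerate(lines, start=1):
        line = unicodedata.normalize("NFC", line)
        for word in WORD_RE.findall(line):
            if "ç" in word or "Ç" in word:
                yield lnum, word

sample = [
    "{@a-vedhya-@}¦ a. v. imperçable.",
    "ère Çaka (78 de l'ère).",
    "m. adorateur de Ç. P. ; nt. doctrine des adorateurs de Ç. P.",
]
found = list(cedilla_words(sample))
```

A line with two occurrences is reported twice, once per occurrence, so each case can be reviewed separately.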

drdhaval2785 commented 5 years ago

To help Sampada, the following has been done (code is in the pywork/correctionwork/issue-430 folder in Cologne).

words.txt - all words with c-cedilla.
sanskrit.txt - words identified as Sanskrit (because the word in slp1 transliteration scheme is in sanhw1.txt headwords having > 2 letters), in manualByLine.txt format.
french.txt - words identified by pyenchant package as French conclusively.
doubt.txt - words which require closer examination whether they need correction or not, i.e. neither Sanskrit nor French conclusively.

doubt.txt french.txt sanskrit.txt words.txt

EDIT - changed the code and updated the files. Now only 32 of the 1129 cases remain pending in doubt.txt.

sanskrit.txt has this criterion of inclusion: the slp1 form of the word is among the sanhw1 headwords having > 2 letters.
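The three-way split into sanskrit.txt, french.txt and doubt.txt might look roughly like the sketch below. It is not the code in the issue-430 folder: `to_slp1`, `classify`, the order of the two checks, and the tiny stand-in word sets are all assumptions made for illustration. With pyenchant, the French check could be `enchant.Dict("fr_FR").check`, assuming a French dictionary is installed; a plain set is used here so the example runs on its own.

```python
import unicodedata

# IAST (with ç treated as the palatal sibilant) to SLP1; two-letter keys are tried first.
IAST_TO_SLP1 = {
    "ai": "E", "au": "O",
    "kh": "K", "gh": "G", "ch": "C", "jh": "J",
    "ṭh": "W", "ḍh": "Q", "th": "T", "dh": "D", "ph": "P", "bh": "B",
    "ā": "A", "ī": "I", "ū": "U", "ṛ": "f", "ṝ": "F", "ḷ": "x",
    "ṃ": "M", "ḥ": "H", "ṅ": "N", "ñ": "Y", "ṭ": "w", "ḍ": "q", "ṇ": "R",
    "ś": "S", "ç": "S", "ṣ": "z",
}

def to_slp1(word):
    """Transliterate a lowercased IAST word to SLP1 by greedy longest match."""
    word = unicodedata.normalize("NFC", word).lower()
    out, i = [], 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in IAST_TO_SLP1:
            out.append(IAST_TO_SLP1[pair])
            i += 2
        else:
            out.append(IAST_TO_SLP1.get(word[i], word[i]))
            i += 1
    return "".join(out)

def classify(word, sanskrit_headwords, is_french):
    """Return 'sanskrit', 'french' or 'doubt' for one c-cedilla word."""
    slp1 = to_slp1(word)
    if len(slp1) > 2 and slp1 in sanskrit_headwords:
        return "sanskrit"
    if is_french(word):
        return "french"
    return "doubt"

# Hypothetical stand-ins for the sanhw1 headwords and the French dictionary.
headwords = {"Saka", "Siva", "paSupati"}
french_words = {"perçable", "imperçable", "français"}

labels = {w: classify(w, headwords, french_words.__contains__)
          for w in ["Çaka", "imperçable", "Çivaïte", "Ç"]}
```

Under these stand-in lists, Çaka is labelled Sanskrit, imperçable French, and both Çivaïte and the bare abbreviation Ç fall into the doubt group.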

drdhaval2785 commented 5 years ago

Remaining 32 cases are as follows

; intransperçable
16326 {@a-bhedya-@}¦ a. v. à ne pas dénoncer; inséparable <lbinfo n="7"/>; intransperçable.
; çivaïte
21510 {@a-vimukta-@}¦ m. Śiva; n. d'un Tīrtha.  {%°śaiva-%} m. moine çivaïte de
; imperçable
21913 {@a-vedhya-@}¦ a. v. imperçable.
; ç
35029 ç.
; Çivaïtes
38981 secte particulière de Çivaïtes ; ép. de Śiva ; un des onze Rudra ; {%-ī-%}
; Ç
83823 {@pāśu-pata-@}¦ {%-ī-%} a. consacré à Śiva Paçupati, relatif à Ç. P. ;
; Ç
83824 m. adorateur de Ç. P. ; nt. doctrine des adorateurs de Ç. P.
; Ç
83824 m. adorateur de Ç. P. ; nt. doctrine des adorateurs de Ç. P.
; Ç
83826 <P>{%°vrata-%} nt. système des doctrines des adorateurs <lbinfo n="5"/> de Ç. P.
; Ç
90237 Ç.), convaincu de, décidé à ; satisfait <lbinfo n="4"/>, réjoui ; m. n. d'une divinité
; çivaïtes
100388 çivaïtes.
; Ç
132583 <P>{%śatānanda-%} m. n. de divers personnages, dont Ç. Gautama, prêtre de la
; çivaïte
133869 secte çivaïte.
; Ç
133968 <P>{%°gotra-%} nt. la famille de Ç.
; Ç
134255 {@śāmbhava-@}¦ {%-ī-%} a. relatif à Śiva, propre à Ç., consacré à Ç.;
; Ç
134255 {@śāmbhava-@}¦ {%-ī-%} a. relatif à Śiva, propre à Ç., consacré à Ç.;
; Ç
134359 <P>{%°pakṣin-%} m. l'oiseau Ç.
; Ç
134379 {@śārva-@}¦ {%-ī-%} a. relatif à Śiva, consacré à Ç., propre à Ç.
; Ç
134379 {@śārva-@}¦ {%-ī-%} a. relatif à Śiva, consacré à Ç., propre à Ç.
; Ç
134511 rois; {%-ka- -ikā-%} a. relatif aux Ç., appartenant aux Ç., qui gouverne <lbinfo n="5"/> sur
; Ç
134511 rois; {%-ka- -ikā-%} a. relatif aux Ç., appartenant aux Ç., qui gouverne <lbinfo n="5"/> sur
; Ç
134512 les Ç.
; Ç
134514 <P>{%°rāja- -an-%} m. roi des Ç.
; Çivaïte
136535 dérivé de Śiva; m. adorateur de Śiva, Çivaïte; nt. bien-être, bonheur.
; Ç
136842 langue des Ç., n. d'un prâkrit dramatique; {%-ikā-%} f. id.
; Ç
137215 <P>{%°bhuj-%} ag. qui a mangé de la nourriture préparée pour un Ç.
; Ç
137217 <P>{%°mitra-%} a. qui contracte des amitiés à l'occasion <lbinfo n="4"/> du Ç.
; Ç
137221 <P>{%śrāddhāha-%} m. jour où l'on célèbre un Ç.
; Çivaïtes
140178 (chez les Çivaïtes) âme du plus bas degré.
; perçable
157616 <P>{%°bhedya-%} a. v. (perçable avec une aiguille) très dense, massif.

Caujolle or Sampada could have a look at them and let us know whether any of these lines contains an error or needs a change.

drdhaval2785 commented 5 years ago

Response from Caujolle regarding doubtful cases is attached herewith

FRANÇAIS means French in French

Çivaïte means shaiva in French, so it should be replaced by śaiva or Śaiva, depending on the case

Perçable, imperçable and intransperçable are French words

Paçupati > Paśupati

All cases of Ç. should be replaced by Ś.

I am a bit reserved regarding whether to change Çivaïte to śaiva or Śaiva. I vote to keep it as it is. What is your take, @funderburkjim and @gasyoun? The rest of the resolutions seem fine.
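As an illustration of how the resolutions above could be applied line by line, here is a minimal Python sketch. It assumes a reviewed set of words to convert (`sanskrit_words`, hypothetical here, including the abbreviation Ç) and leaves every other word, French or undecided, untouched. The function name `apply_corrections` is also an assumption, not the name used in the actual correction scripts.

```python
import re

# A "word" is a run of Unicode letters.
WORD_RE = re.compile(r"[^\W\d_]+")

def apply_corrections(line, sanskrit_words):
    """Replace ç->ś and Ç->Ś only inside words listed in sanskrit_words."""
    def fix(match):
        word = match.group(0)
        if word in sanskrit_words:
            return word.replace("ç", "ś").replace("Ç", "Ś")
        return word  # French or undecided words stay as they are
    return WORD_RE.sub(fix, line)

# Hypothetical reviewed list; Çivaïte is deliberately absent while it is under discussion.
sanskrit_words = {"Çaka", "Paçupati", "Ç"}

before = "{@pāśu-pata-@}¦ {%-ī-%} a. consacré à Śiva Paçupati, relatif à Ç. P. ;"
after = apply_corrections(before, sanskrit_words)
```

Because the replacement is driven by a whitelist rather than a global ç->ś substitution, French words such as imperçable cannot be altered by accident.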

drdhaval2785 commented 5 years ago

Got the French word list examined by Caujolle, and all are French words. So now the changes are ready to be installed.

gasyoun commented 5 years ago

I am a bit reserved regarding whether to change Çivaïte to śaiva or Śaiva. I vote to keep it as it is.

I guess she did not mean to change it. Perhaps she meant Çivaïte to Śivaïte?

french word list examined by Caujolle and all are french words. So now the changes are ready to be installed.

Well done, Dhaval. If there is you, there is still hope Cologne will not be doomed.

SergeA commented 5 years ago

"Çivaïte" is derived from "Çiva" (शिव). If the spelling "Çiva" is replaced by "Śiva", then for the sake of consistency "Çivaïte" also needs the same correction. So it will be "śivaïte". (I don't know if here the first letter must be capital or lowercase in French.) The same is applicable for vishnouïte > viṣṇuïte and other such cases of derived French words. Perhaps such half-Sanskrit words need a special markup?

SergeA commented 5 years ago

whether to change Çivaïte to śaiva or Śaiva

Definitely not.

drdhaval2785 commented 5 years ago

I vote for keeping half-Sanskrit words as they are currently i.e. treat them as French. Only Sanskrit words get converted to Ś from c-cedilla.

SergeA commented 5 years ago

In MW we also have such half-Sanskrit words, e.g. Āraṇyakas with a Sanskrit stem and the English ending -s. BTW, I tried searching for "Āraṇyakas" in the text with the MW advanced search, but it returns nothing. Perhaps there is some search/markup conflict here.

funderburkjim commented 5 years ago

I vote for Çivaïte -> Śivaïte.

Here are the two instances related to @SergeA's example:

100387:{@bhākta-@}¦ m. adepte de la Bhakti, homme pieux ; pl. sectes vishnouïtes et
100417:<P>{%°purāṇa-%} nt. titre d'un Purāṇa d'inspiration vishnouïte.

Would use viṣṇuïte (lower-case since vishnouïte is lower).
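If the half-Sanskrit derivatives were respelled as voted here, the capitalization of the first letter would need to be preserved. The following Python sketch shows one way to do that. The stem table `STEMS` and the function `respell` are hypothetical, and the thread does not confirm that these spellings were the ones finally installed.

```python
import re

# Old stem -> proposed stem, keyed in lowercase (illustrative only).
STEMS = {"çivaït": "śivaït", "vishnouït": "viṣṇuït"}
PATTERN = re.compile("|".join(STEMS), re.IGNORECASE)

def respell(line):
    """Respell the listed stems, keeping the case of the first letter."""
    def fix(match):
        found = match.group(0)
        new = STEMS[found.lower()]
        if found[0].isupper():
            return new[0].upper() + new[1:]
        return new
    return PATTERN.sub(fix, line)

examples = [
    respell("dérivé de Śiva; m. adorateur de Śiva, Çivaïte; nt. bien-être, bonheur."),
    respell("secte çivaïte."),
    respell("homme pieux ; pl. sectes vishnouïtes et"),
]
```

Matching on the stem rather than the full word covers singular and plural forms (Çivaïte, Çivaïtes) with one table entry.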

funderburkjim commented 5 years ago

Āraṇyakas not found in MW Advanced Search

In the Digitization, Āraṇyakas appears with markup:

<s1 slp1="AraRyaka">Āraṇyaka</s1>s

The advanced search uses an extraction (webtc2/query_dump.txt) of the digitization. The extraction program (webtc2/ currently turns <s1 slp1="AraRyaka">Āraṇyaka</s1>s into āraṇyaka s. This is why a substring search for 'Āraṇyaka' will find Āraṇyakas but a search for the plural form Āraṇyakas finds no matches.
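The effect can be reproduced with a small Python sketch. It does not reproduce the real extraction program: `extract` and `search` are hypothetical, written only to mimic the behaviour described above (tagged Sanskrit lowercased and followed by a space, then a case-insensitive substring match on the extracted text).

```python
import re

# Matches an <s1 slp1="...">...</s1> element and captures its displayed text.
S1_TAG = re.compile(r'<s1 slp1="[^"]*">(.*?)</s1>')

def extract(text):
    """Lowercase the tagged Sanskrit and separate it from what follows by a space."""
    return S1_TAG.sub(lambda m: m.group(1).lower() + " ", text)

def search(query, extracted):
    """Case-insensitive substring search over the extracted text."""
    return query.lower() in extracted

digitized = '<s1 slp1="AraRyaka">Āraṇyaka</s1>s'
extracted = extract(digitized)               # "āraṇyaka s"
hit_stem = search("Āraṇyaka", extracted)     # True
hit_plural = search("Āraṇyakas", extracted)  # False
```

The inserted space splits the English plural ending from the stem, so the plural never occurs as a contiguous string in the searchable text.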

drdhaval2785 commented 5 years ago

Installation completed.