sanskrit-lexicon / csl-devanagari

Convert SLP1 data from csl-orig into Devanagari for easy proofreading.

IEG Study #31

Open Andhrabharati opened 3 years ago

Andhrabharati commented 3 years ago

@drdhaval2785

I just happened to land here on IEG, for some reason.

I see that I can do quite a lot of work on it.

To start with, I have split the file into 4 parts (FrontPages, Main, Index and Addenda): ieg_FrontPages.txt ieg_Main.txt ieg_Index.txt ieg_Addenda.txt

I have also corrected the transliteration page in the front pages to match the print book. [Apart from the transliterations, there are a few typos on this page in the Cologne text.] IEG Transliteration.txt

Should I continue (and post my work here, on this IEG)?

Andhrabharati commented 3 years ago

I will also be proofing the text at the end; as it is just 442 pp., it should not take much time to do.

drdhaval2785 commented 3 years ago

I welcome your offer. We do not want to match printed text and digitized text. We want digitized text in a consistent transliteration scheme. There were many ways in which every author printed Anglicized Sanskrit. That was a nightmare.

After a lot of effort, all dictionaries were modified to use only IAST for Anglicized Sanskrit and SLP1 for Devanagari.

If by 'matching with printed book' you mean to go back to non-IAST days, it would be a retrogression for sure. If you mean to conform to modern IAST standard, by all means it is a welcome step.

Andhrabharati commented 3 years ago

It is not going back, but going forward only, I guess.

Recall your marking some Dravidian letters with ISO characters, which are not in IAST at all. And you had left the l13 places in the process.

I rendered only these with the diacs as per book. You may just go through the transliteration page sent.

Andhrabharati commented 3 years ago

For Sanskrit letters, I am not against IAST and it will sure be continued for more time to come.

Andhrabharati commented 3 years ago

I will also be incorporating the ~100 addenda entries into the main text, and attempting to fix the "Misprints that may be more or less easily corrected by the readers" mentioned by Sircar.

drdhaval2785 commented 3 years ago

Please do so. Will be happy to incorporate.

Andhrabharati commented 3 years ago

Addenda et Corrigenda

N. B.--Misprints that may be more or less easily corrected by the readers include (1) a few cases of ṛ written as ṛi (e. g., p. 388-- bhṛta, p. 393-- tṛṇa) and cha written for ca (p. 324, line 3); (2) wrong use of capital and small letters at the beginning of entries meant for indicating persons and objects respectively; (3) entries put away from their proper places (e. g., p. 10-- agahara, p. 49-- bhamāti and Bhāṇaka, p. 211-- naṅga and Nāṇī, p. 257-- prāstarika-śreṇī, p. 412-- aradu dogarāca-ppaṇṇu, p. 433-- jīrṇa-viśvamalla-priya), and (4) occasional omission of diacritical marks in ā, ĕ, ŏ, ḍ, ṇ, ś, ṭ, etc.

Out of these, (3) does not affect the digital searching and only the other three types are to be corrected appropriately.
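Type (1) lends itself to a simple search. A minimal sketch, assuming the text is in Unicode Roman and that any occurrence of ṛ followed directly by i is only a candidate for manual review, not a confirmed misprint (`flag_ri_candidates` is a hypothetical helper name):

```python
import unicodedata

def flag_ri_candidates(lines):
    """Return (line_number, line) pairs that contain the sequence 'ṛi'."""
    hits = []
    for number, line in enumerate(lines, start=1):
        # Normalize so that r + combining dot below is treated like precomposed ṛ (U+1E5B).
        normalized = unicodedata.normalize("NFC", line).lower()
        if "\u1e5bi" in normalized:
            hits.append((number, line))
    return hits
```

Cases like cha written for ca cannot be found this way, since ch is also a legitimate IAST digraph; those still need the print book.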

Andhrabharati commented 3 years ago

Most of the work is done in the Main text

Here is the file so far done- ieg_Main.txt

Hope this style of the text is "acceptable" to the CDSL team.

Instead of doing a complete proofreading as initially planned, I have resorted to just headword (HW) proofing for now. This should be over in about 2-3 days. [As the metaline is in SLP1 encoding, I am not touching it; one can programmatically identify the differences between the entry (in Unicode Roman) and the k2 field, and appropriate corrections in k1 & k2 could be done at the end.]

@funderburkjim As you are looking at this repo now, how would you suggest marking the "revised" lines, while incorporating the addenda matter in the main text? [There are just about 100 lines in Addenda.] ieg_Addenda.txt
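The programmatic check of the entry headword against the k2 field could be sketched roughly as follows. This is only an illustration: the metaline layout `<L>…<pc>…<k1>…<k2>…` and the sample values are assumptions, `iast_to_slp1` and `k2_mismatch` are hypothetical helper names, and the mapping covers standard IAST only, so the non-Sanskrit headwords would show up as false mismatches.

```python
import re
import unicodedata

# Standard IAST -> SLP1 correspondences for the letters that differ.
# Plain letters (a, i, u, e, o, k, g, ...) are the same in both schemes.
_PAIRS = {
    "ai": "E", "au": "O",
    "kh": "K", "gh": "G", "ch": "C", "jh": "J",
    "ṭh": "W", "ḍh": "Q", "th": "T", "dh": "D", "ph": "P", "bh": "B",
    "ā": "A", "ī": "I", "ū": "U", "ṛ": "f", "ṝ": "F", "ḷ": "x", "ḹ": "X",
    "ṃ": "M", "ḥ": "H", "ṅ": "N", "ñ": "Y", "ṭ": "w", "ḍ": "q", "ṇ": "R",
    "ś": "S", "ṣ": "z",
}
IAST_TO_SLP1 = {unicodedata.normalize("NFC", k): v for k, v in _PAIRS.items()}
# Longest keys first, so digraphs such as "bh" win over single letters.
_TOKEN = re.compile(
    "|".join(re.escape(k) for k in sorted(IAST_TO_SLP1, key=len, reverse=True))
)

def iast_to_slp1(text):
    """Convert an IAST string to SLP1 (lowercased first, since SLP1 is case-sensitive)."""
    text = unicodedata.normalize("NFC", text).lower()
    return _TOKEN.sub(lambda m: IAST_TO_SLP1[m.group(0)], text)

def k2_mismatch(metaline, headword_iast):
    """Return (k2, expected) if the k2 field differs from the converted headword, else None."""
    match = re.search(r"<k2>([^<]*)", metaline)
    if match is None:
        raise ValueError("no <k2> field in metaline")
    k2 = match.group(1).strip()
    expected = iast_to_slp1(headword_iast)
    return None if k2 == expected else (k2, expected)
```

A genuine a + i or a + u hiatus would be read as ai or au here, so any reported mismatch is a prompt to look at the entry, not a correction to apply blindly.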

drdhaval2785 commented 3 years ago

I looked at your ieg_Main.txt file, @Andhrabharati. It seems that you removed all line breaks. This creates a problem in programmatic comparison of both files. Do you have a version where you have kept the line breaks (<div n="lb">) intact?

Andhrabharati commented 3 years ago

Unfortunately NO, @drdhaval2785!

This is the best possible version (recreated now) to compare my version with MELD (or through some program), excluding the "div tag strings". ieg_Main (L0).txt

And there are quite a few changes I made in my file (some more than the ones listed earlier above), apart from removing the line breaks.

If nothing else, you might consider just using (1) the HWs portion, (2) ab tags and (3) ls tags from my file to incorporate appropriately in Cologne style (whatever it is). [My file(s) can act as a guide for possible changes/improvements, as Jim is supposedly using my AP90.]
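A comparison that ignores the line-break tags could look something like the sketch below. It assumes the breaks appear literally as `<div n="lb">` in the Cologne file and compares word by word; `strip_line_breaks` and `diff_entries` are hypothetical names, not part of any existing Cologne script.

```python
import difflib
import re

# Assumption: line breaks are marked literally as <div n="lb">.
LB_TAG = re.compile(r'\s*<div n="lb">\s*')

def strip_line_breaks(text):
    """Replace each line-break tag (and the whitespace around it) with one space."""
    return LB_TAG.sub(" ", text).strip()

def diff_entries(cologne_text, revised_text):
    """Return only the added/removed words after the line-break tags are dropped."""
    old_words = strip_line_breaks(cologne_text).split()
    new_words = strip_line_breaks(revised_text).split()
    return [d for d in difflib.ndiff(old_words, new_words) if d[0] in "+-"]
```

For example, `diff_entries('a <div n="lb">b c', 'a b d')` reports only the changed word, so differences in line wrapping alone produce an empty list.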

drdhaval2785 commented 3 years ago

@Andhrabharati, ādheyaṃ -> ādheyaṁ: is this change intentional or unintentional? According to the IAST specification (https://en.wikipedia.org/wiki/International_Alphabet_of_Sanskrit_Transliteration), ṃ is the correct IAST form.

Andhrabharati commented 3 years ago

It is intentional, to facilitate character to character comparison (using just eye, not the brain!), as I was proofing with the print text. I thought of changing it back to IAST style at the end.

Of course, I have used the brain as well to correct the cases (1) and (4) of Sircar [as mentioned above].
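Changing the anusvara back to IAST style at the end is a mechanical step. A minimal sketch, assuming every ṁ (m with dot above, U+1E41) in the file stands for the anusvara and should become ṃ (m with dot below, U+1E43); `restore_iast_anusvara` is a hypothetical name:

```python
import unicodedata

def restore_iast_anusvara(text):
    """Replace m-with-dot-above by the IAST m-with-dot-below, in both cases."""
    # NFC first, so that m + combining dot above is caught as well.
    nfc = unicodedata.normalize("NFC", text)
    return nfc.replace("\u1e41", "\u1e43").replace("\u1e40", "\u1e42")
```

The function returns NFC text, so a file that mixes precomposed and decomposed letters comes out uniformly precomposed.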

drdhaval2785 commented 3 years ago

OK. In that case, it would be wise for me to stop incorporating changes right now. Once you are through with your comparison and have made the necessary changes, I will incorporate them. It will be a non-trivial activity, though.

Andhrabharati commented 3 years ago

Sure; most probably by tomorrow, I will be done with my (present) work on this.

Andhrabharati commented 3 years ago

@funderburkjim As you are looking at this repo now, how would you suggest marking the "revised" lines, while incorporating the addenda matter in the main text? [There are just about 100 lines in Addenda.]

BTW, I've decided to incorporate the Addenda entries into the main text, by putting a comment (;) line at the end (after <LEND>) of the entry.

This may be kept in mind to change/mark it in some other manner (if Jim suggests any).
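For anyone reading the file by program, the comment convention can be handled as in this sketch. It assumes each entry starts with a metaline beginning `<L>` and ends with a `<LEND>` line, and that a `;` line directly after `<LEND>` belongs to the entry just closed; `read_entries` is a hypothetical helper.

```python
def read_entries(lines):
    """Group lines into entries and attach trailing ';' comment lines to them."""
    entries = []
    current = None  # the entry being read, or None when between entries
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("<L>"):
            current = {"meta": line, "body": [], "comments": []}
        elif line.startswith("<LEND>"):
            if current is not None:
                entries.append(current)
            current = None
        elif line.startswith(";"):
            # A comment after <LEND> annotates the entry that was just closed.
            if current is None and entries:
                entries[-1]["comments"].append(line[1:].strip())
        elif current is not None:
            current["body"].append(line)
    return entries
```

If a different marking is chosen later, only the `;` branch would need to change.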

gasyoun commented 3 years ago

occasional omission of diacritical marks in ā, ĕ, ŏ, ḍ, ṇ, ś, ṭ

Eagle-eye.

This may be kept in mind to change/mark it in some other manner (if Jim suggests any).

Your proposal is better than no proposal at all.

Andhrabharati commented 3 years ago

Seems it is time for me to post my IEG work now. ieg_Main.txt ieg_Appendices.txt

In addition to HWs proofing, I did some error corrections in internal text also, though very sparingly.

The points briefed earlier (as above) may be kept in mind while looking into these files. [Esp. attention is drawn to my comment lines starting with ';'.]

One additional point: if the Grouped (G) and Dual (D) entries are expanded, quite a few repeats would come into the picture. Should these be left as is, since the L-numbers would be different, or should they be marked with [1], [2] as done in some other dictionaries?

Also, if the transliteration I used is accepted, @funderburkjim might have to redo his 'ea' work on this IEG.

Incidentally, the l̤̣ is properly rendered only by a few fonts (like Charis, Noto, Siddhanta1 etc.; the Old Standard Indologique font used by Cologne doesn't support this!). And this letter, if rendered in Devanagari, could be ऴ, as seen in some books.

Andhrabharati commented 3 years ago

These are the characters I got in these two files-

á: 1
d̤: 123
ě: 268
ï: 3
l̤: 411
l̤̣: 87
n̤: 51
ŏ: 38
r̤: 246
s̤: 18
t̤: 10
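A tally like this can be reproduced with a short script. The sketch below is one possible way, assuming that "special" means any letter, with its combining marks, that is neither plain ASCII nor one of the standard IAST letters; `clusters` and `count_non_iast` are hypothetical names.

```python
import unicodedata
from collections import Counter

# Standard IAST letters beyond ASCII (lowercase, NFC).
IAST_EXTRA = set(unicodedata.normalize("NFC", "āīūṛṝḷḹṃḥṅñṭḍṇśṣ"))

def clusters(text):
    """Yield each base character together with the combining marks that follow it."""
    current = ""
    for ch in unicodedata.normalize("NFD", text):
        if current and unicodedata.category(ch).startswith("M"):
            current += ch
        else:
            if current:
                yield current
            current = ch
    if current:
        yield current

def count_non_iast(text):
    """Count letters (with their diacritics) that are neither ASCII nor standard IAST."""
    counts = Counter()
    for cluster in clusters(text):
        if not cluster[0].isalpha():
            continue
        key = unicodedata.normalize("NFC", cluster).lower()
        if key.isascii() or key in IAST_EXTRA:
            continue
        counts[key] += 1
    return counts
```

Working on decomposed text keeps stacked marks with their base letter, so a letter carrying two marks below is counted once, separately from the same letter with one mark.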

Andhrabharati commented 3 years ago

Also I would like to mention one point-

Though the transliteration scheme by Sircar shows the short e as ĕ (U+0115), it is rendered throughout as ě (U+011B) inside the pages; probably due to the Greek words having this ě letter. [Greek orthography seems to be distinguishing between these two forms.]
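The two forms are easy to confuse by eye, so it may help to check them by code point. A small sketch using Python's standard `unicodedata` module (`describe` is a hypothetical helper name):

```python
import unicodedata

def describe(ch):
    """Return the code point and official Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

# The breve form shown in Sircar's scheme and the caron form found in the pages.
for ch in ("\u0115", "\u011b"):
    print(ch, describe(ch))
```

The first is the letter e with breve and the second is e with caron, which is why a search for one does not find the other.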

Andhrabharati commented 3 years ago

Now, there are 370 entries marked with $, indicating non-SLP1 transliteration; they all come under non-Sanskrit category.

gasyoun commented 3 years ago

Old Standard Indologique font used by Cologne doesn't support this

It could be fixed, if I knew what else is missing there.