Open Andhrabharati opened 3 years ago
I will also be proofing the text at the end; as it is just 442 pp., it should not take much time "to do".
I welcome your offer. We do not want to match printed text and digitized text. We want digitized text in a consistent transliteration scheme. There were many ways in which every author printed Anglicized Sanskrit. That was a nightmare.
After a lot of effort, all dictionaries were modified to use only IAST for Anglicized Sanskrit and SLP1 for Devanagari.
If by 'matching with printed book' you mean to go back to non-IAST days, it would be a retrogression for sure. If you mean to conform to modern IAST standard, by all means it is a welcome step.
It is not going back, but going forward only, I guess.
Recall your marking some Dravidian letters with ISO characters, which are not in IAST at all. And you had left the l13 places in the process.
I rendered only these with the diacs as per book. You may just go through the transliteration page sent.
For Sanskrit letters, I am not against IAST and it will sure be continued for more time to come.
I would also be incorporating the ~100 addenda entries into the main text, and also attempting the "Misprints that may be more or less easily corrected by the readers" as mentioned by Sircar.
Please do so. Will be happy to incorporate.
N. B.--Misprints that may be more or less easily corrected by the readers include (1) a few cases of ṛ written as ṛi (e. g., p. 388-- bhṛta, p. 393-- tṛṇa) and cha written for ca (p. 324, line 3); (2) wrong use of capital and small letters at the beginning of entries meant for indicating persons and objects respectively; (3) entries put away from their proper places (e. g., p. 10-- agahara, p. 49-- bhamāti and Bhāṇaka, p. 211-- naṅga and Nāṇī, p. 257-- prāstarika-śreṇī, p. 412-- aradu dogarāca-ppaṇṇu, p. 433-- jīrṇa-viśvamalla-priya), and (4) occasional omission of diacritical marks in ā, ĕ, ŏ, ḍ, ṇ, ś, ṭ, etc.
Out of these, (3) does not affect the digital searching and only the other three types are to be corrected appropriately.
Most of the work is done in the Main text
<P>
lines as well, they being with a different sense/meaning as per another "source". Here is the file done so far: ieg_Main.txt
Hope this style of the text is "acceptable" to the CDSL team.
Instead of doing complete proof as initially thought of, resorted to just HW proofing for now. And this would be over in just about 2-3 days' time. [As the metaline is with SLP1 encoding, not touching it; one can programmatically identify the differences between the entry (in Unicode Roman) and the k2 field, and appropriate corrections in k1 & k2 could be done at the end.]
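The programmatic check mentioned in the bracketed note could be sketched roughly as below. This is only an illustration: the SLP1-to-IAST table covers the standard SLP1 letters, the metaline layout (`<k1>…<k2>…`) is an assumption about the Cologne files, and `headword_mismatch` and the sample metaline are hypothetical, not part of any existing CDSL script.

```python
import re
import unicodedata

# Standard SLP1 letters only; accents and the non-Sanskrit entries are
# out of scope for this sketch.
SLP1_TO_IAST = {
    "a": "a", "A": "ā", "i": "i", "I": "ī", "u": "u", "U": "ū",
    "f": "ṛ", "F": "ṝ", "x": "ḷ", "X": "ḹ",
    "e": "e", "E": "ai", "o": "o", "O": "au", "M": "ṃ", "H": "ḥ",
    "k": "k", "K": "kh", "g": "g", "G": "gh", "N": "ṅ",
    "c": "c", "C": "ch", "j": "j", "J": "jh", "Y": "ñ",
    "w": "ṭ", "W": "ṭh", "q": "ḍ", "Q": "ḍh", "R": "ṇ",
    "t": "t", "T": "th", "d": "d", "D": "dh", "n": "n",
    "p": "p", "P": "ph", "b": "b", "B": "bh", "m": "m",
    "y": "y", "r": "r", "l": "l", "v": "v",
    "S": "ś", "z": "ṣ", "s": "s", "h": "h",
}

# Assumed metaline layout: ...<k1>KEY1<k2>KEY2
META_RE = re.compile(r"<k1>(?P<k1>[^<]*)<k2>(?P<k2>[^<]*)")


def slp1_to_iast(text):
    """Map each SLP1 letter to IAST; unknown characters pass through."""
    converted = "".join(SLP1_TO_IAST.get(ch, ch) for ch in text)
    return unicodedata.normalize("NFC", converted)


def headword_mismatch(metaline, printed_headword):
    """Return (expected, actual) if k2 and the headword differ, else None."""
    match = META_RE.search(metaline)
    if match is None:
        raise ValueError("no <k1>/<k2> fields found in metaline")
    expected = slp1_to_iast(match.group("k2"))
    # Capitals cannot be expressed in SLP1, so compare case-insensitively.
    actual = unicodedata.normalize("NFC", printed_headword).lower()
    return None if expected == actual else (expected, actual)
```

Any pair returned by `headword_mismatch` would still need a human decision on whether the entry text or the k1/k2 fields should change.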
@funderburkjim As you are looking at this repo now, how would you suggest marking the "revised" lines, while incorporating the addenda matter in the main text? [There are just about 100 lines in Addenda.] ieg_Addenda.txt
I looked at your ieg_Main.txt file @Andhrabharati .
It seems that you removed all line breaks.
This creates a problem in programmatic comparison of both files.
Do you have a version where you have kept this line break intact?
<div n="lb">
Unfortunately NO, @drdhaval2785!
This is the best possible version (recreated now) to compare my version with MELD (or through some program), excluding the "div tag strings". ieg_Main (L0).txt
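A comparison that ignores the removed line breaks could look something like the following minimal sketch. It assumes the only line-break markup is the literal `<div n="lb">` tag and compares word by word with Python's `difflib`; `flatten` and `word_diffs` are hypothetical helper names, and words hyphenated across a line break are not rejoined.

```python
import difflib
import re

LB_TAG = re.compile(r'<div n="lb">')


def flatten(text):
    """Drop <div n="lb"> tags and collapse all whitespace to single spaces."""
    return " ".join(LB_TAG.sub(" ", text).split())


def word_diffs(old, new):
    """List (operation, old_words, new_words) for every non-matching run."""
    a = flatten(old).split()
    b = flatten(new).split()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [
        (tag, " ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

Running this per entry rather than on whole files would keep the diff readable, but how entries are delimited is left out here on purpose.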
And there are quite many changes I did in my file (more than the ones listed above), apart from removing the line breaks.
If nothing else, you might consider just using (1) the HWs portion, (2) ab tags and (3) ls tags from my file to incorporate appropriately in Cologne style (whatever it is). [My file(s) can act as a guide for possible changes/improvements, as Jim is supposedly using my AP90.]
@Andhrabharati ,
ādheyaṃ -> ādheyaṁ
Is this intentional change or unintentional?
According to the IAST specification (https://en.wikipedia.org/wiki/International_Alphabet_of_Sanskrit_Transliteration), ṃ is the correct IAST one.
It is intentional, to facilitate character to character comparison (using just eye, not the brain!), as I was proofing with the print text. I thought of changing it back to IAST style at the end.
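Changing the anusvāra back at the end is a mechanical step. A minimal sketch, assuming every ṁ (dot above, U+1E41) in the file stands for the IAST ṃ (dot below, U+1E43) and nothing else; `anusvara_to_iast` is a hypothetical name.

```python
import unicodedata

# Dot-above forms mapped to the dot-below forms that IAST uses.
DOT_ABOVE_TO_BELOW = str.maketrans({"\u1e41": "\u1e43", "\u1e40": "\u1e42"})


def anusvara_to_iast(text):
    """Replace ṁ/Ṁ with ṃ/Ṃ; NFC first so decomposed m + U+0307 is caught."""
    return unicodedata.normalize("NFC", text).translate(DOT_ABOVE_TO_BELOW)
```

Note that the NFC step also recomposes any other decomposed sequences in the text, which is usually harmless but worth knowing before diffing.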
Of course, I have used the brain as well to correct the cases (1) and (4) of Sircar [as mentioned above].
OK. In that case, it would be wise for me to stop incorporating changes right now. Once you are through with your comparison and have made necessary changes, I will incorporate the changes. It will be a non-trivial activity, though.
Sure; most probably by tomorrow, I will be done with my (present) work on this.
@funderburkjim As you are looking at this repo now, how would you suggest marking the "revised" lines, while incorporating the addenda matter in the main text? [There are just about 100 lines in Addenda.]
BTW, I've decided to incorporate the Addenda entries into the main text, by putting a comment (;) line at the end (after <LEND>) of the entry.
This may be kept in mind to change/mark it in some other manner (if Jim suggests any).
occasional omission of diacritical marks in ā, ĕ, ŏ, ḍ, ṇ, ś, ṭ
Eagle-eye.
This may be kept in mind to change/mark it in some other manner (if Jim suggests any).
Your proposal is better than no proposal at all.
Seems it is time for me to post my IEG work now. ieg_Main.txt ieg_Appendices.txt
In addition to HWs proofing, I did some error corrections in internal text also, though very sparingly.
The points briefed earlier (as above) may be kept in mind while looking into these files. [Esp. attention is drawn to my comment lines starting with ';'.]
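For anyone reviewing the files, the comment lines can be pulled out with a few lines of Python. This is a sketch under two assumptions: a comment is any line whose first character is ';', and `<LEND>` sits on a line of its own. `comment_lines` is a hypothetical helper.

```python
def comment_lines(text):
    """Return (line_number, line, follows_lend) for each ';' comment line."""
    found = []
    previous = ""  # last non-blank line seen before the current one
    for number, line in enumerate(text.splitlines(), start=1):
        if line.startswith(";"):
            found.append((number, line, previous.strip() == "<LEND>"))
        if line.strip():
            previous = line
    return found
```

The third field makes it easy to separate the comments placed right after `<LEND>`, which is where the Addenda notes were put, from comments elsewhere in an entry.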
One addl. point is that, if the Grouped (G) and Dual (D) entries are expanded, quite many repeats would come into picture. Should these be left as is, as the L-numbers would be different, or should they be marked with [1], [2] as done in some other dictionaries?
Also if the transliteration I used is accepted, @funderburkjim might've to re-do his 'ea' work on this IEG.
Incidentally, the l̤̣ is properly rendered only by a few fonts (like Charis, Noto, Siddhanta1 etc.; the Old Standard Indologique font used by Cologne doesn't support this!). And this letter, if rendered in Devanagari, could be ऴ, as seen in some books.
These are the characters I got in these two files-
á: 1 d̤: 123 ě: 268 ï: 3 l̤: 411 l̤̣: 87 n̤: 51 ŏ: 38 r̤: 246 s̤: 18 t̤: 10
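A tally like the one above can be reproduced with a short script. The sketch below is not the method used for those figures: it groups each base character with its following combining marks and counts every non-ASCII cluster, with an optional `ignore` set (a hypothetical parameter) for letters such as the regular IAST ones that are not of interest.

```python
from collections import Counter
import unicodedata


def special_letter_counts(text, ignore=frozenset()):
    """Count non-ASCII clusters (base character plus combining marks)."""
    counts = Counter()
    cluster = ""
    # The trailing newline is a sentinel so the last cluster gets flushed.
    for ch in unicodedata.normalize("NFD", text) + "\n":
        if cluster and unicodedata.combining(ch):
            cluster += ch
            continue
        if cluster and not cluster.isascii():
            key = unicodedata.normalize("NFC", cluster)
            if key not in ignore:
                counts[key] += 1
        cluster = ch
    return counts
```

Because it counts every non-ASCII cluster, punctuation such as typographic quotes shows up too unless it is added to `ignore`.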
Also I would like to mention one point-
Though the transliteration scheme by Sircar shows the short e as ĕ (U+0115), it is rendered throughout as ě (U+011B) inside the pages; probably due to the Greek words having this ě letter. [Greek orthography seems to be distinguishing between these two forms.]
Now, there are 370 entries marked with $, indicating non-SLP1 transliteration; they all come under non-Sanskrit category.
Old Standard Indologique font used by Cologne doesn't support this
Could be fixed, if I knew what else is missing there.
@drdhaval2785
I just happened to land here onto IEG, for some reason.
Seen that I can do quite an amount of work in this.
To start with, I had split the file into 4 parts (FrontPages, Main, Index and Addenda). ieg_FrontPages.txt ieg_Main.txt ieg_Index.txt ieg_Addenda.txt
And corrected the transliteration page in the front pages to match with the print book. [Apart from the transliterations, there are a few typos on this page in the Cologne text.] IEG Transliteration.txt
Should I continue (and post my work here, on this IEG)?