Andhrabharati coding of AP90

funderburkjim commented 3 years ago

@Andhrabharati has made a complete version of the AP90 digitization. This issue is devoted to comments and further discussion.

Here is a link to his documentation within another issue: https://github.com/sanskrit-lexicon/AP90/issues/15#issuecomment-845119956

A small sample from his AP90_.v.1.for.Cologne.txt file (link for download in the comment above).

00001 (0001,a) <p> अ
     The first letter of the Nāgarī Alphabet.

00002 (0001,a) <p> अः [अवति, अतति सातत्वेन तिष्ठतीति वा; अव्-अत् वा, ड <Tv.>]
     1. {N.} of Vishṇu, the first of the three sounds constituting the sacred syllable ओम्; अकारो विष्णुरुद्दिष्ट उकारस्तु महेश्वरः । मकारस्तु स्मृतो ब्रह्मा प्रणवस्तु त्रयात्मकः ॥; for more explanation of the three syllables अ, उ, म्; see ओम्. −2. {N.} of Śiva, Brahmā, Vāyu, or Vaiśvānara.

Hats off to you, @Andhrabharati ! I'm sure your version will prove useful in numerous ways to Cologne's efforts with Apte's dictionary.

Andhrabharati commented 3 years ago

I had corrected the entries where a capital letter followed a small letter, or where <i> is in the middle of a word. One entry of the second type remained and needs to be changed: under 17482 (351A,c) <+> −शृंगः, Rishya<i>śṛinga</i> is to be changed to <i>Ṛishyaśṛinga</i>.
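The two checks described here could be automated roughly as follows. This is only a sketch: the function names are hypothetical, and it assumes that a "mid-word" <i> means an <i> tag opening directly after a word character.

```python
import re

# An <i> tag that opens directly after a word character,
# e.g. Rishya<i>śṛinga</i> (assumed definition of "mid-word").
MID_WORD_ITALIC = re.compile(r"\w<i>")


def has_mid_word_italic(text: str) -> bool:
    """True if an <i> tag starts in the middle of a word."""
    return MID_WORD_ITALIC.search(text) is not None


def has_lower_then_upper(text: str) -> bool:
    """True if a lowercase letter is immediately followed by an uppercase one."""
    return any(a.islower() and b.isupper() for a, b in zip(text, text[1:]))
```

Both functions only flag candidates; whether a hit is a real error still needs a look at the entry.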

Also some entries (8 or 9) having double −3. (instead of −2. and −3.) are to be corrected.

There are more (26) entries where there is no −2., but −3. onwards are present.
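A check for the duplicated and skipped sense numbers mentioned above might look like the sketch below. It assumes the layout seen in the samples: the first sense starts the body as "1." and later senses are introduced by a minus sign (U+2212) followed by the number and a dot. The function names are hypothetical.

```python
import re

# Later senses look like "−2." where the dash is U+2212 MINUS SIGN.
SENSE = re.compile(r"\u2212(\d+)\.")


def sense_numbers(body: str) -> list:
    """Return the sense numbers found in an entry body, in order."""
    nums = [int(m.group(1)) for m in SENSE.finditer(body)]
    # The first sense has no leading dash; it opens the body as "1."
    if re.match(r"\s*1\.", body):
        nums.insert(0, 1)
    return nums


def numbering_problems(body: str) -> list:
    """Describe duplicated or skipped sense numbers."""
    problems = []
    nums = sense_numbers(body)
    for prev, cur in zip(nums, nums[1:]):
        if cur == prev:
            problems.append(f"duplicate {cur}")
        elif cur != prev + 1:
            problems.append(f"jump from {prev} to {cur}")
    return problems
```

A citation that happens to contain a minus sign before a number would be reported as well, so the output is a list of places to inspect rather than a list of confirmed errors.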

Andhrabharati commented 3 years ago

{Hind.} is missing from the abbr. file.

6 occurrences of Zend. are to be corrected to Zend (without a dot) in the data file, as the other 25 Zend are.

Andhrabharati commented 3 years ago

Guess @funderburkjim should be able to expand all the abbr. and ls entries from my file(s).

I would be glad to fill the gaps, if any.

funderburkjim commented 3 years ago

I've taken the liberty to upload Andhrabharati's files to this repository: https://github.com/sanskrit-lexicon/AP90/tree/master/andhrabharati. The main file is ap90ab.txt. This was done to make it easier to work further with the file, and to track changes with git.

@Andhrabharati If you clone this repository, you can make changes in ap90ab.txt (or other files), then push the changes to github.

Andhrabharati commented 3 years ago

Sure, will do so; I was wondering if you'd "adopt" this work, hence I did not upload it myself to the repo.

gasyoun commented 3 years ago

@Andhrabharati If you clone this repository, you can make changes in ap90ab.txt (or other files), then push the changes to github.

Makes sense.

Andhrabharati commented 3 years ago

Hats off to you, @Andhrabharati ! I'm sure your version will prove useful in numerous ways to Cologne's efforts with Apte's dictionary.

@funderburkjim and team (@gasyoun & @drdhaval2785 )

Has anybody really "seen" inside the file I sent?

It has a beautiful and unimaginable (implicit) solution to the biggest problem for the CSL, being discussed in many threads. [I take that they do not consider the actual biggest issue that I see −not having CORRECT proofed text for any of the works− as the primary issue, but are just trying to see how to handle the data in hand programmatically.]

I was expecting someone to identify the same.

drdhaval2785 commented 3 years ago

Count me out for 4-5 months.

Andhrabharati commented 3 years ago

Understood, Dr Dhaval.

We all know that you are on a more important task. [I have added your name after much deliberation, just to ping you.]

funderburkjim commented 3 years ago

The most obvious aspect 'inside' is the identification of 'subheadwords', which include compounds as well as other variants. For instance:

00091 (0003,b) <p> अकर्मन् {a.} [न. ब.]
     1. Without work, idle; inefficient. −2. Disqualified for performing the necessary rites, wicked, degraded; अकर्मा दस्युरभि नो <Rv.> 10. 22. 8. −3. ({Gram.}) Intransitive, generally in this sense अकर्मक.

00092 (0003,b) <p> {n.} (र्म)
     1. Absence of work; absence of necessary observances; neglect of essential observances; inaction; कर्मणो ह्यपि बोद्धव्यं बोद्धव्यं च विकर्मणः । अकर्मणश्च बोद्धव्यं गहना कर्मणो गतिः <Bg.> 4. 17, 18. −2. An improper act; crime, sin.

00093 (0003,b) <+> −अन्वित {a.}
     1. unengaged, unoccupied, idle. −2. criminal.

One thing I would wish for is an additional field on the first line of 'sub-entries' like 92 and 93; this field provides the full headword.

For instance:

00091 (0003,b) <p> अकर्मन् {a.} [न. ब.]
     1. Without work, idle; inefficient. −2. Disqualified for performing the necessary rites, wicked, degraded; अकर्मा दस्युरभि नो <Rv.> 10. 22. 8. −3. ({Gram.}) Intransitive, generally in this sense अकर्मक.

00092 (0003,b) <p> {n.} (र्म) : अकर्म
     1. Absence of work; absence of necessary observances; neglect of essential observances; inaction; कर्मणो ह्यपि बोद्धव्यं बोद्धव्यं च विकर्मणः । अकर्मणश्च बोद्धव्यं गहना कर्मणो गतिः <Bg.> 4. 17, 18. −2. An improper act; crime, sin.

00093 (0003,b) <+> −अन्वित {a.} : अकर्मान्वित
     1. unengaged, unoccupied, idle. −2. criminal.

Note using a ':' to separate this additional headword would conflict with entries 04673 and 23860 which also have a ':' character.

04673 (0106,a) <p> अपत्यं [न पतंति पितरोऽनेन, पत् बाहु॰ करणे यत्, न. त.; some derive it from अप, the termination त्य being added to it, as in तत्रत्य, अत्रत्य, sprung from a stock; Yāska gives two etymologies: अपत्यं कस्मात् अपततं भवति पितुः सकाशादेत्य पृथगिव ततं भवति, अनेन जातेन सता पिता नरके न पततीति वा]

23860 (0446,a) <p> गगनं(णं) (Some suppose गगण to be an incorrect form, as is observed by a writer:− फाल्गुने गगने फेने णत्वमिच्छंति बर्बराः)

However, if we consider the 'additional word separator' to be ' : ' , then there is no conflict.

Could also use ' = ' as separator.

Another advantage of adding this field to the sub-entries is that then the primary entries would be identifiable, by virtue of the fact that they would not have this field separator on the first line.
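To illustrate the proposal, here is a minimal sketch of reading such a first line. The regular expression, the Header type and parse_header are hypothetical; the line layout (id, page and column, marker, text, optional ' : ' field) is inferred from the samples above.

```python
import re
from typing import NamedTuple, Optional

# id, (page,column), <marker>, then the rest of the first line.
HEADER = re.compile(r"^(\d+) \((\w+),(\w)\) <([^>]+)> (.*)$")


class Header(NamedTuple):
    ident: str
    page: str
    column: str
    marker: str
    text: str
    full_headword: Optional[str]


def parse_header(line: str, sep: str = " : ") -> Header:
    """Split a first line into its parts, with the optional full headword."""
    m = HEADER.match(line)
    if m is None:
        raise ValueError(f"not a header line: {line!r}")
    ident, page, column, marker, rest = m.groups()
    # Split on the LAST ' : ' so a bare ':' inside the text is left alone.
    text, found, full = rest.rpartition(sep)
    if not found:
        text, full = rest, None
    return Header(ident, page, column, marker, text, full or None)
```

With ' : ' (space, colon, space) as the separator, the first lines of 04673 and 23860 parse with no full headword, because their colons are not surrounded by spaces. A line without the field is then a primary entry under this proposal.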

Andhrabharati commented 3 years ago

Good to see you back @funderburkjim

All those would eventually come-in.

BTW, the primary and other words are already indicated distinctly with <p> and other <b, c, +> marks.

Again with ref. to the sample screen from my AP57 work,

I have stopped the work at col. B, having seen some errors in that portion before proceeding to bringing up col. C and col. D [which is hardly about 2 days work for me].

I had mentioned the same in my posting earlier that the HWs portion needs proofing.

And I am likely to start the HWs proofing in next 2-3 days (and hope to finish in 10-12 days' time).

-------------------------

Finally, this is not the point I wanted someone to "identify". The sub-HWs were even otherwise identified by the preceding double dash in the digitisation; this has never been discussed seriously by the team, but has been mentioned just once by Dhaval and once by Marcis in their postings at two separate works/issues.

A hint: It is the <ls> marking being limited to just the names, without including the following numbers.

Would you like to guess what that has to do with Cologne discussions? Or would you like me to go on explaining the intention of doing so?

funderburkjim commented 3 years ago

Yes, please explain the ls marking you chose; I did notice it but did not see significance. Here is example:

00011 (0002,a) <+> −भाज्, −हर, −हारिन् {m. f.} [{उप. समास}]
     one who takes or has a share, one entitled to a share in the ancestral property, an heir, a coheir; 
        पिंडदोंशहरश्चैषां पूर्वाभावे परः परः <Y.> 2. 132; जातोपि दास्यां शूद्रेण कामतोंशहरो भवेत् 133.

Andhrabharati commented 3 years ago

The first point is that literary source (ls) is just the name, the internals are citations.

Now coming to the actual matter.

I am referring mainly to PWG in the following.

Pl. look at the following images-

[image: PWG_aMSu]

[image: PWG_apAna]

Notice the train of numbers which are hard to comprehend, and apparently difficult to separate them programmatically.

Am I right in saying that this has been a big issue discussed at many places?

I have a simple (universally applicable) solution to this issue.

Andhrabharati commented 3 years ago

Now pl. go through this file, which I feel is very much self-explanatory.

ls marking & links for citations or references.txt

Hope you'd get the reason of separately marking just the name for <ls>, by going through this file; it comes out in the next part of handling the numbers etc. for linking.

Await your comments on this (of course, I've nothing to get out of this).

Andhrabharati commented 3 years ago

And I am likely to start the HWs proofing in next 2-3 days (and hope to finish in 10-12 days' time).

@funderburkjim

I said this, as I am not getting any response from you regarding PWG Biblio task that I started and made several posts under that issue.

I thought I would wait for another two days and then go back to Apte next phase.

funderburkjim commented 3 years ago

as I am not getting any response from you regarding PWG Biblio

Sorry about that -- I've been focusing almost entirely on Ap90 recently. Will get to your work on PWG Biblio ASAP. (as soon as possible).

funderburkjim commented 3 years ago

Notice the train of numbers which are hard to comprehend, and apparently difficult to separate them programmatically. Am I right in saying that this has been a big issue discussed at many places?

I consider the numbers to be part of the ls references. Yes, it is difficult to get them all marked properly. (using one of two formats: <ls>P. II. 3. 4</ls> or cases when 'P.' is implied but not present, such as <ls n="P.">II. 3. 4.</ls> -- of course 'P.' is just an example; same principle applies to other sources.)
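The two formats can be read back uniformly. The following is a sketch only: ls_references is a hypothetical helper, and it assumes that in the first format the source abbreviation is the first space-delimited token, which would not hold for multi-word abbreviations.

```python
import re
from collections import Counter

# Matches <ls>...</ls> and <ls n="...">...</ls>.
LS = re.compile(r'<ls(?: n="([^"]*)")?>(.*?)</ls>')


def ls_references(text: str) -> list:
    """Return (source, locator) pairs for every <ls> element in text."""
    refs = []
    for m in LS.finditer(text):
        implied, body = m.group(1), m.group(2).strip()
        if implied is not None:
            # Source is implied and carried in the n attribute.
            refs.append((implied, body))
        else:
            # Assumption: the abbreviation is the first token of the body.
            source, _, locator = body.partition(" ")
            refs.append((source, locator))
    return refs


def source_counts(text: str) -> Counter:
    """Count references per source abbreviation."""
    return Counter(source for source, _ in ls_references(text))
```

Counting by source in this way gives per-abbreviation totals for comparing how completely two digitizations are marked.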

I think that nearly every literary source abbreviation reference is now marked in ap90.txt.

With this work, the literary source markup for ap90 is now better than for mw or pwg.

Andhrabharati commented 3 years ago

I've been focusing almost entirely on Ap90 recently.

Good to hear this from you, @funderburkjim.

In fact I was doubting the same and expressed it so in a personal communication with Marcis two days back-

And that is the way, in my opinion, to handle things; with such a focused approach, each one of the works would be brought into a good shape in 1-2 months' time [in your style of having (too much detailed) rigorous varieties; in my style of simplistic (necessary & sufficient) markings, it would be just 2-3 weeks at the max.] (in all aspects of markups & presentation). With the piecewise tasks and jumping across the works, it would take ages to finish even a single work.

Also glad that Apte is becoming the standard ref. now at Cologne works (superseding MW99)! [I cannot stop reiterating my appreciation for the change in page-break marking; it is the best one, out of all the changes.]

Here are some more suggestions from my side-

consider expanding the present <ls> and abbr. lists in AP90 to cover the complete text, either doing afresh or starting from the files I had sent.
split the "−With" (prefixed verb) entries also to another line as I did in my file; seen that this way of representation has been considered good by the team as done at WIL (and wished the same to be applied throughout the works)
...

(Would you like me to fill these in further?)

And applying a similar approach to PWG (which I am looking at at present), the <div n="2"> (within <div n="1">) and <div n="3"> (within <div n="2">) markings can be removed altogether (I guess they are extraneously split thus, being mostly some kind of lists −or some related items− rather than different entities to be separated). And the <div n="4"> is eligible to be the first split <div n="0">, a level above the <div n="1"> numbered meanings.

Andhrabharati commented 3 years ago

-- of course 'P.' is just an example

You may note that 3 occurrences of P. were expanded 'locally' in my file.

05168 (0117,a) <p> अपि @{ind.} has 'I wish I were P(urūrava).'

and

09308 (0194,c) <p> −थं has 'P(riyaṃvadā) is right, what P(riyaṃvadā) says is right;'

There are some more instances like this [R(āmānuja) for R., M(ālavikā) and M(ālatī) for M., T(rigarta) for T., K(āśyapa) for K.] in the whole text.

You may also consider doing all such local expansions in the data.
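Such local expansions have a recognisable shape: one capital letter followed by the rest of the name in parentheses. A rough way to list them is sketched below; the pattern and the function name are hypothetical, and other parenthesised text of the same shape would be picked up too.

```python
import re

# A single capital letter followed by "(rest-of-name)", e.g. P(urūrava).
LOCAL_EXPANSION = re.compile(r"\b([A-Z])\(([^)\s]+)\)")


def local_expansions(text: str) -> list:
    """Return the expanded names, e.g. 'Purūrava' for 'P(urūrava)'."""
    return [m.group(1) + m.group(2) for m in LOCAL_EXPANSION.finditer(text)]
```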

Andhrabharati commented 3 years ago

@funderburkjim

Continuing further on the P. (the L-numbers are from my "text")

Under 01991 (0042,c) <+> (−ना) अद्यश्वीनावष्टब्धे <P.>??? This refers to the prev. citation number [<P.> V. 2. 13] and is explained further by the Grammar book Sk. (सिद्धान्तकौमुदी). Interestingly, this whole citation is removed in AP57!! S2H has it as पा॰ 5.2.13
Under 03936 (0087,c) <p> −तः पुंसो यमांतं व्रजतः <P.>??? 2. 115 Wonder how this has become R. 2. 115 in your display; here it is P. only, not R. This is not expanded even in AP57 (just given as P. 2. 115), but the Apte Student ed. S2H (a translation of Apte Student's S2E 1890 into Hindi in 1965) mentions this as पंच॰ 2.115 (पंचतन्त्र, पञ्चदशी, पञ्चरात्रम् being given as the full forms of पंच॰).
Under 61495 (1040,c) <p> शांभव अत्तुं वांछति शांभबो गणपतेराखुं क्षुधार्तः फणी <P.>??? 1. 159. Wonder how your text got this as Pt. 1. 159 (is it taken from AP57?); S2H has this as पंच॰ 1.159

Finally, having noticed that this S2H (1965) has few additional names not in the Practical S2E 1890 or 1957 (and not even in the Student's S2E 1890 from which I gave the extra abbr.s and ls names earlier in another thread), I am giving the relevant pages from it hereunder, as they might be useful. [BTW, this S2H has listed many grammatical abbr.s also extra as compared to all the English editions.]

Apte S2H- Names of works or authors.pdf

Hope you can take the ref. names from this, or would you like me to make a text file of these to ease your work?

gasyoun commented 3 years ago

literary source markup for ap90 is now better than for mw or pwg.

Apte and we are happy. So a new approach has been born and documented?

Andhrabharati commented 3 years ago

Attention to the following point at https://github.com/sanskrit-lexicon/AP90/issues/15#issuecomment-845119956 is requested.

The citations are demarked with semicolon (almost) consistently when there is a work or section change; and with a comma or a dash when the citation is within the section.

I had noticed that the AP90 book has it thus at few places, but not consistently followed throughout.

In general, this is a standard practice of separating the citations in the books. See for e.g. Bloomfield's Vedic Concordance, and the following from an excellent article by HD Velankar.

----------------------

@funderburkjim seems to have filled such numbers out to full strings (padding the chapter/section from the preceding occurrence), but still retained the comma as the separator in those places. I guess these should now become ; separated, as there is no longer any need for the comma to denote that the number belongs to the previously mentioned section.
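The padding described here can be sketched as follows, assuming the convention stated above: a semicolon starts a new work or section, and a comma continues within the current one. expand_locators is a hypothetical name; dash ranges and the choice of source name are not handled.

```python
def expand_locators(locator: str) -> list:
    """Expand '4. 17, 18' into ['4. 17', '4. 18'].

    ';' starts a new section; ',' continues the current section, so a
    shorter number after a comma borrows its leading parts from the
    number before it.
    """
    full = []
    for section in locator.split(";"):
        last = []
        for part in section.split(","):
            nums = part.replace(".", " ").split()
            if not nums:
                continue
            if len(nums) < len(last):
                # Pad with the leading parts of the preceding citation.
                nums = last[: len(last) - len(nums)] + nums
            last = nums
            full.append(". ".join(nums))
    return full
```

For the Bg. citation quoted earlier in this thread, "4. 17, 18" expands to "4. 17" and "4. 18". Once every number is written out in full like this, the comma no longer carries any information, which is the argument for switching to semicolons.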

gasyoun commented 3 years ago

split the "−With" (prefixed verb) entries also to another line as I did in my file

Agree.

Apte Student ed. S2H (a translation of Apte Student's S2E 1890 into Hindi in 1965)

Seems to be in so many ways smarter.

Andhrabharati commented 3 years ago

@gasyoun are you aware that Bloomfield's book has been revised/updated after about a century in 2007/8?

Andhrabharati commented 3 years ago

@funderburkjim

I've modified the file structure and uploaded as ap90ab_v2.txt

Also couple of textual corrections were done in this.

Hope you might get some more ideas of presenting the data in a better manner looking at this. [esp. have a look at the lines with //]

Andhrabharati commented 3 years ago

@funderburkjim

Seems the life doesn't go simple always!

Found few occasions, where the ls name to "pad" to the citation number isn't always the preceding ls name. It is one previous to previous ls name (2nd preceding) in those cases.

So unless one goes through the text matter manually, it is difficult (rather, impossible!) to find such occurrences.

Andhrabharati commented 3 years ago

as I am not getting any response from you regarding PWG Biblio

Sorry about that -- I've been focusing almost entirely on Ap90 recently. Will get to your work on PWG Biblio ASAP. (as soon as possible).

@funderburkjim I am drawn to PWG for some unknown reason (probably because it is in a different language from all the others that I had dealt with so far, and also because it has many citations and has been the "base" for almost all the later Sanskrit dictionaries).

Also as I feel that there is no "sync" even in your AP90 work with my posts, I had changed to PWG and had the full file converted at my end (as was AP90 done earlier).

Now it is being done in my style entirely (mostly similar to AP90), and the structure is coming out quite good.

Noticed too many rendering issues in Cologne using the PWG data, but I see no point listing all those at the forum.

I have the VN pages text (missed in pwg.txt but present in pwg_orig.txt) also included into the file now.

gasyoun commented 3 years ago

Bloomfield's book has been revised/updated after about a century in 2007/8?

It's no real update, as far as I know.

So unless one goes through the text matter manually, it is difficult (rather, impossible!) to find such occurrences.

The only hope is that there are tens or hundreds of them, not thousands.

Noticed too many rendering issues in Cologne using the PWG data, but I see no point listing all those at the forum.

Please do list them.

I have the VN pages text (missed in pwg.txt but present in pwg_orig.txt) also included into the file now.

Guess we should too.

Andhrabharati commented 3 years ago

Bloomfield's book has been revised/updated after about a century in 2007/8?

It's no real update, as far as I know.

... ...

This is all "no real update" for you, @gasyoun? - surprising!!

Andhrabharati commented 3 years ago

literary source markup for ap90 is now better than for mw or pwg.

Apte and we are happy. So a new approach has been born and documented?

I would wish that AP90 should also be put in bold (like MW99) at the CSL home page (https://www.sanskrit-lexicon.uni-koeln.de/), to indicate that now it also has become a "standardised" work there.

funderburkjim commented 3 years ago

@Andhrabharati Homepage changed per your request.

funderburkjim commented 3 years ago

It is one previous to previous ls name (2nd preceding) in those cases.

Please provide a few examples that you have noticed.

funderburkjim commented 3 years ago

Under 03936 (0087,c)
−तः पुंसो यमांतं व्रजतः <P.>??? 2. 115 Wonder how this has become R. 2. 115 in your display; here it is P. only, not R.

Don't know how 'R' came about -- maybe typist misread 'P'.

Clearly R. is wrong (Raghuvaṃśa) as 2nd Canto has only 75 verses (so 115 is out of range).

Also, clearly P. is wrong, since P. refers to Pāṇini's Aṣṭādhyāyī and references are always of the form P. + Roman-numeral + number + number, and P. 2. 115 does not fit this form.

Pt. is Panchatantra, and 2.115 is a meaningful reference, however the text पुंसो यमांतं व्रजतः is not found where expected in the archive.org version: https://archive.org/details/panchatantracoll00purnuoft/page/154/mode/2up

So best guess is that this should be Pt but not sure about the 2.115 reference.
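The reasoning above amounts to two plausibility tests, which could be sketched like this. The names are hypothetical, and the verse table holds only the one figure stated in this comment (Raghuvaṃśa canto 2 has 75 verses); a real check would need the full table.

```python
import re

# P. references: Roman numeral + number + number, e.g. "V. 2. 13".
PANINI = re.compile(r"^[IVX]+\. \d+\. \d+\.?$")
CANTO_VERSE = re.compile(r"^(\d+)\. (\d+)\.?$")

# Verse counts per canto of the Raghuvaṃśa; only the figure given above.
RAGHUVAMSA_VERSES = {2: 75}


def is_plausible(source: str, locator: str) -> bool:
    """Return False only when the locator cannot belong to the source."""
    if source == "P.":
        return PANINI.match(locator) is not None
    if source == "R.":
        m = CANTO_VERSE.match(locator)
        if m is None:
            return False
        canto, verse = int(m.group(1)), int(m.group(2))
        limit = RAGHUVAMSA_VERSES.get(canto)
        return limit is None or verse <= limit
    # No rule known for this source, so it is not ruled out.
    return True
```

For the locator 2. 115 this rules out both P. and R. and leaves Pt. open, matching the conclusion reached by hand.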

funderburkjim commented 3 years ago

Apte S2H- Names of works or authors.pdf

In recent work with marking literary sources in ap57, I encountered many cases where a literary source was not readily identified as one given in ap57's list of such abbreviations.

Here is the current work in progress (which shows current counts in ap57 and ap90 for marked literary sources).

lscount_comp.txt

I see a few in your Apte S2H that resolve 'ap57-extra-unknown' cases, such as Hari. and Bhartṛ. So your list looks useful in this way.

Since @sanskritisampada has just finished work on English word corrections in SHS, I'll ask her to make a text file from the pdf you provided.

Andhrabharati commented 3 years ago

Pt. is Panchatantra, and 2.115 is a meaningful reference, however the text पुंसो यमांतं व्रजतः is not found where expected in the archive.org version: https://archive.org/details/panchatantracoll00purnuoft/page/154/mode/2up

So best guess is that this should be Pt but not sure about the 2.115 reference.

Pl. see the first line on p.27 here- https://archive.org/details/p2panchatant00bhuoft/page/27/mode/1up

funderburkjim commented 3 years ago

Yep - your reference has the text. Good. That confirms that Pt. is correct, and both ap90 and ap57 should be changed accordingly (under headword 'anta' (ap57), 'aMta' (ap90)).

Clearly some difference in the edition I chose and the one you found.
Yours is the edition of Buhler. So this seems to be the one Apte references. Your link covers Pt. 2 and Pt. 3; I see another Buhler volume for 4-5. Example: https://archive.org/details/p3panchatant00bhuoft/page/44/mode/2up gives reference Pt. 5. 32 for phrase kleSasyAMgamadattvA under headword aNgaM (ap90); so this edition is also consistent with Apte references.

However, I could not find Buhler volume 1 of the Panchatantra (needed, for example, for Pt. 1. 159 अत्तुं वांछति शांभबो गणपतेराखुं क्षुधार्तः फणी given above).

Andhrabharati commented 3 years ago

Many of AP90 citations are taken from PWG and Vacaspatyam; so one should first look at these two works to resolve AP90 citations.

Andhrabharati commented 3 years ago

Panchatantra I was done by Kielhorn, not Buhler; Buhler did the other 4 parts of it.

See the penultimate line here for 1.159 citation- https://archive.org/details/in.ernet.dli.2015.282562/page/n27/mode/1up

Andhrabharati commented 3 years ago

Many of AP90 citations are taken from PWG and Vacaspatyam; so one should first look at these two works to resolve AP90 citations.

And PWG has Buhler as one of the sources for Panchatantra (PAÑCAT. ed. BÜHL.)!! [The PAÑCAT. ed. Bomb. in PWG probably refers to Kielhorn.]

Andhrabharati commented 3 years ago

Wonder how this has become R. 2. 115 in your display; here it is P. only, not R.

Don't know how 'R' came about -- maybe typist misread 'P'.

The Cologne file I used has P. only,

[{#puMso yamAMtaM vrajataH#} P. 2. 115 going <lbinfo n="in+to"/> into the vicinity or presence of Yama;]

so it's definitely not the typist's hand to make it R.

funderburkjim commented 3 years ago

s2h works/authors pdf now digitized, by @sanskritisampada.

The main form is slp1. Transcodings into devanagari and iast have been made.

Files are in https://github.com/sanskrit-lexicon/AP90/tree/master/apte_s2h

@Andhrabharati Would you proofread?

Andhrabharati commented 3 years ago

Sure, would do it.

Andhrabharati commented 3 years ago

Done.

Updated Devanagari version is posted at the same place. [Added missing entries and portions; also corrected few printos, and added the English names at two places.]

gasyoun commented 3 years ago

actual biggest issue that I see −not having CORRECT proofed text for any of the works

You mean that not enough critical editions were printed by time of Apte or what?

gasyoun commented 3 years ago

[The PAÑCAT. ed. Bomb. in PWG probably refers to Kielhorn.]

Agreed; Buhler was a close friend of Kielhorn from their Pune days.

See https://de.wikipedia.org/wiki/Georg_B%C3%BChler Panchatantra with English notes ("The Bombay Sanscrit Series", 1868; 1891)

Andhrabharati commented 3 years ago

actual biggest issue that I see −not having CORRECT proofed text for any of the works

You mean that not enough critical editions were printed by time of Apte or what?

You've read this completely out of context!!

I was talking about the status of text(file)s at the CSL project; Apte has no access to text(file)s, but just the "printed books" (I did not see him citing from any manuscripts).

funderburkjim commented 3 years ago

Dealt with @Andhrabharati's proofread of apte_s2h_works_deva.txt.

Modified the slp1 version so that the transcoding of slp1 to deva agrees with the proofread version.
- Only 2 exceptions. See [https://github.com/sanskrit-lexicon/AP90/commit/2dc966e36ff0515046539687ee64b3deef1ce320#diff-76f3cec96ec7510ba2f25dfb6bdad3948162d9623640e232ddc4d0fa91bb6338].

Left मुख॰ : मुखपञ्चशती unchanged.

Noticed these 'print changes':

Small errors in the print:
; print change: mahAtmya -> mAhAtmya
de. ma. : devI mAhAtmya
; print change: lalita sahasranAma -> lalitA sahasranAma
lalita. : lalitA sahasranAma
; print change vAkpadIya -> vAkyapadIya
vA. pa. : vAkyapadIya
; print change: vedAnta deSikA -> vedAnta deSika
ve. de. : vedAnta deSika

These differ by a space only
; print change: SivapurARa -> Siva purARa
Si. pu. : Siva purARa
; print change: SyAmalAdaRqaka -> SyAmalA daRqaka
SyAma. : SyAmalA daRqaka
; print change: suDAlaharI -> suDA laharI
suDA. : suDA laharI
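The split between real spelling corrections and space-only differences can be made mechanically. This sketch assumes the "; print change: old -> new" comment format shown above (with the colon optional, since one line omits it); the function names are hypothetical.

```python
import re

# Matches "; print change: old -> new"; the colon is optional.
CHANGE = re.compile(r"^; print change:? (.+?) -> (.+)$")


def parse_change(line: str):
    """Return (printed, corrected), or None if the line is not a change note."""
    m = CHANGE.match(line.strip())
    return (m.group(1), m.group(2)) if m else None


def classify_change(printed: str, corrected: str) -> str:
    """Label a change as 'same', 'space-only' or 'spelling'."""
    if printed == corrected:
        return "same"
    if printed.replace(" ", "") == corrected.replace(" ", ""):
        return "space-only"
    return "spelling"
```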

Thanks for the proofread, @Andhrabharati ! Numerous errors corrected.

Andhrabharati commented 3 years ago

Left मुख॰ : मुखपञ्चशती unchanged.

I guess you should remove this from the ls list. It is not an ls abbr. but just the synonym for 'face', as I had remarked.

gasyoun commented 3 years ago

Numerous errors corrected

Thanks go to both!

Andhrabharati commented 3 years ago

Panchatantra I was done by Kielhorn, not Buhler; Buhler did the other 4 parts of it.

Here is the supporting screenshot (from Oct 1876, JRAS) for this-

Andhrabharati coding of AP90 #17