M2 and f2 in SLP1 encoding of Devanagari

funderburkjim commented 3 years ago

This note in response to https://github.com/sanskrit-lexicon/PWG/issues/37#issuecomment-846575371.

Preliminary examination of cases with M2 occurring in SLP1.

Example under headword naBanya:

Current display rendering of Devanagari, with accents shown:

The slp1 code in pwg.txt: gAya\tsAma^ naBa\nyaM2^\ yaTA\ veH#

The scanned image:

A similar example is under headword naBas, on the same scanned image, 2nd column.

These involve coding of accented numbers. Our focus is on the accented '2': naBa\nyaM2^\

But also notice the similarly accented '3', whose slp1 coding is na\Ba\nyo\3^\

In this case, the error appears to me to be that we have '2' whereas we should have '1': naBa\nyaM1^\

For comparison, look under headword atizWant, which has an accented '1':

slp1: ati^zWantama\pasyaM1^\ na sarga^m

Current display rendering:

Scanned image:

funderburkjim commented 3 years ago

My inference from the example is that we should change M2^\ to M1^\ in pwg.txt.

This occurs 4 times.

funderburkjim commented 3 years ago

Other 'M2'

M2 occurs 48 times in the slp1 text, based on an Emacs search:

- 4 times M2 is followed by the svarita and anudatta accents, as in the naBanya example above (slp1 M2^\); headwords naBanya, naBas, par, parc
- 6 times M2 is followed by the svarita accent (slp1 ^) only
- 10 times M2 is followed by the anudatta accent (slp1 \) only
- 20 times M2 is followed by a space
- 8 times M2 is followed by some other character (neither ^ nor \)

I agree that the M2 coding is probably wrong in all 48 cases, but the correct coding will need to be determined by examining the scanned image in each case.
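
The tally above came from an Emacs search. A minimal sketch of how a similar breakdown could be scripted is shown here. The function name `classify_m2` and the category labels are hypothetical, and the sample string is assembled from fragments quoted in this thread rather than read from pwg.txt.

```python
import re
from collections import Counter

def classify_m2(text):
    """Tally each 'M2' in SLP1 text by the character(s) that follow it."""
    counts = Counter()
    for m in re.finditer("M2", text):
        nxt = text[m.end():m.end() + 2]  # up to two following characters
        if nxt == "^\\":
            counts["svarita+anudatta"] += 1
        elif nxt.startswith("^"):
            counts["svarita only"] += 1
        elif nxt.startswith("\\"):
            counts["anudatta only"] += 1
        elif nxt.startswith(" "):
            counts["space"] += 1
        else:
            counts["other"] += 1
    return counts

# Hypothetical sample built from fragments quoted in this thread.
sample = r"naBa\nyaM2^\ yaTA\ veH# vaSAM2\ martAM2^ veH#"
print(classify_m2(sample))
```

Running the same function over the full text of pwg.txt should give a breakdown comparable to the list above, though each case would still need checking against the scanned image.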

funderburkjim commented 3 years ago

f2 in slp1

Andhrabharati also noticed two instances of these. They occur in the 2nd and 6th case (accusative and genitive) plural forms of 'pitf' (father), both under headword svaDA, page 7-1424, lines 10 and 12.

The slp1 coding should be 'F' for the long vowel:

OLD: line 10
sva\DA sTa^ ta\rpaya^ta me pi\tf2n
NEW:
sva\DA sTa^ ta\rpaya^ta me pi\tFn

OLD: line 12
svaDA vE pitf2RAmannam
NEW:
svaDA vE pitFRAmannam

These two changes are being made.

funderburkjim commented 3 years ago

The other two `M2^\` examples

Under headword 'par': slp1 = sa A^cA\ryaM2^\ tapa^sA piparti; scan (page 4-0477, line 9):

Under headword 'parc': slp1 = ta\nvA^ me ta\nvaM2^\ saM pi^pfgDi; scan (page 4-0569, last line):

The correction M2^\ to M1^\ is appropriate for these two cases also.

Andhrabharati commented 3 years ago

Yes @funderburkjim, my inference in my 2nd sentence was in a hurry, but the first sentence is the one actually intended. [This is reported when I was at the very initial stage, not yet reaching the accents portion.]

Glad that you're verifying each individual case patiently!!

Andhrabharati commented 3 years ago

Looking at the images you posted (Cologne display and PWG print), I am forced to say this- Pl. look at my posts under #5.

funderburkjim commented 3 years ago

The remaining M2 instances have been changed to candrabindu (slp1 ~). See commit eddc670 above.

Closing issue.

funderburkjim commented 3 years ago

candra-bindu and accent

In the M2-candrabindu examples, several occurred with a contiguous accent; the slp1 was recoded so that the accent precedes the candrabindu (and thereby follows the vowel). The reasoning was that candra-bindu is analogous to anusvara, and we decided that 'vowel + accent + anusvara' is the preferred coding sequence. For example,

vaSAM2\ -> vaSA\~
martAM2^ -> martA^~
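
A minimal sketch of this recoding as a regular-expression substitution follows. The function name `m2_to_candrabindu` is hypothetical, and the sketch assumes the four M2^\ cases have already been corrected to M1^\ separately, so that at most one accent character follows any remaining M2.

```python
import re

# Assumption: the four M2^\ cases were already corrected to M1^\,
# so at most one accent character (/, \ or ^) follows a remaining M2.
_M2_ACCENT = re.compile(r"M2([/\\^]?)")

def m2_to_candrabindu(slp1):
    """Replace 'M2' with '~', moving an adjacent accent in front of it."""
    return _M2_ACCENT.sub(r"\1~", slp1)

print(m2_to_candrabindu("martAM2^"))  # martA^~
```

The optional capture group keeps the accent next to the vowel, which gives the 'vowel + accent + candrabindu' order described above.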

But there are many (190) instances elsewhere in the pwg.txt digitization where the coding sequence is 'vowel + candra-bindu + accent'; I think these should also be changed, for example:

nira\mitrA~^ => nira\mitrA^~
apA\mitrA~\ => apA\mitrA\~
nF~/HpraRetra => nF/~HpraRetra (the only example with udAtta accent '/')
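
The proposed reordering can be sketched as a single substitution. This is an illustration under stated assumptions, not the script actually used for the commit: the function name `accent_before_candrabindu` is hypothetical, and the vowel set is the standard SLP1 vowel inventory.

```python
import re

SLP1_VOWELS = "aAiIuUfFxXeEoO"
# vowel + candrabindu (~) + accent (/, \ or ^)
_CB_ACCENT = re.compile(r"([%s])~([/\\^])" % SLP1_VOWELS)

def accent_before_candrabindu(slp1):
    """Reorder 'vowel + ~ + accent' to 'vowel + accent + ~'.

    Returns the new text and the number of substitutions made.
    """
    return _CB_ACCENT.subn(r"\1\2~", slp1)

print(accent_before_candrabindu("nF~/HpraRetra"))  # ('nF/~HpraRetra', 1)
```

Returning the substitution count makes it easy to compare the number of changes against the 190 instances found by searching.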

@drdhaval2785 or @Andhrabharati : Am I understanding properly? Should those additional 190 changes be made as indicated?

Andhrabharati commented 3 years ago

you're right @funderburkjim, in catching the point.

these three are to follow the vowels (with accents, if any) before a consonant coming next.

and these three have their own accent marks, which are different from the vowel's 3 accents. (we can talk about them later, if and when time comes)

funderburkjim commented 3 years ago

Thanks, @Andhrabharati !

The changes are now made and installed. See csl-orig commit addefb3 above. The change transactions are in changes_cb-accent.txt

Andhrabharati commented 3 years ago

I just would like to say that I did all these and many more corrections in my PWG text; but as I was not getting any reciprocation (in time), I decided not to share my file or observations (a message saying so was already posted a few days back).

I just would be giving my feedback when some work happens here (in this PWG).

However, I can share the biblio file, if @funderburkjim responds to my postings about it. [This being the work I was requested to take up and was the beginning of my serious journey in the "German" world; though I had read & used a couple of German texts earlier, like a few Prakrit works by Weber and Pischel, I never paid attention to know these many details that I learnt while "doing" the PWG.]

funderburkjim commented 3 years ago

Since I've not seen your file, it is hard to know what exactly you have done. If you think your file (or files) would be useful to me or others, then you might consider sharing it.

It's not a question of whether you are making interesting and useful observations regarding pwg, pw, mw, ap90, vcp, etc. -- you definitely are ! (such as the point regarding M2 and f2 that I have responded to in this issue, and your comments regarding accents in #5 that I am also relying on, your work with Dhaval on VCP, etc. etc.).

Rather it is a question of how to communicate your observations in a way that is most helpful.

One option might be for you to make a small sample of the file (or files) that you are considering sharing. This would be done in a separate issue, in which you explain (document) how the file is structured, and how it relates to the existing digitizations currently being used at Cologne, and how we might best make use of the file to improve the Cologne digitizations. This option might be appropriate for the 'biblio file' (which I presume is related to pwgbib_input.txt).

A second option might be for you to distil your observations into bite-sized chunks (one chunk per issue) that I or others can make use of. In this approach, you might base a suggestion on your reconstructed files, but would communicate some adaptation of your approach that we could readily use to improve the current digitizations. Again note the 'one chunk per issue' suggestion -- when you include dozens of different comments in one github issue (like #37), I find it overwhelming rather than useful. Better to try to use the Github issues to separate suggestions into distinct actionable chunks.

My preference would probably be the second option or some combination of the two options.

With good will and patience, we should be able to develop a modus vivendi that will benefit current and future users of the Sanskrit dictionaries initially digitized by @thomasincambodia and his typists in India.

Andhrabharati commented 3 years ago

One option might be for you to make a small sample of the file (or files) that you are considering sharing. This would be done in a separate issue, in which you explain (document) how the file is structured

This is exactly what I did while giving my version of AP90.

https://github.com/sanskrit-lexicon/AP90/issues/15#issuecomment-845119956

And this was your reaction about my Apte90 file-

@Andhrabharati has made a complete version of AP90 digitization. This issue devoted to comments and further discussion.

... ... ...

Hats off to you, @Andhrabharati ! I'm sure your version will prove useful in numerous ways to Cologne's efforts with Apte's dictionary.

And indeed, it looked as if you took some pointers from my file in updating yours (as many of your later posts indicated being so, though not giving a direct reference to my file).

Andhrabharati commented 3 years ago

Now coming to your statement,

Since I've not seen your file, it is hard to know what exactly you have done.

With my statement (as at https://github.com/sanskrit-lexicon/AP90/issues/17#issuecomment-851552308)

Now it is being done in my style entirely (mostly similar to AP90), and the structure is coming out quite good.

you can guess how my pwg file could be.

Andhrabharati commented 3 years ago

My preference would probably be the second option or some combination of the two options.

My style is to work on the whole body of the book at once (mostly), and to cover as many "inter-related" points together as possible.

With good will and patience, we should be able to develop a modus vivendi

Probably you can split them into sep. issues as per your liking, as you did earlier in splitting a portion from sanskrit-lexicon/AP90#15 to sanskrit-lexicon/AP90#17, and I guess you're patient enough to do so.

The best I can do from my side is to list my points numbered, as I did at MW99 in the beginning https://github.com/sanskrit-lexicon/MWS/issues/96#issuecomment-765957178 and continued the list further, down the way, at https://github.com/sanskrit-lexicon/MWS/issues/96#issuecomment-766022647, https://github.com/sanskrit-lexicon/MWS/issues/96#issuecomment-766096324, https://github.com/sanskrit-lexicon/MWS/issues/96#issuecomment-766110688, https://github.com/sanskrit-lexicon/MWS/issues/96#issuecomment-766375135, https://github.com/sanskrit-lexicon/MWS/issues/96#issuecomment-766564827 which can ease your splitting-process.
