VN missing pages, continued

funderburkjim commented 2 months ago

Continue the discussion of VN (additions and improvements) for PWG, that was begun in #39.

funderburkjim commented 2 months ago

https://sanskrit-lexicon.uni-koeln.de/pwgindex.html purports to be a scanned edition of PWG, created long ago from material from @maltenth.
readme_cdsl_vn.txt is a sort of index to the VN portions from the various volumes.

Compare the volume 1 VN material from this cdsl source to the volume 1 VN material supplied by @Andhrabharati in #37, #39 : AB vol 1 vn pdf.

They are very different.

How to account for this difference? If we want to improve the VN coding in cdsl pwg.txt, which of the two sources should be used?

Although AB does not divulge the exact source of his pdfs, perhaps he could retrieve some information from the title pages that would explain the difference.

Andhrabharati commented 2 months ago

@maltenth has to respond about the scans at CDSL!

And it is my lookout to reach the best possible original sources (either scans or physical books) that enhance my collection, as a continuous process. [I find that many works cited in PWG could be traced at the Bavarian library (having excellent quality). Of course, there are quite a few other sources as well.]

So far as I am concerned, the text in the pwgheader (of older date) is exactly what is in the print volumes; I had just proofread the same (at times splitting some matter into separate lines) and posted it earlier.

Jim could probably start with converting the PWG-VN data in Thomas's original format to current CDSL format.

gasyoun commented 2 months ago

https://sanskrit-lexicon.uni-koeln.de/pwgindex.html purports to be a scanned edition of PWG, created long ago from material from @maltenth.

This was 2002 or 2004, I received them on a CD from Germany in Moscow.

Andhrabharati commented 2 months ago

This post indicates some of my PWG "sources".

readme_cdsl_vn.txt is a sort of index to the VN portions from the various volumes. ... ... vol. 1 (no VN matter)

If the Vol.1 does not contain any VN matter, how would Jim (and/or Thomas) explain the presence of the typed matter from those pages in the pwgheader file (that was received from Thomas)?

funderburkjim commented 2 months ago

@maltenth has to respond about the scans at CDSL!

Unfortunately, communication by me with Thomas has become unpredictable this year.

funderburkjim commented 2 months ago

If the Vol.1 does not contain any VN matter ...

Just now, I compared a. the volume 2 material from pwgindex to b. pwgheader/PWG.V.2.VN.pages.pdf

They are identical. From this I infer that the material in pwgheader/PWG.V.1.VN.pages.pdf is simply absent from the pwgindex images. Why absent -- no way to know. This allays my concern regarding possible version difference.

I have revised the pwgindex program to include

- pages from pwgheader/PWG.V.1.VN.pages.pdf.
- page from pwgheader/PWG.V.6.VN.pages.pdf.

funderburkjim commented 1 month ago

Based on recent review (see revised readme_cdsl_vn.txt), the first task is to provide 'entries' for the 'missing' VN material. The source for this missing material has two parts:

- PWG.VN.text.Vol.s.1-6.txt, typed by AB. Jim's task is to convert [v.pppp] lines to the metaline-body-lend format of pwg.txt entries. Let's refer to this file by the shorter name VNTXT.
- The AV reference improvements on page 3 of PWG.V.1.VN.pages.pdf. AB's task is to type this in some straightforward way. Then Jim's task will be to convert AB's file to the m-b-l format of pwg.txt entries.

The VNTXT file has 599 lines 'to convert'. 131 of these are, following the printed text, without headwords. The first examples:

[1.0012]     ¦ streiche das Beispiel u. अक्न ...    <<< headword is 'akna'
[1.0014]     ¦ Z. 31 streiche <ls>ṚV.</ls> <8,46,26>.   <<< headword is 'akza'
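
A minimal sketch of how the lines without headwords could be picked out, assuming each VNTXT line has the shape `[v.pppp] <optional headword> ¦ <text>` (the position of the headword before the broken bar is an assumption based on the two samples above; `parse_vn_line` is a hypothetical helper, not part of any cdsl program):

```python
import re

# Assumed line shape: "[v.pppp] <optional headword> ¦ <correction text>"
LINE_RE = re.compile(r"^\[(\d)\.(\d{4})\]\s*(.*?)\s*¦\s*(.*)$")

def parse_vn_line(line):
    """Return (volume, page, headword_or_None, text), or None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    vol, page, headword, text = m.groups()
    return int(vol), int(page), (headword or None), text

# The two sample lines quoted above (without the trailing '<<<' remarks).
sample = [
    "[1.0012]     ¦ streiche das Beispiel u. अक्न ...",
    "[1.0014]     ¦ Z. 31 streiche <ls>ṚV.</ls> <8,46,26>.",
]
parsed = [parse_vn_line(s) for s in sample]

# Lines whose headword field is empty still need a headword supplied.
missing = [p for p in parsed if p is not None and p[2] is None]
print(len(missing), "line(s) without headword")
```

Run over the whole file, the same filter would list the lines that still need a headword.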

AB: Have you already determined these missing headwords?

Andhrabharati commented 1 month ago

AB: Have you already determined these missing headwords?

No, Jim; you may refer to my related post. But, it is a fairly simple task if decided to be done!

Andhrabharati commented 1 month ago

If I am to do it, I might wish to re-look at the whole content for a possible 'revision'!!

Andhrabharati commented 1 month ago

The AV reference improvements on page 3 of PWG.V.1.VN.pages.pdf. AB's task is to type this in some straightforward way.

It may be noted that the form <X> = <Y> is to be considered something like lies <X> st. <Y> at these AV citation changes. As such, I don't suggest changing this format.

Andhrabharati commented 1 month ago

@funderburkjim

I think I have now properly changed these PWGVN lines, to the format as in the pwkvn pages. PWGVN_1-6reformatted(dng).txt

There are a couple of places (the lines having ...do..., ??? and ;;) that you may need to look at first.

-----------------------------

PS. I feel the VN lines of PWG-5 (lines 553-573 in my file) could be discarded, as the page is not to be seen in the original Bavarian Library and the re-printed Japanese (MLBD) ed. copies.

funderburkjim commented 1 month ago

Looks like this reformatted file is the one I should work with.
- In particular, it has filled in what I called the 'missing' headwords
Why the 21 ...do... items in the headword field?

Andhrabharati commented 1 month ago

...do... denotes that the VN line belongs to the same HW as above! Or in other words, those HWs contain two or more corrections.

Andhrabharati commented 1 month ago

And you may note that the [v.pppp] after the broken bar denotes the actual correction location, not the pc-field for the metaline (which should be built with the previous [Page:VNv-ppp]).
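
The two conventions just described (`...do...` repeats the headword above; the `[v.pppp]` after the broken bar is the correction location, while the pc comes from the preceding `[Page:VNv-ppp]` line) could be resolved mechanically along these lines. This is only a sketch: the exact page-marker spelling, the record fields and the placeholder headwords `{#a#}` / `{#b#}` are assumptions for illustration, not lines from the actual file.

```python
import re

DITTO = "...do..."
PAGE_RE = re.compile(r"^\[Page:(VN\d-\d+)\]$")      # assumed marker shape
LOC_RE = re.compile(r"^\[(\d\.\d{4})\]\s*(.*)$")    # [v.pppp] after the bar

def resolve_ditto(lines):
    """Fill '...do...' with the previous headword and attach the current VN page."""
    records = []
    pc = None        # page of the VN itself, for the metaline pc-field
    last_hw = None   # most recent explicit headword
    for raw in lines:
        line = raw.strip()
        m = PAGE_RE.match(line)
        if m:
            pc = m.group(1)
            continue
        if "¦" not in line:
            continue  # skip comments and anything that is not a VN line
        hw, rest = (part.strip() for part in line.split("¦", 1))
        if hw == DITTO:
            hw = last_hw
        else:
            last_hw = hw
        m = LOC_RE.match(rest)
        loc, text = (m.group(1), m.group(2)) if m else (None, rest)
        records.append({"hw": hw, "pc": pc, "loc": loc, "text": text})
    return records

# Illustrative input only; headwords and texts are placeholders.
sample = [
    "[Page:VN1-001]",
    "{#a#} ¦ [1.0012] first correction",
    "...do... ¦ [1.0014] second correction",
    "{#b#} ¦ [1.0020] third correction",
]
records = resolve_ditto(sample)
for r in records:
    print(r["pc"], r["loc"], r["hw"], r["text"])
```

A fill of this kind only copies what the file says; whether each `...do...` really belongs to the headword above still has to be checked against the print.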

funderburkjim commented 1 month ago

Note: `...do... ¦ [1.0014] Z. 31 streiche <ls>ṚV. 8,46,26.</ls>` actually refers to `अक्ष`, not to `अक्ष्` (the HW above). This is the only one I've checked.

Andhrabharati commented 1 month ago

yes; in fact it should be referring to <hom>2.</hom> अक्ष.

Probably these should be checked again all over for the homonyms and accent marks (which I had missed at some places), after you prepare the file.

Andhrabharati commented 1 month ago

PS. I feel the VN lines of PWG-5 (lines 553-573 in my file) could be discarded, as the page is not to be seen in the original Bavarian Library and the re-printed Japanese (MLBD) ed. copies.

The "actual" reason I had in mind is not about the VN part in the Cologne-scan on Sp.1677-8 (which is present in PWG7 as well), but that many entries in the Bavarian copy do not "appear" anywhere else, incl. the CDSL text.

Bavarian Library copy scan page--

CDSL scan page--

funderburkjim commented 1 month ago

transcoding

The transcoding to slp1 (from vntext_0_deva.txt to vntxt_0.txt) required a few edits of vntext_0_deva. See change_vntxt_0_deva.txt.

funderburkjim commented 1 month ago

vntxt_1.txt

Correct pwg-devanagari accents that were missed in vntxt_0.txt.

funderburkjim commented 1 month ago

lines 553-573 of AB file

From an examination of these 21 headwords with current PWG display:

- Only one headword does not have an entry in vol 7; the exception is AlokagadADarI, which appears as headword AlokagAdADarI in vol. 7.
- The content in lines 553-573 for a given headword is generally similar (but not identical) to the corresponding content in vol 7; however, the content under pAriBAzika seems different.

I see no problem (and some minor benefit) in KEEPING lines 553-573, since this material corresponds to the scan Thomas made for cdsl.

It is mysterious that the Bavarian edition (per scan above)

* doesn't have the material at the bottom of the corresponding page of cdsl scan
* is different in the top half also, e.g. there is a legitimate correction to moGa in Bavarian edition, which I don't find in the cdsl scan.

BTW: it is good that you have not only filled in headwords, but also added page-references for corrections.

gasyoun commented 1 month ago

BTW: it is good that you have not only filled in headwords, but also added page-references for corrections.

long live @Andhrabharati

Andhrabharati commented 1 month ago

[from Jim's file: pwgissues/issue76/readme.txt]

# transcode
cd /c/xampp/htdocs/sanskrit-lexicon/PWG/pwgissues/issue76/transcode
mkdir pwgtranscoder1
cp /c/xampp/htdocs/sanskrit-lexicon/MWS/mwtranscode/transcoder1/deva_slp1.xml pwgtranscoder1/deva_slp1.xml
cp /c/xampp/htdocs/sanskrit-lexicon/MWS/mwtranscode/transcoder1/slp1_deva.xml pwgtranscoder1/slp1_deva.xml

cp /c/xampp/htdocs/sanskrit-lexicon/MWS/mwtranscode/transcoder.py .
cp /c/xampp/htdocs/sanskrit-lexicon/MWS/mwtranscode/mw_transcode.py pwg_transcode.py

# heavily edit pwg_transcode.py

It is quite surprising to see that Jim has copied MW's transcoder files to "handle" the PWG transcoding, and had to "heavily edit" the same for the purpose!!

Probably (a) MW is fully overshadowing Jim's thoughts, or (b) Jim is also now entering into "dotage" as Thomas, who himself said thus in response to one of my points earlier.

Jim has a separate "transcoder file-set" for the PWG family from the very initial days (which he had updated for the devanagari accent, upon some prolonged debating with me); and the same should've been used here.

Otherwise, it leads to unnecessary contamination of MW-style and PWG-style of accents, as can be seen from the below snippets from the PWG print and Jim's current revision--

[from Jim's file: change_vntxt_0_deva.txt]

Andhrabharati commented 1 month ago

[from AB's file: PWGVN_1-6reformatted(dng).txt]

{#रााण꣫#} ¦ [6.0317] (auf Bogen 21*) Z. 1; in {#राणि#} und {#पैलादि#} ist der Haken über dem {#ि#} abgebrochen. ;; Jim, this is a case of non-invertibility of Devanagari-slp1-devanagari!!

The transcoding to slp1 (from vntext_0_deva.txt to vntxt_0.txt) required a few edits of vntext_0_deva.

[from Jim's file: change_vntxt_0_deva.txt]

old: {#रााण꣫#} ¦ [6.0317] (auf Bogen 21) ...
new: PWG style udAtta -> MW style udAtta, also hiatus
; the cdsl spelling headword in rARa/ = राण॑
{#राण॑#} ¦ [6.0317] (auf Bogen 21)
---
old: {#राण॑#} ¦ [6.0317] (auf Bogen 21) Z. 1; in {#राणि#} und {#पैलादि#} ist der Haken über dem {#ि#} abgebrochen.
new: Replace DEVANAGARI VOWEL SIGN I with DEVANAGARI LETTER I
{#राण॑#} ¦ [6.0317] (auf Bogen 21) Z. 1; in {#राणि#} und {#पैलादि#} ist der Haken über dem {#इ#} abgebrochen.
; Jim doesn't know how to represent in slp1 the 'naked' vowel sign.
; the hook above the {#ि#} is broken

Incidentally, I had discussed this very item with @drdhaval2785 in private mail exactly 3 years ago!

Here is my initial mail to Dhaval--

followed by further responses--

Andhrabharati commented 1 month ago

; Jim doesn't know how to represent in slp1 the 'naked' vowel sign.

It's because Jim is following the slp1 from Peter Scharf, who had duly made a note of this particular point in "his study/survey" (before coming up with slp1)--

but for some reason, did not even "try" to propose any solution!

So it is not just slp1 alone that doesn't handle this, but also (any and) every existing Roman transliteration scheme!

If Jim is "willing" to "update" the CDSL transcoding rules (as he had done in quite many cases till now), I shall post my proposal to handle the same (with which the invertibility condition also gets satisfied).

Probably, Jim might wish to get Peter Scharf's opinion also about that proposal (before taking any action on it).

Andhrabharati commented 1 month ago

I see no problem (and some minor benefit) in KEEPING lines 553-573, since this material corresponds to the scan Thomas made for cdsl.

It is mysterious that the Bavarian edition (per scan above)
* doesn't have the material at the bottom of the corresponding page of cdsl scan

* Is different in the top half also. e.g. There is a legitimate correction to moGa in Bavarian
  edition, which I don't find in the cdsl scan.

In fact, I would consider it to be exactly opposite that the CDSL scan is THE mystery case!

As I had already indicated earlier, both the Bavarian Library scan (1868) and the Japanese reprint (1976) tally exactly with each other, as does every physical book that I have seen in various Indian libraries (or in the market now for sale).

Now I have found a scan copy digitised by Google [from the Sapienza University of Rome (Biblioteca di Studi Orientali)] in August 2013, which has both the "proper ending page" of Bavarian copy followed by the "extraneous page" of the CDSL scan (after a blank page).

This is somewhat similar to what we had seen earlier in one of the MW99 scans having two of the MD errata pages, about which some discussion took place; it was finally concluded that it was an error in binding that particular copy, and those two pages were NOT brought into the MW annexure data.

It is surprising that the CDSL scan copy has the "original" ending page (as in all the three above scan copies) MISSING and is left only with the dubious "extraneous" page.

funderburkjim commented 1 month ago

MW is fully overshadowing Jim's thoughts

No. The reason I used the mw transcoders was that I had available the inverse transcoder deva_slp1.xml but did not have deva1_slp1.xml.

I'm constructing deva1_slp1.xml now.

funderburkjim commented 1 month ago

vntxt_1_rev.txt

The inverse transcoder file deva1_slp1.xml is now created. I should have done that in the first place. It was used to generate the slp1 version of AB's file: vntxt_1_rev.txt.

Jim thinks that vntxt_1_rev.txt is ready for further use.

"update" the CDSL transcoding rules

I'm curious what such an update would look like. Let's see the proposed transcoder file.

or Jim is also now entering into "dotage" :

Andhrabharati commented 1 month ago

First things first!

Against Jim's two posts 1 and 2 just above this, I would like to re-iterate from AB's post:

Jim has a separate "transcoder file-set" for the PWG family from the very initial days (which he had updated for the devanagari accent, upon some prolonged debating with me); and the same should've been used here.

Here are the transcoders I have with me (as recd from Jim)--

[MW-version, which has no "deva1_slp1.xml" indeed]

[pw-version, which DOES have the "deva1_slp1.xml"]

And he had clearly said those days that the deva1 <> slp1 files were specifically made for the pw-family!! He had also indicated how to check the invertibility using the "to & fro transcoders" one after the other.
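
The "to & fro" invertibility check can be sketched as follows. The tables here are a tiny toy subset written for this illustration; they are NOT the cdsl deva1_slp1.xml / slp1_deva1.xml transcoders, and the function names are hypothetical. The point is only to show why a 'naked' vowel sign fails the round trip when it shares a Roman letter with the independent vowel.

```python
# Toy tables only: a handful of letters, enough for the strings discussed above.
CONSONANTS = {"र": "r", "ण": "R", "प": "p", "ल": "l", "द": "d"}
INDEPENDENT = {"अ": "a", "आ": "A", "इ": "i", "ऐ": "E"}
SIGNS = {"\u093e": "A", "\u093f": "i", "\u0948": "E"}   # dependent vowel signs
VIRAMA = "\u094d"

SLP_CONS = {v: k for k, v in CONSONANTS.items()}
SLP_INDEP = {v: k for k, v in INDEPENDENT.items()}
SLP_SIGN = {v: k for k, v in SIGNS.items()}

def deva_to_slp1(text):
    out, i = [], 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch in CONSONANTS:
            if nxt in SIGNS:                 # consonant + vowel sign
                out.append(CONSONANTS[ch] + SIGNS[nxt]); i += 2; continue
            if nxt == VIRAMA:                # bare consonant
                out.append(CONSONANTS[ch]); i += 2; continue
            out.append(CONSONANTS[ch] + "a")  # inherent a
        elif ch in INDEPENDENT:
            out.append(INDEPENDENT[ch])
        elif ch in SIGNS:
            out.append(SIGNS[ch])            # naked sign collapses onto the vowel letter
        else:
            out.append(ch)
        i += 1
    return "".join(out)

def slp1_to_deva(text):
    out, i = [], 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch in SLP_CONS:
            out.append(SLP_CONS[ch])
            if nxt == "a":
                i += 2; continue
            if nxt in SLP_SIGN:
                out.append(SLP_SIGN[nxt]); i += 2; continue
            out.append(VIRAMA)
        elif ch in SLP_INDEP:
            out.append(SLP_INDEP[ch])
        else:
            out.append(ch)
        i += 1
    return "".join(out)

def is_invertible(deva):
    """Apply the two transcoders one after the other and compare with the input."""
    return slp1_to_deva(deva_to_slp1(deva)) == deva

NAKED_I = "\u093f"   # DEVANAGARI VOWEL SIGN I with no consonant before it
for s in ["राणि", "पैलादि", NAKED_I]:
    print(repr(s), is_invertible(s))
```

In this toy the two ordinary words survive the round trip, while the isolated vowel sign comes back as the independent letter.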

I can as well show (point) him where he has posted these transcoders (for me) earlier, if he is still not convinced that these were already existing before!

------------------

PS. Sorry Jim, I didn't use the "dotage" term in any derogatory sense; it was just indicating the state-of-the-mind (forgetfulness) sometimes seen in younger guys as well.

Andhrabharati commented 1 month ago

As I had mentioned in my mail to Dhaval (in the above post), a need to transcode the vowel-marker (mAtrA) characters arises not only in the case of grammar books, as in

[Macdonell]

or [Monier Williams]

or in reference works, as in [Monier Williams dictionary]

[Unicode Chart: Devanagari]

or in posters, as at [Marcis's post](https://github.com/sanskrit-lexicon/PWG/issues/37#issuecomment-846456420)

Of course, for most such works that go to actual publishing, other 'professional means' would be resorted to (and not these Roman transcoding schemes) to render the intended text matter appropriately!!

[... post continues further below ...]

gasyoun commented 1 month ago

@Andhrabharati I can hardly imagine a case other than a textbook for having the need to separate the vowel representation.

Andhrabharati commented 1 month ago

but esp. in the cases of "truthfully" showing/indicating [in plain text format] the mistakes or wrong readings (or prints), as at--

[PWG6-0317] ;; which became 6-0333 after correction

[PWGVN 6-001]

[PWG3-0271]

[PWGVN3-001]

Here these Devanagari strings are deliberately typed thus in the text matter, and are NOT at all typos as Jim has commented and "changed" them to the 'corrected' forms--

The transcoding to slp1 (from vntext_0_deva.txt to vntxt_0.txt) required a few edits of vntext_0_deva.

and

Andhrabharati commented 1 month ago

Now is the time for my proposal to transcode these--

I would like to propose using the ¬ ["Not sign"] character (alt+0172; u+00ac) for denoting the following 'vowel-mātrā' character as a 'Not-vowel' character!

The Unicode std. prescribes the ◌ ["Dotted circle"] (u+25cc) character to be used as a place-holder, and uses it to show the positioning of the diacritic-marks (which I am now extending to positioning the vowel-markers as well).

Namely, the proposal goes like this--

Note: Devanagari transcoding would not be with the dotted circle (the uniscribe engine would take care of rendering the appropriate script character), but the Roman transcoding should be having dotted circle prior to the resp. Roman letter.

Andhrabharati commented 1 month ago

With this notation, we would get the round-trip strings properly--
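
One possible reading of the ¬ part of this proposal, as a sketch only: the actual mapping table is not reproduced in this thread, so the vowel letters, the rule for deciding that a sign is 'naked', and both function names are assumptions made for illustration. The dotted-circle half of the proposal is not modelled here.

```python
NOT_SIGN = "\u00ac"   # the ¬ character (U+00AC)

# Dependent vowel signs and the slp1 letters assumed for them.
SIGN_TO_SLP1 = {
    "\u093e": "A", "\u093f": "i", "\u0940": "I", "\u0941": "u", "\u0942": "U",
    "\u0947": "e", "\u0948": "E", "\u094b": "o", "\u094c": "O",
}
SLP1_TO_SIGN = {v: k for k, v in SIGN_TO_SLP1.items()}

def is_consonant(ch):
    """True for the Devanagari consonants KA..HA (U+0915..U+0939)."""
    return len(ch) == 1 and "\u0915" <= ch <= "\u0939"

def mark_naked_signs(text):
    """Replace each vowel sign that does not follow a consonant by ¬ + slp1 letter."""
    out, prev = [], ""
    for ch in text:
        if ch in SIGN_TO_SLP1 and not is_consonant(prev):
            out.append(NOT_SIGN + SIGN_TO_SLP1[ch])
        else:
            out.append(ch)
        prev = ch
    return "".join(out)

def unmark_naked_signs(text):
    """Inverse step: turn ¬ + slp1 letter back into the bare vowel sign."""
    out, i = [], 0
    while i < len(text):
        if text[i] == NOT_SIGN and i + 1 < len(text) and text[i + 1] in SLP1_TO_SIGN:
            out.append(SLP1_TO_SIGN[text[i + 1]])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)

bare = "{#\u093f#}"              # the isolated vowel sign i from the VN line
marked = mark_naked_signs(bare)
print(marked)
print(unmark_naked_signs(marked) == bare)
```

Only the naked signs are touched in this sketch; ordinary syllables pass through unchanged and would still go to the usual transcoder.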

funderburkjim commented 1 month ago

deva1 comparison

@Andhrabharati After your comment, I was able to find a deva1_slp1.xml from 2023. In conversion of your file to slp1, the current version shows one improvement. So the preferred version is the current deva1_slp1.xml. This version is now also available in csl-websanlexicon and csl-apidev (which are the cdsl 'official' locations for the transcoders).

Some details on comparison to the 2023 version are in readme_deva1.txt.

gasyoun commented 1 month ago

deliberately typed thus in the text matter, and are NOT at all typos

agree @Andhrabharati

funderburkjim commented 1 month ago

question on `??? ¦ [1.0956] — [1.1016]`

AB file comment states:

20 cases— 12121, 12122, 12145, 12196 (2), 12217, 12247, 12282, 12291, 12350, 12352, 
12369, 12448, 12457, 12470, 12513, 12561, 12593, 12602 and 12691

But there are only 19 L-numbers listed. Does the (2) have some significance that yields 20 cases ?

I plan to generate a VN entry for each of these 19 (or 20) .
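
A quick way to check the arithmetic, reading `(2)` as "two cases at that L-number". That reading is an assumption; it is not confirmed in the comment itself.

```python
import re

# The list exactly as given in the AB file comment quoted above.
comment = (
    "12121, 12122, 12145, 12196 (2), 12217, 12247, 12282, 12291, 12350, 12352, "
    "12369, 12448, 12457, 12470, 12513, 12561, 12593, 12602 and 12691"
)

# Each match is (L-number, optional multiplicity in parentheses).
items = re.findall(r"(\d{5})(?:\s*\((\d+)\))?", comment)

distinct = len(items)
with_multiplicity = sum(int(n) if n else 1 for _, n in items)
print(distinct, with_multiplicity)
```

Under that reading the 19 listed L-numbers account for 20 cases.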

funderburkjim commented 1 month ago

Also, I think there are two more at the beginning of the list

11847 {#upanayana#} 1-0956
    Here the reference <ls>ŚĀṄKH. GṚHY. 1, 5.</ls> has only two numbers.
    But it is on page 1-0956, so Author must have intended this to change also,
    otherwise he would not have put "Sp. 956–1016" in the VN.
12106 {#upaSaya/#}  1-0974

gasyoun commented 1 month ago

@funderburkjim I continue to upload my new scans of Sanskrit dictionaries; I do not know whether they are better than what you have or not: https://vk.com/samskrtamru?w=wall-88831040_22648

funderburkjim commented 1 month ago

vn missing installed

Work files are here.

vntxt_4.txt contains the new entries.

These entries have been inserted into pwg.txt, and csl-orig updated. Various small adjustments made to the display programs (see commit links above).

I think the goals of this issue have been satisfied. Request @Andhrabharati to review.

Next step for me: changes to pwg.txt that were noticed during this missing VN work. Will detail these proposed changes after AB review of this vn work .
