Open funderburkjim opened 2 months ago
https://sanskrit-lexicon.uni-koeln.de/pwgindex.html purports to be a scanned edition of PWG,
created long ago from material from @maltenth.
readme_cdsl_vn.txt is a sort of index to the VN portions from the various volumes.
Compare the volume 1 VN material from this cdsl source to the volume 1 VN material supplied by @Andhrabharati in #37, #39 : AB vol 1 vn pdf.
They are very different.
How to account for this difference? If we want to improve the VN coding in cdsl pwg.txt, which of the two sources should be used?
Although AB does not divulge the exact source of his pdfs, perhaps he could retrieve some information from the title pages that would explain the difference.
@maltenth has to respond about the scans at CDSL!
And it is my lookout to reach the best possible original sources (either scans or physical books) that enhance my collection, as a continuous process. [I find that many works cited in PWG can be traced at the Bavarian library (in excellent quality). Of course, there are quite a few other sources as well.]
So far as I am concerned, the text in the pwgheader (of older date) is exactly what is in the print volumes; I had just proofread it (and at times split some matter into separate lines) and posted it earlier.
Jim could probably start with converting the PWG-VN data in Thomas's original format to current CDSL format.
https://sanskrit-lexicon.uni-koeln.de/pwgindex.html purports to be a scanned edition of PWG, created long ago from material from @maltenth.
This was 2002 or 2004, I received them on a CD from Germany in Moscow.
This post indicates some of my PWG "sources".
readme_cdsl_vn.txt is a sort of index to the VN portions from the various volumes. ... ... vol. 1 (no VN matter)
If the Vol.1 does not contain any VN matter, how would Jim (and/or Thomas) explain the presence of the typed matter from those pages in the pwgheader file (that was received from Thomas)?
@maltenth has to respond about the scans at CDSL!
Unfortunately, communication by me with Thomas has become unpredictable this year.
If the Vol.1 does not contain any VN matter ...
Just now, I compared a. the volume 2 material from pwgindex to b. pwgheader/PWG.V.2.VN.pages.pdf
They are identical. From this I infer that the material in pwgheader/PWG.V.1.VN.pages.pdf is simply absent from the pwgindex images. Why absent -- no way to know. This allays my concern regarding possible version difference.
I have revised pwgindex program to include
Based on recent review (see revised readme_cdsl_vn.txt), the first task is to provide 'entries' for the 'missing' VN material. The source for this missing material has two parts:
[v.pppp]
lines to the metaline-body-lend format of pwg.txt entries. Let's refer to this file by the shorter name VNTXT. The VNTXT file has 599 lines 'to convert'. 131 of these are, following the printed text, without headwords. The first examples:
[1.0012] ¦ streiche das Beispiel u. अक्न ... <<< headword is 'akna'
[1.0014] ¦ Z. 31 streiche <ls>ṚV.</ls> <8,46,26>. <<< headword is 'akza'
AB: Have you already determined these missing headwords?
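As a rough illustration of the first step in converting the `[v.pppp]` lines, here is a minimal sketch that splits one VN line into headword, volume, page and correction text. The names `VNLine`, `VN_RE` and `parse_vn_line` are hypothetical, not part of any CDSL program, and the pattern assumes only the two line shapes quoted in this thread (a headwordless `[v.pppp] ¦ text` line and a `{#HW#} ¦ [v.pppp] text` line). Lines that start with `...do...` or `???` are deliberately left unmatched so they can be reviewed by hand.

```python
import re
from typing import NamedTuple, Optional


class VNLine(NamedTuple):
    headword: Optional[str]  # text inside {#...#}; None when the line has no headword
    volume: int              # v of [v.pppp]
    page: int                # pppp of [v.pppp]
    text: str                # the correction text itself


# Assumed shapes: optional "{#HW#}", optional broken bar, "[v.pppp]",
# optional broken bar, then the correction text.
VN_RE = re.compile(
    r'^(?:\{#(?P<hw>[^#]*)#\}\s*)?'
    r'(?:¦\s*)?'
    r'\[(?P<vol>\d)\.(?P<page>\d{4})\]\s*'
    r'(?:¦\s*)?'
    r'(?P<text>.*)$'
)


def parse_vn_line(line: str) -> Optional[VNLine]:
    """Return the parsed parts, or None when the line needs manual review."""
    m = VN_RE.match(line.strip())
    if m is None:
        return None
    return VNLine(m.group('hw'), int(m.group('vol')),
                  int(m.group('page')), m.group('text'))


if __name__ == '__main__':
    print(parse_vn_line('[1.0014] ¦ Z. 31 streiche <ls>ṚV.</ls> <8,46,26>.'))
    print(parse_vn_line('...do... ¦ [1.0014] Z. 31 streiche'))
```

Lines that come back with `headword` set to `None` are the ones for which a headword still has to be supplied; building the actual metaline is not attempted here.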
AB: Have you already determined these missing headwords?
No, Jim; you may refer to my related post. But, it is a fairly simple task if decided to be done!
If I am to do it, I might wish to re-look at the whole content for a possible 'revision'!!
- The AV reference improvements on page 3 of PWG.V.1.VN.pages.pdf. AB's task is to type this in some straightforward way.
It may be noted that the form <X> = <Y> is to be considered something like lies <X> st. <Y> at these AV citation changes. As such, I don't suggest changing this format.
@funderburkjim
I think I have now properly changed these PWGVN lines, to the format as in the pwkvn pages. PWGVN_1-6reformatted(dng).txt
There are a couple of places (the lines having ...do..., ??? and ;;) that you may need to look at first.
-----------------------------
PS. I feel the VN lines of PWG-5 (lines 553-573 in my file) could be discarded, as the page is not to be seen in the original Bavarian Library and the re-printed Japanese (MLBD) ed. copies.
...do... denotes that the VN line belongs to the same HW as above! Or in other words, those HWs contain two or more corrections.
And you may note that the [v.pppp] after the broken bar denotes the actual correction location, not the pc-field for the metaline (which should be built with the previous [Page:VNv-ppp]).
Note: ...do... ¦ [1.0014] Z. 31 streiche <ls>ṚV. 8,46,26.</ls> actually refers to 'अक्ष, not to अक्ष् (the HW above). This is the only one I've checked.
Yes; in fact it should be referring to <hom>2.</hom> अक्ष.
Probably these should be checked again all over for the homonyms and accent marks (which I had missed at some places), after you prepare the file.
PS. I feel the VN lines of PWG-5 (lines 553-573 in my file) could be discarded, as the page is not to be seen in the original Bavarian Library and the re-printed Japanese (MLBD) ed. copies.
The "actual" reason I had in mind is not about the VN part in the Cologne-scan on Sp.1677-8 (which is present in PWG7 as well), but that many entries in the Bavarian copy do not "appear" anywhere else, incl. the CDSL text.
Bavarian Library copy scan page--
CDSL scan page--
The transcoding to slp1 (from vntext_0_deva.txt to vntxt_0.txt) required a few edits of vntext_0_deva. See change_vntxt_0_deva.txt.
Correct pwg-devanagari accents that were missed in vntxt_0.txt.
lines 553-573 of AB file
From an examination of these 21 headwords with current PWG display:
I see no problem (and some minor benefit) in KEEPING lines 553-573, since this material corresponds to the scan Thomas made for cdsl.
It is mysterious that the Bavarian edition (per scan above)
BTW: it is good that you have not only filled in headwords, but also added page-references for corrections.
BTW: it is good that you have not only filled in headwords, but also added page-references for corrections.
long live @Andhrabharati
[from Jim's file: pwgissues/issue76/readme.txt]
# transcode
cd /c/xampp/htdocs/sanskrit-lexicon/PWG/pwgissues/issue76/transcode
mkdir pwgtranscoder1
cp /c/xampp/htdocs/sanskrit-lexicon/MWS/mwtranscode/transcoder1/deva_slp1.xml pwgtranscoder1/deva_slp1.xml
cp /c/xampp/htdocs/sanskrit-lexicon/MWS/mwtranscode/transcoder1/slp1_deva.xml pwgtranscoder1/slp1_deva.xml
cp /c/xampp/htdocs/sanskrit-lexicon/MWS/mwtranscode/transcoder.py .
cp /c/xampp/htdocs/sanskrit-lexicon/MWS/mwtranscode/mw_transcode.py pwg_transcode.py
# heavily edit pwg_transcode.py
It is quite surprising to see that Jim has copied MW's transcoder files to "handle" the PWG transcoding, and had to "heavily edit" the same for the purpose!!
Probably (a) MW is fully overshadowing Jim's thoughts, or (b) Jim is also now entering into "dotage" as Thomas, who himself said thus in response to one of my points earlier.
Jim has a separate "transcoder file-set" for the PWG family from the very initial days (which he had updated for the devanagari accent, upon some prolonged debating with me); and the same should've been used here.
Otherwise, it leads to unnecessary contamination of MW-style and PWG-style of accents, as can be seen from the below snippets from the PWG print and Jim's current revision--
[from Jim's file: change_vntxt_0_deva.txt]
[from AB's file: PWGVN_1-6reformatted(dng).txt]
{#रााण꣫#} ¦ [6.0317] (auf Bogen 21*) Z. 1; in {#राणि#} und {#पैलादि#} ist der Haken über dem {#ि#} abgebrochen. ;; Jim, this is a case of non-invertibility of Devanagari-slp1-devanagari!!
The transcoding to slp1 (from vntext_0_deva.txt to vntxt_0.txt) required a few edits of vntext_0_deva.
[from Jim's file: change_vntxt_0_deva.txt]
old:
{#रााण꣫#} ¦ [6.0317] (auf Bogen 21) ...
new: PWG style udAtta -> MW style udAtta, also hiatus
; the cdsl spelling headword in rARa/ = राण॑
{#राण॑#} ¦ [6.0317] (auf Bogen 21)
---
old:
{#राण॑#} ¦ [6.0317] (auf Bogen 21) Z. 1; in {#राणि#} und {#पैलादि#} ist der Haken über dem {#ि#} abgebrochen.
new: Replace DEVANAGARI VOWEL SIGN I with DEVANAGARI LETTER I
{#राण॑#} ¦ [6.0317] (auf Bogen 21) Z. 1; in {#राणि#} und {#पैलादि#} ist der Haken über dem {#इ#} abgebrochen.
; Jim doesn't know how to represent in slp1 the 'naked' vowel sign.
; the hook above the {#ि#} is broken
Incidentally, I had discussed about this very item with @drdhaval2785 in private mail exactly 3 years back!
Here is my initial mail to Dhaval--
followed by further responses--
; Jim doesn't know how to represent in slp1 the 'naked' vowel sign.
It's because Jim is following the slp1 from Peter Scharf, who had duly made a note of this particular point in "his study/survey" (before coming up with slp1)--
but for some reason, did not even "try" to propose any solution!
So it is not just slp1 alone that doesn't handle this, but also (any and) every existing Roman transliteration scheme!
If Jim is "willing" to "update" the CDSL transcoding rules (as he had done in quite many cases till now), I shall post my proposal to handle the same (with which the invertibility condition also gets satisfied).
Probably, Jim might wish to get Peter Scharf's opinion also about that proposal (before taking any action on it).
I see no problem (and some minor benefit) in KEEPING lines 553-573, since this material corresponds to the scan Thomas made for cdsl.
It is mysterious that the Bavarian edition (per scan above)
* doesn't have the material at the bottom of the corresponding page of cdsl scan
* is different in the top half also. E.g. there is a legitimate correction to moGa in Bavarian edition, which I don't find in the cdsl scan.
In fact, I would consider it to be exactly opposite that the CDSL scan is THE mystery case!
As I had already indicated earlier, both the Bavarian Library scan (1868) and the Japanese reprint (1976) tally exactly with each other, so does any physical book that I had seen in various Indian libraries (or in market now for sale).
Now I have found a scan copy digitised by Google [from the Sapienza University of Rome (Biblioteca di Studi Orientali)] in August 2013, which has both the "proper ending page" of Bavarian copy followed by the "extraneous page" of the CDSL scan (after a blank page).
This is somewhat similar to what we had seen earlier in one of the MW99 scans having two of the MD errata pages, about which some discussion took place; it was finally concluded that this was a binding error in that particular copy, and those two pages were NOT brought into the MW annexure data.
It is surprising that the CDSL scan copy has the "original" ending page (as in all the three above scan copies) MISSING and is left only with the dubious "extraneous" page.
MW is fully overshadowing Jim's thoughts
No, The reason I used the mw transcoders was that I had available the inverse transcoder deva_slp1.xml but did not have deva1_slp1.xml.
I'm constructing deva1_slp1.xml now.
The inverse transcoder file deva1_slp1.xml is now created. I should have done that in the first place. It was used to generate the slp1 version of AB's file: vntxt_1_rev.txt.
Jim thinks that vntxt_1_rev.txt is ready for further use.
"update" the CDSL transcoding rules
I'm curious what such an update would look like. Let's see the proposed transcoder file.
or Jim is also now entering into "dotage" :
First things first!
Against Jim's two posts 1 and 2 just above this, I would like to re-iterate from AB's post:
Jim has a separate "transcoder file-set" for the PWG family from the very initial days (which he had updated for the devanagari accent, upon some prolonged debating with me); and the same should've been used here.
Here are the transcoders I have with me (as recd from Jim)--
[MW-version, which has no "deva1_slp1.xml" indeed]
[pw-version, which DOES have the "deva1_slp1.xml"]
And he had clearly said those days that the deva1 <> slp1 files were specifically made for the pw-family!! He had also indicated how to check the invertibility using the "to & fro transcoders" one after the other.
I can as well show (point) him where he has posted these transcoders (for me) earlier, if he is still not convinced that these were already existing before!
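The "to & fro" check mentioned above can be sketched generically as follows. This is only an illustration under stated assumptions: `find_roundtrip_failures`, `toy_forward` and `toy_inverse` are hypothetical names, and the two-entry toy tables stand in for the real deva1_slp1.xml / slp1_deva1.xml transcoders, whose actual API is not shown in this thread. The toy tables are built to collide on purpose, mirroring the vowel-sign case discussed here.

```python
from typing import Callable, Iterable, List, Tuple


def find_roundtrip_failures(
    lines: Iterable[str],
    forward: Callable[[str], str],
    inverse: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Apply forward then inverse; return (original, result) pairs that differ."""
    failures = []
    for line in lines:
        back = inverse(forward(line))
        if back != line:
            failures.append((line, back))
    return failures


# Toy stand-ins (assumption): DEVANAGARI LETTER I (U+0907) and
# DEVANAGARI VOWEL SIGN I (U+093F) both map to 'i', so the sign is lost.
TOY_DEVA_TO_ROMAN = {'\u0907': 'i', '\u093F': 'i'}
TOY_ROMAN_TO_DEVA = {'i': '\u0907'}


def toy_forward(s: str) -> str:
    return ''.join(TOY_DEVA_TO_ROMAN.get(ch, ch) for ch in s)


def toy_inverse(s: str) -> str:
    return ''.join(TOY_ROMAN_TO_DEVA.get(ch, ch) for ch in s)


if __name__ == '__main__':
    for original, result in find_roundtrip_failures(
            ['\u0907', '\u093F'], toy_forward, toy_inverse):
        print(repr(original), '->', repr(result))
```

With real transcoders, `forward` and `inverse` would wrap whatever call performs each direction; any line reported here is a candidate for manual inspection rather than an automatic fix.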
------------------
PS. Sorry Jim, I didn't use the "dotage" term in any derogatory sense; it was just indicating the state-of-the-mind (forgetfulness) sometimes seen in younger guys as well.
As I had mentioned in my mail to Dhaval (in the above post), a need to transcode the vowel-marker (mAtrA) characters arises not only in the case of grammar books, as in
[Macdonell]
or [Monier Williams]
or in reference works, as in [Monier Williams dictionary]
[Unicode Chart: Devanagari]
or in posters, as at [Marcis's post](https://github.com/sanskrit-lexicon/PWG/issues/37#issuecomment-846456420)
Of course, for most of such works that go to actual publishing, other 'professional means' would be resorted to (and not these Roman transcoding schemes) for the intended text matter appropriately!!
[... post continues further below ...]
@Andhrabharati I can hardly imagine a case other than a textbook that needs a separate representation of the vowel sign.
but esp. in the cases of "truthfully" showing/indicating [in plain text format] the mistakes or wrong readings (or prints), as at--
[PWG6-0317] ;; which became 6-0333 after correction
[PWGVN 6-001]
[PWG3-0271]
[PWGVN3-001]
Here these Devanagari strings are deliberately typed thus in the text matter, and are NOT at all typos as Jim has commented and "changed" them to the 'corrected' forms--
The transcoding to slp1 (from vntext_0_deva.txt to vntxt_0.txt) required a few edits of vntext_0_deva.
and
Now is the time for my proposal to transcode these--
I would like to propose using the ¬ ["Not sign"] character (alt+0172; u+00ac) for denoting the following 'vowel-mātrā' character as a 'Not-vowel' character!
The Unicode std. prescribes ◌ ["Dotted circle"] (u+25cc) character to be used as a place-holder, and showed it in positioning the diacritic-marks (which I am now extending to positioning the vowel-markers as well).
Namely, the proposal goes like this--
Note: Devanagari transcoding would not be with the dotted circle (the uniscribe engine would take care of rendering the appropriate script character), but the Roman transcoding should be having dotted circle prior to the resp. Roman letter.
With this notation, we would get the round-trip strings properly--
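One possible reading of the ¬ part of this proposal can be sketched as below. The actual mapping table of the proposal is not reproduced in the text of this thread, so everything here is an assumption: the function names `encode_isolated` and `decode_isolated` are hypothetical, only two vowels are covered, and only a single isolated character is handled (no consonant context, and no dotted-circle variant for other Roman schemes).

```python
NOT_SIGN = '\u00AC'  # the "Not sign" character

# Assumed toy tables: independent vowel letters and the matching vowel signs.
LETTER_TO_ROMAN = {'\u0907': 'i', '\u0906': 'A'}  # DEVANAGARI LETTER I, LETTER AA
SIGN_TO_ROMAN = {'\u093F': 'i', '\u093E': 'A'}    # VOWEL SIGN I, VOWEL SIGN AA

ROMAN_TO_LETTER = {v: k for k, v in LETTER_TO_ROMAN.items()}
ROMAN_TO_SIGN = {v: k for k, v in SIGN_TO_ROMAN.items()}


def encode_isolated(ch: str) -> str:
    """Encode one isolated vowel letter, or one 'naked' vowel sign with a prefix."""
    if ch in LETTER_TO_ROMAN:
        return LETTER_TO_ROMAN[ch]
    if ch in SIGN_TO_ROMAN:
        return NOT_SIGN + SIGN_TO_ROMAN[ch]
    raise ValueError('character not covered by this sketch: %r' % ch)


def decode_isolated(s: str) -> str:
    """Inverse of encode_isolated."""
    if s.startswith(NOT_SIGN):
        return ROMAN_TO_SIGN[s[len(NOT_SIGN):]]
    return ROMAN_TO_LETTER[s]


if __name__ == '__main__':
    for ch in ('\u0907', '\u093F'):
        roman = encode_isolated(ch)
        print(repr(ch), '->', roman, '->', repr(decode_isolated(roman)))
```

Because the letter and the sign now receive different Roman strings, decoding recovers the original character in both cases, which is the invertibility property under discussion. Whether a real transcoder file can express such a prefix rule is not something this sketch settles.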
@Andhrabharati After your comment, I was able to find a deva1_slp1.xml from 2023. In conversion of your file to slp1, the current version shows one improvement. So the preferred version is the current deva1_slp1.xml. This version is now also available in csl-websanlexicon and csl-apidev (which are the cdsl 'official' locations for the transcoders).
Some details on comparison to the 2023 version are in readme_deva1.txt.
deliberately typed thus in the text matter, and are NOT at all typos
agree @Andhrabharati
??? ¦ [1.0956] — [1.1016]
AB file comment states:
20 cases— 12121, 12122, 12145, 12196 (2), 12217, 12247, 12282, 12291, 12350, 12352,
12369, 12448, 12457, 12470, 12513, 12561, 12593, 12602 and 12691
But there are only 19 L-numbers listed. Does the (2) have some significance that yields 20 cases?
I plan to generate a VN entry for each of these 19 (or 20).
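The 19-versus-20 arithmetic can be checked with a small sketch. It assumes, without confirmation from AB, that `(2)` is a multiplicity (two corrections under the same L-number); `count_cases` is a hypothetical helper, and the listing string is copied from the comment quoted above.

```python
import re

LISTING = ("12121, 12122, 12145, 12196 (2), 12217, 12247, 12282, 12291, "
           "12350, 12352, 12369, 12448, 12457, 12470, 12513, 12561, 12593, "
           "12602 and 12691")


def count_cases(listing: str):
    """Return (distinct L-numbers, total cases when '(n)' is read as a multiplicity)."""
    distinct = 0
    total = 0
    # A 5-digit L-number, optionally followed by "(n)".
    for m in re.finditer(r'(\d{5})(?:\s*\((\d+)\))?', listing):
        distinct += 1
        total += int(m.group(2)) if m.group(2) else 1
    return distinct, total


if __name__ == '__main__':
    print(count_cases(LISTING))
```

Under that reading the list has 19 distinct L-numbers and 20 cases, which would reconcile the two counts; the reading itself still needs to be confirmed.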
Also, I think there are two more at the beginning of the list
11847 {#upanayana#} 1-0956
Here the reference <ls>ŚĀṄKH. GṚHY. 1, 5.</ls> has only two numbers.
But it is on page 1-0956, so Author must have intended this to change also,
otherwise he would not have put "Sp. 956–1016" in the VN.
12106 {#upaSaya/#} 1-0974
@funderburkjim I continue to upload my new scans of Sanskrit dictionaries; I do not know whether they are better than what you have: https://vk.com/samskrtamru?w=wall-88831040_22648
Work files are here.
vntxt_4.txt contains the new entries.
These entries have been inserted into pwg.txt, and csl-orig updated. Various small adjustments made to the display programs (see commit links above).
I think the goals of this issue have been satisfied. Request @Andhrabharati to review.
Next step for me: changes to pwg.txt that were noticed during this missing VN work. I will detail these proposed changes after AB's review of this VN work.
Continue the discussion of VN (additions and improvements) for PWG, that was begun in #39.