PWKVN - Githubissues

funderburkjim commented 2 years ago

This documents the digitization of the Nachträge und Verbesserungen sections of PWK. The digitization was recently prepared by Thomas Malten and his typists in India. In turn, much of the 'typical' markup was added by me, and the result prepared as a 'NEW DICTIONARY'. Currently, there is no 'application' of the additions and corrections to the PW dictionary itself. A specialized display allows one to investigate pwkvn along with the Schmidt (sch) dictionary and the pw dictionary.

funderburkjim commented 2 years ago

preparation

The derivation of the current digitization proceeded in many steps, which are in the pwkvn folder. There are numerous (28) forms present in the pwkvn/orig folder; the readme file therein briefly describes the work done, starting with the digitization prepared by @thomasincambodia. The work in this folder is preparatory, and is included for possible reference. The production form of the digitization is in the csl-orig repository.

digitization

The base form of the dictionary is pwkvn.txt in csl-orig/v02/pwkvn.

Several derivative files and forms are constructed by program. These are all constructed by the redo.sh script in the csl-orig/v02/pwkvn/update/ folder:

- pwkvn_hwextra.txt: Many 'entries' in pwkvn were identified as having one or more alternate headwords; these are identified in the digitization by the markup <althws>{#X, Y, ...#}</althws>.
- update/pwkvn_hk.txt: utf-8 encoding, retaining most of Thomas's original coding conventions, e.g.
  - Devanagari in hk transliteration, with a particular convention for accents.
  - Letter-number convention (AS) for letters with diacritics.
  - Line break indicated by superscript 2.
- update/pwkvn_hk_ansi.txt: Same as pwkvn_hk.txt, but in the cp1252 encoding used by Thomas for compatibility with the Kedit editor.
- update/pwkvn_deva.txt: Devanagari text in Unicode Devanagari (for @Andhrabharati, @drdhaval2785 and others).
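
The relationship between pwkvn_hk.txt (utf-8) and pwkvn_hk_ansi.txt (cp1252) can be pictured as a plain re-encoding. The following is only a sketch under that assumption, not the actual logic of redo.sh; the function names and the sample string are made up for illustration.

```python
def utf8_to_cp1252(data: bytes) -> bytes:
    """Re-encode UTF-8 bytes as cp1252.

    Raises UnicodeEncodeError if a character has no cp1252 equivalent.
    """
    return data.decode("utf-8").encode("cp1252")


def cp1252_to_utf8(data: bytes) -> bytes:
    """Re-encode cp1252 bytes as UTF-8 (the reverse direction)."""
    return data.decode("cp1252").encode("utf-8")


# Superscript 2 (U+00B2) marks a line break in the hk files; cp1252 has it.
# The sample text itself is invented, not taken from pwkvn.
sample = "agni\u00b2 deva".encode("utf-8")
ansi = utf8_to_cp1252(sample)
assert cp1252_to_utf8(ansi) == sample  # the round trip is lossless here
```

A file containing Unicode Devanagari (such as pwkvn_deva.txt) could not be converted this way, since cp1252 has no Devanagari code points; the encoder would raise an error rather than silently drop characters.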

Most of these files are too big to view in a browser, but may be downloaded individually from GitHub.

funderburkjim commented 2 years ago

Displays

The B L A M displays are functioning; links are found at https://sanskrit-lexicon.uni-koeln.de/scans/PWKVNScan/2020/web/index.php
A special display has been prepared that shows PWKVN along with the SCH and PW dictionaries: https://sanskrit-lexicon.uni-koeln.de/scans/csl-apidev/pwkvn/
- This is technically interesting, because it builds on web-component technology experimentation of two years ago (see specifically lit-getword05a, using LitElement, which I think is now a Google project at https://lit.dev/).
- This link now provides a selection of various display versions. After a review period for user comments, links will be made in the homepage https://www.sanskrit-lexicon.uni-koeln.de/.

Andhrabharati commented 2 years ago

A major chunk of work done at last, though some more related work remains on this.

https://github.com/sanskrit-lexicon/PWK/issues/77#issuecomment-1046161569

I can post the details, if Jim is interested in this continuation part.

funderburkjim commented 2 years ago

@Andhrabharati Am interested to see your comments. Please go ahead and post the details.

maltenth commented 2 years ago

@funderburkjim, I think you said you have a newer version of sch in ansi format. Please let me have it.

funderburkjim commented 2 years ago

No 'ansi' version exists of the current digitization of sch. The current version (non-ansi) may be downloaded from https://github.com/sanskrit-lexicon/csl-orig/blob/master/v02/sch/sch.txt.

Creating an 'ansi version' seems non-trivial; I'll let you know if it becomes available.

gasyoun commented 2 years ago

Currently, there is no 'application' of the additions and corrections to the PW dictionary itself.

Is there a plan for it?

Andhrabharati commented 2 years ago

Guess it should go to the pw.txt appended at the end, with continuing L-numbers, if not at the end of each volume (resp. portions) as in pwg.txt.

gasyoun commented 2 years ago

pw.txt appended at the end, with continuing L-numbers

@funderburkjim agree?

Abbreviations from the Nachträge are not recognized in Schmidt.


funderburkjim commented 2 years ago

sch.txt does not have ls markup. That's the reason for Apast. Sr.

@thomasincambodia has requested that the above display also include PWG, which I plan to try. The display also needs an 'info.html' file to explain what's going on.

funderburkjim commented 2 years ago

pw.txt appended at the end, with continuing L-numbers

That's one possibility. Low ranking on things to do.

Currently, there is no 'application' of the additions and corrections to the PW dictionary itself. Is there a plan for it?

A can of worms I'm leery of opening. Greater interest in continuing the improvement of ls markup in PW.

maltenth commented 2 years ago

@funderburkjim can you use the pwkvn_hk_ansi.txt which you sent me and which I have been working on to improve the markup?

funderburkjim commented 2 years ago

Yes - I can convert pwkvn_hk_ansi.txt back to pwkvn.txt

It will be problematic if you change the number of lines in the file, add new markup, etc.

Best to send me a version before you do a lot, so I can see what 'improve the markup' involves.
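
A quick check along these lines could be run before sending an edited file back. This is a sketch only: the function name is hypothetical, and it assumes that entries begin with a meta-line starting with `<L>`, as in the pwkvn.txt samples in this thread.

```python
def check_line_alignment(original: str, edited: str) -> list:
    """Report differences that would make a line-by-line merge problematic."""
    problems = []
    orig_lines = original.splitlines()
    edit_lines = edited.splitlines()
    if len(orig_lines) != len(edit_lines):
        problems.append(
            f"line count differs: {len(orig_lines)} vs {len(edit_lines)}"
        )
    # Compare line by line over the common length: a meta-line must stay
    # a meta-line, otherwise entries have shifted.
    for n, (a, b) in enumerate(zip(orig_lines, edit_lines), start=1):
        if a.startswith("<L>") != b.startswith("<L>"):
            problems.append(f"line {n}: meta-line position changed")
    return problems
```

An empty result means the two versions still line up; changes inside a line (improved markup) are not flagged.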

Andhrabharati commented 2 years ago

@thomasincambodia has requested that the above display also include PWG, which I plan to try.

I wish I could see the form as PWG | PWG_VN | pwk | pwk_VN & SCH, or in other words, (1) PWG main text, as first column, (2) PWG VN data (sequentially volume-wise, as present in the print), if available, as second column (3) pwk text, as third column (4) pwk_VN data (sequentially volume-wise, as present in the print), if available, followed by SCH data, as fourth column.

[Presently the PWG_VN is shown beneath the main text in PWG; SCH is shown above & the pwk_VN below (non-chronological) in PWKVN.]

Andhrabharati commented 2 years ago

Having the VN (Annexure/Corrections) besides the main data (instead of beneath) makes it more visible; and having PWG & pwk side-by-side shows how the Petersburger lexicons evolved over time.

Andhrabharati commented 2 years ago

Having the VN (Annexure/Corrections) besides the main data (instead of beneath) makes it more visible

And how about extending this to all cdsl works (barring MW99, which got both of them integrated; of course some work is still pending in it!!)?

Andhrabharati commented 2 years ago

@Andhrabharati Am interested to see your comments. Please go ahead and post the details.

@funderburkjim

The very first point: Accents--

The accents in the pwkvn are to be rendered just as in the PWG and pwk texts; recall the prolonged discussion and exercise done last year about the Devanagari accent marking specifically applicable to PWG and pwk.

Once this is done, I would start listing the other points. [I feel, listing all points at once will not attract your attention.]

Andhrabharati commented 2 years ago

The present display at https://sanskrit-lexicon.uni-koeln.de/scans/csl-apidev/pwkvn/ does not show the accents at all, and I see no option to enable the same.

And thus, one misses the chance to see that many accents in pwkvn are lost (or skipped) in SCH.

Andhrabharati commented 2 years ago

These accents might be of no interest/significance to Jim (as he had mentioned sometime ago elsewhere), but they would be required by the 'real serious users' of the Petersburger lexicons.

gasyoun commented 2 years ago

many accents in pwkvn are lost (or skipped) in SCH.

Oh, interesting @Andhrabharati

gasyoun commented 2 years ago

The display also needs an 'info.html' file to explain what's going on.

Would be lovely, as a lot of work has been done, which I'm hardly aware of.

@thomasincambodia has requested that the above display also include PWG, which I plan to try.

PWG, but not the PWG Nachtrage?

Greater interest in continuing the improvement of ls markup in PW.

You made me smile.

sch.txt does not have ls markup. That's the reason for Apast. Sr.

Can it have same as PWK at least for now?

maltenth commented 2 years ago

@gasyoun

I think the PWG Nachträge are completely absorbed in PWK (needs to be checked), so adding them may be only of historical interest. With the complete digitization of PWKVN 1 to 7 (+8 Last additions pp. 384-390) in hand at last, thanks to Jim's generosity, even SCH is, at this point, only of relevance to PWKVN in that it sometimes silently corrects printing errors in PWKVN. So it is a moot question whether the work focus should be on SCH.

@Andhrabharati

above you write: "These accents might be of no interest/significance to Jim (as he had mentioned sometime ago elsewhere), but they would be required by the 'the real serious users' [emphasis yours] of the Petersburger lexicons."

Please be more specific, and give the actual citation of what Jim writes about accents; let me know whom you consider as "real serious users" and also whom you consider as not real serious users

above you write:

I feel, listing all points at once will not attract your attention. [emphasis yours]

Please clarify. Why do you feel that? Like Jim might be overwhelmed by the complexity of your list?

above you write:

And thus, one misses the chance to see that many accents in pwkvn are lost (or skipped) in SCH.

That is certainly true, and a study of this and other errors in Schmidt's handling of PWKVN can certainly be profitably made.
Note that (roughly) only half the entries in SCH refer to PWK.

Andhrabharati commented 2 years ago

@gasyoun & @thomasincambodia

PWG, but not the PWG Nachtrage?

I think the PWG Nachträge are completely absorbed in PWK (needs to be checked), so adding them may be only of historical interest.

Even otherwise, the PWG Nachträge data is already there in the Cologne PWG file (originally: https://github.com/sanskrit-lexicon/PWG/issues/37#issuecomment-848601953), and is being shown beneath the main text (as applicable, for some portion, if not for all the volumes). It is only the pwk_VN data that was skipped during the pwk digitization, and that is done now; hence the debate on how to show/use it.

@thomasincambodia

"These accents might be of no interest/significance to Jim (as he had mentioned sometime ago elsewhere), but they would be required by the 'the real serious users' [emphasis yours] of the Petersburger lexicons."

Please be more specific, and give the actual citation of what Jim writes about accents; let me know whom you consider as "real serious users"

This is what Jim said at https://github.com/sanskrit-lexicon/PWG/issues/5#issuecomment-894531698

But, speaking personally, Devanagari accents have basically no utility to me. If I still have not mastered the vocabulary of even Hitopadesha, why worry about Devanagari or Vedic accents?

But I realize others may not view accents this way, and am willing to make changes to the display details to accommodate other views. I would like others to come to a consensus before proceeding with technical changes to slp1_deva.xml or elsewhere.

Probably Thomas could spend a little time, going through the chain of posts/discussions from Aug '21 to Oct '21- starting at https://github.com/sanskrit-lexicon/PWG/issues/5#issuecomment-891852841 to the end of the issue.

I consider myself as a serious user and guess there would at least be some more across the globe; and I do not want to speak on the other category users.

I am allured to PWG for some unknown reason (probably, it being in a different language than all others that I had dealt with so far; and also as it has many citations and has been the "base" for almost all the later Sanskrit dictionaries). https://github.com/sanskrit-lexicon/AP90/issues/17#issuecomment-851552308 https://github.com/sanskrit-lexicon/AP90/issues/17#issuecomment-861100004

Andhrabharati commented 2 years ago

@thomasincambodia

I feel, listing all points at once will not attract your attention. [emphasis yours]

Please clarify. Why do you feel that? Like Jim might be overwhelmed by the complexity of your list?

Yes, Jim himself has mentioned thus sometime back, @thomasincambodia !

He seems to be accustomed to seeing one point under one issue heading, and my way of posting a chain of points without gap/break seems to have made him 'skip' (most, if not all, of) them.

Of course, he had made separate issues (just about 2-3 points so far) out of my bunch of points, but that's a rarity.

Andhrabharati commented 2 years ago

And thus, one misses the chance to see that many accents in pwkvn are lost (or skipped) in SCH.

That is certainly true, and a study of this and other errors in Schmidt's handling of PWKVN can certainly be profitably made. Note that (roughly) only half the entries in SCH refer to PWK.

I had done a reasonably good amount of study of pwk_VN pages and the SCH, and made some notes to myself in last December itself, before suggesting to go for full typing of pwkVN pages all over, instead of trying to derive them from SCH matter. https://github.com/sanskrit-lexicon/PWK/issues/75#issuecomment-1003512836 [Though I initially had an intention to post my observations at that time, some later happenings have changed my mind.]

gasyoun commented 2 years ago

[Though I initially had an intention to post my observations at that time, some later happenings have changed my mind.]

Are you ready now?

funderburkjim commented 2 years ago

Accents in the pwkvn display.

The display has been altered so that accents are shown. As with PW and PWG, the PWKVN display for Devanagari uses

- slp1_deva1.xml for transcoding
- siddhanta1 font

so the udAtta accent shows as a superscript Devanagari 'u'.
- Currently, there is no way to view the display without accents.

funderburkjim commented 2 years ago

A sample word with accents is dvimUrDan (slp1)

drdhaval2785 commented 2 years ago

In Advanced view, I could see without accents. No issues at all.

funderburkjim commented 2 years ago

It's the https://sanskrit-lexicon.uni-koeln.de/scans/csl-apidev/pwkvn/ display that does not currently have a control for accent/noaccent.

drdhaval2785 commented 2 years ago

Ok. Thanks for clarification.

funderburkjim commented 2 years ago

Accent control added to the 'csl-apidev/pwkvn' display. Initial value is 'show accent', but may be changed to 'hide accent'.

Andhrabharati commented 2 years ago

Second point- Text portion

The text portion after the meta-line in all other dictionaries is-- HW¦ Body, including in the recently added ARMH. But in this work, the broken bar is missing. Though it is not a major issue to worry about, if and when the data is clubbed with the pwk main text, it would definitely look odd. Also, this forms a base for the next point.

Third point: AltHWs

3a. Out of the 387 listed <althws> in the first 6 volumes, most (if not all) of them are NOT alt. HWs, but just the HW and its body content (as addition or as revision).

3b. Out of the 1212 listed <althws> in the 7th volume, vast majority of them are NOT alt. HWs, but just sequential entries in the earlier (1-6) volumes. This portion in the 7th volume being mainly intended as the index for the previous volumes' VN entries, such sequential entries of a volume are clubbed together to save the print space (pages).

Hope these example screens make the point clear enough.

I suggest that necessary action may be taken to correct the above, and retain just the actual alt. HWs. As this involves deleting some lines in the first 6 volumes portion and adding some lines/entries (by appropriately splitting) in the volume 7 part, leaving this to Jim to do (he wants the line count of a submitted file to be matching as a foremost requisite; so my help may not be entertained).

Incidentally, the vol.7 index entries have some corrections (either in accent or in spelling) in the index words as against the actual entries of the first 6 volumes. So we cannot ignore those index entries altogether.

Though not so important, I would just like to mention that a few earlier-volume VN entries are missing from the vol. 7 index; could it be an error, or intentional by Boehtlingk?

Andhrabharati commented 2 years ago

However I can prepare a list, to save Jim's time in going through all these entries to identify which to retain and which to change (if he is convinced).

gasyoun commented 2 years ago

most (if not all) of them are NOT alt. HWs, but just the HW and its body content (as addition or as revision).

Good point indeed.

However I can prepare a list, to save Jim's time in going through all these entries to identify which to retain and which to change (if he is convinced).

Is @funderburkjim convinced?

Hope these example screens make the point clear enough.

Indeed @Andhrabharati, thanks.

Incidentally, the vol.7 index entries have some corrections (either in accent or in spelling) in the index words as against the actual entries of the first 6 volumes. So we cannot ignore those index entries altogether.

Oh no, it's where one can go mad ))

Though not so important, just like to mention that few earlier volume VN entries are missed in the vol.7 index; could it be an error or intentional by Boethlingk?

intentional by Boethlingk - do not think so. He wrote about Knauer who compared PWK and PWG and found some missing entries, so he was not aware of the loss before Knauer's findings himself.

Andhrabharati commented 2 years ago

Incidentally, the vol.7 index entries have some corrections (either in accent or in spelling) in the index words as against the actual entries of the first 6 volumes. So we cannot ignore those index entries altogether.

Oh no, it's where one can go mad ))

Why so? Recall the same condition identified in MW99 annexure, and subsequent integration work done in Jan-Feb 2021. Still Jim and I are quite sane, not gone mad!!

Andhrabharati commented 2 years ago

He wrote about Knauer who compared PWK and PWG and found some missing entries, so he was not aware of the loss before Knauer's findings himself.

You still owe me Boehtlingk's letters, @gasyoun! Are the two volumes still not scanned?

Andhrabharati commented 2 years ago

Knauer who compared PWK and PWG and found some missing entries,

With the display that I was suggesting earlier above, https://github.com/sanskrit-lexicon/PWK/issues/86#issuecomment-1114465736, any and everyone can clearly see such differences between PWG and pwk.

funderburkjim commented 2 years ago

volume 7 hw lists and `<althws>`

I introduced the althws markup in pwkvn.txt as a way to deal with the headword lists in volume 7.

In the first entry of volume 7, in pwkvn.txt we have

<L>9410<pc>7-289-a<k1>a<k2>a
<althws>{#aMSa#}</althws>
<hom>2.</hom> <hw>{#a#}</hw> und <hw>{#aMSa#}</hw> I. 1. 
<LEND>

This entry needs to be accessible to displays either under 'a' or 'aMSa' (slp1).

The althws tag causes a 'duplicate' record to be made in pwkvn.xml (with L=9410.1).

<H1><h><key1>a</key1><key2>a</key2></h><body> 
<hom>2.</hom> <s>a</s> und <s>aMSa</s> I. 1. </body>
<tail><L>9410</L><pc>7-289-a</pc></tail></H1>
<H1><h><key1>aMSa</key1><key2>aMSa</key2></h><body>
<alt><s>aMSa</s> is an alternate of <s>a</s>.</alt>  <hom>2.</hom> <s>a</s> und <s>aMSa</s> I. 1. </body>
<tail><L>9410.1</L><pc>7-289-a</pc><hwtype n="alt" ref="9410"/></tail></H1>

Since the displays are built on pwkvn.xml, the 9410.1 record shows the entry under a search for 'aMSa'. I think this althws idea is a reasonable way to handle the headword lists in volume 7. The only general change that might be needed is in the phrase

Y is an alternate of X

Perhaps this should be either changed or omitted entirely.
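
The duplicate-record idea can be sketched as follows. This is an illustration only, not the program that actually builds pwkvn.xml: the function name is hypothetical, and the `.2`, `.3`, ... suffixes for a second or third alternate are an assumption (the example above shows only `.1`).

```python
import re

META = re.compile(r"<L>(?P<L>[^<]+)<pc>(?P<pc>[^<]+)<k1>(?P<k1>[^<]+)<k2>(?P<k2>[^<]+)")
ALTHWS = re.compile(r"<althws>\{#(.*?)#\}</althws>")


def expand_althws(entry_lines):
    """Return (key1, L) pairs under which one pwkvn.txt entry is accessible.

    entry_lines: the lines of one entry, meta-line first, <LEND> last.
    """
    m = META.match(entry_lines[0])
    if m is None:
        raise ValueError("first line is not a meta-line")
    records = [(m["k1"], m["L"])]  # the main record
    for line in entry_lines[1:]:
        a = ALTHWS.fullmatch(line.strip())
        if a:
            # <althws>{#X, Y, ...#}</althws>: one extra record per alternate
            alts = [x.strip() for x in a.group(1).split(",")]
            for i, alt in enumerate(alts, start=1):
                records.append((alt, f"{m['L']}.{i}"))
    return records


entry = [
    "<L>9410<pc>7-289-a<k1>a<k2>a",
    "<althws>{#aMSa#}</althws>",
    "<hom>2.</hom> <hw>{#a#}</hw> und <hw>{#aMSa#}</hw> I. 1. ",
    "<LEND>",
]
print(expand_althws(entry))
```

For the first entry of volume 7 this yields one record under 'a' (L=9410) and one under 'aMSa' (L=9410.1), matching the two H1 records shown above.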

volume 1-6 althws

Before adding the <althws> markup, I added the <hw>X</hw> markup in pwkvn.txt.
This was done on the basis of patterns, the simplest being {#X#} und {#Y#}, which was marked as <hw>{#X#}</hw> und <hw>{#Y#}</hw>. For example in volume 1:

<L>115<pc>1-283-a<k1>agfhapati<k2>agfhapati
<althws>{#agfhapatika#}</althws>
<hw>{#*agfhapati#}</hw> und <hw>{#*°ka#}</hw> <is>gaṇa</is> {#cArvAdi#}. 
<LEND>

In this case, I thought the entry should be accessible by either agfhapati or agfhapatika, hence the althws markup.

But, I used several other patterns, and perhaps some of these should be considered wrongly marked. For example, maybe the very first entry should be considered as having only one headword <hw>a</hw> and no althws.

suggestion

@Andhrabharati

Why don't you first prepare a file with the entries that should have NO althws at all (i.e. there is an althws markup but there should be only one <hw>X</hw>, which is the first one). The only thing needed in this file is the L-numbers (for instance, L=2). Then I can easily remove the althws markup (and the extra hw-tags) for these cases from pwkvn.txt.
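
Given such a list of L-numbers, the removal could look roughly like this. It is a sketch with hypothetical function names, assuming the entry layout shown in the samples above (meta-line, optional `<althws>` line, body, `<LEND>`); the 9410 entry is used purely as test data, not as a claim that it belongs on the list.

```python
import re

HW = re.compile(r"<hw>(.*?)</hw>")
META = re.compile(r"<L>([^<]+)<pc>")


def unwrap_extra_hw(line, first_seen):
    """Keep the first <hw>...</hw> of an entry; strip the tags from later ones."""
    parts, pos = [], 0
    for m in HW.finditer(line):
        parts.append(line[pos:m.start()])
        if first_seen:
            parts.append(m.group(1))  # drop the tags, keep the text
        else:
            parts.append(m.group(0))  # first headword stays tagged
            first_seen = True
        pos = m.end()
    parts.append(line[pos:])
    return "".join(parts), first_seen


def strip_althws(lines, l_numbers):
    """Remove althws lines and extra hw-tags for entries whose L is listed."""
    out, active, first_seen = [], False, False
    for line in lines:
        m = META.match(line)
        if m:  # a meta-line starts a new entry
            active = m.group(1) in l_numbers
            first_seen = False
        if active and line.startswith("<althws>"):
            continue  # drop the althws line itself
        if active:
            line, first_seen = unwrap_extra_hw(line, first_seen)
        out.append(line)
    return out


entry = [
    "<L>9410<pc>7-289-a<k1>a<k2>a",
    "<althws>{#aMSa#}</althws>",
    "<hom>2.</hom> <hw>{#a#}</hw> und <hw>{#aMSa#}</hw> I. 1. ",
    "<LEND>",
]
for line in strip_althws(entry, {"9410"}):
    print(line)
```

Note that dropping the `<althws>` lines changes the line count of pwkvn.txt, so this kind of edit is best applied to the master file directly rather than to a copy that is later merged line by line.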

gasyoun commented 2 years ago

You still owe me giving the Boehtlingk's letters, @gasyoun ! are the two volumes not scanned still?

One volume, but so big. The compiler died a year ago. I'll write a paper one day and maybe will scan it for that.

funderburkjim commented 2 years ago

sch.txt does not have ls markup. That's the reason for Apast. Sr. Can it have same as PWK at least for now?

@gasyoun Someone needs to

- add the basic ls markup in sch.txt
- create the corresponding 'schauth/tooltip.txt' file (such as based on the front-matter list given for sch).
  - format could be like that for benfey auth tooltips

funderburkjim commented 2 years ago

I wish I could see the form as PWG | PWG_VN ...

Currently, the pwg digitization contains both the main text and the VN material.
The VN material seems to be in two sections:

* end of volume 5: VERBESSERUNGEN UND NACHTRÄGE ZU THEIL I-V.
  * extends from line 617185 of pwg.txt to line 737375 of pwg.txt
  * From `<L>62404<pc>5-0941<k1>a<k2>a<h>3` to `<L>80800<pc>5-1678<k1>mluc<k2>mluc`
* end of volume 7: Verbesserungen und Nachträge zum ganzen Werke.
  * extends from line 1122155 through line 1149413 of pwg.txt
  * From `<L>117929<pc>7-1685<k1>a<k2>a<h>3` to `<L>122732<pc>7-1822<k1>hevAkin<k2>hevAkin`

To do what you suggest, we would need at least to make a separate 'pwgvn' digitization (by extracting the two VN sections from pwg), and then doing the necessary things to have a display for a new 'pwgvn' dictionary. An interesting idea, but I don't want to commit to doing it now. Will focus first on adding the current pwg to display.
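
The extraction step could be as simple as splitting pwg.txt on the two line ranges given above. A minimal sketch, assuming the ranges are 1-based and inclusive and refer to the version of pwg.txt current at the time of writing; the function name is hypothetical.

```python
# Line ranges of the two VN sections in pwg.txt, as listed above
# (assumed 1-based and inclusive).
VN_RANGES = [(617185, 737375), (1122155, 1149413)]


def split_vn(lines, ranges=VN_RANGES):
    """Split lines into (main, vn) according to inclusive 1-based ranges."""
    main, vn = [], []
    for n, line in enumerate(lines, start=1):
        if any(lo <= n <= hi for lo, hi in ranges):
            vn.append(line)
        else:
            main.append(line)
    return main, vn


# Small demonstration with made-up lines and ranges.
demo = ["l1", "l2", "l3", "l4", "l5", "l6"]
main, vn = split_vn(demo, ranges=[(2, 3), (6, 6)])
print(main, vn)
```

Because the ranges are tied to line numbers, they would need to be rechecked against the `<L>` meta-lines quoted above whenever pwg.txt changes.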

Andhrabharati commented 2 years ago

The VN material seems to be in two sections:

* end of volume 5:  VERBESSERUNGEN UND NACHTRÄGE ZU THEIL I-V.

  * extends from line 617185 of pwg.txt to line 737375 of pwg.txt
  * From `<L>62404<pc>5-0941<k1>a<k2>a<h>3` to `<L>80800<pc>5-1678<k1>mluc<k2>mluc`

* end of volume 7: Verbesserungen und Nachträge zum ganzen Werke.

  * extends from line 1122155 through line 1149413 of pwg.txt
  * From  `<L>117929<pc>7-1685<k1>a<k2>a<h>3`  to `<L>122732<pc>7-1822<k1>hevAkin<k2>hevAkin`

Pl. recall the discussion at https://github.com/sanskrit-lexicon/PWG/issues/39; the material belonging to Vol. 1-4 and Vol. 6 is not always reflected in the VN matter of Vol. 5 or Vol. 7; in many cases where those entries are seen to be repeated, they are further 'revised' in Vol. 5/7. [You have also stored my posted file (for future use?): https://github.com/sanskrit-lexicon/PWG/issues/39#issuecomment-887932380]

As such, I propose that you should include all the VN material from all the volumes, as is done in case of pwk.

funderburkjim commented 2 years ago

In pwg.txt, I don't see any entries that should be considered as VN entries except those mentioned above.

Andhrabharati commented 2 years ago

Pl. see these posts-

https://github.com/sanskrit-lexicon/PWG/issues/37#issuecomment-848601953

https://github.com/sanskrit-lexicon/PWG/issues/37#issuecomment-848607641

https://github.com/sanskrit-lexicon/PWG/issues/39#issuecomment-887682470

Andhrabharati commented 2 years ago

@Andhrabharati

Why don't you first prepare a file with the entries that should have NO althws at all (i.e. there is an althws markup but there should be only one <hw>X</hw> which is the first one. The only thing needed in this file is the L-numbers (for instance, L=2).

As I started making the list, I saw that the very first entry is marked as <L>2, instead of <L>1. On looking further, I noticed that at every <H> there is a jump in the <L> number; in total 9 of them are skipped:

<L>1
<L>1769
<L>3238
<L>4954
<L>5974
<L>8175
<L>9408
<L>9409
<L>22069
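
Skipped numbers of this kind can be found mechanically. A small sketch, assuming that meta-lines begin with `<L>` followed by an integer and that L-numbers are meant to run consecutively from 1; the function name is hypothetical.

```python
import re

L_NUM = re.compile(r"<L>(\d+)")


def skipped_l_numbers(lines, first=1):
    """Return the integers missing from the sequence of <L> numbers."""
    missing = []
    expected = first
    for line in lines:
        m = L_NUM.match(line)
        if not m:
            continue  # not a meta-line
        n = int(m.group(1))
        while expected < n:  # everything between was skipped
            missing.append(expected)
            expected += 1
        expected = max(expected, n + 1)
    return missing


# Made-up sample: 1 and 4 are skipped.
sample = [
    "<L>2<pc>1-001-a<k1>x<k2>x", "body", "<LEND>",
    "<L>3<pc>1-001-a<k1>y<k2>y", "body", "<LEND>",
    "<L>5<pc>1-001-b<k1>z<k2>z", "body", "<LEND>",
]
print(skipped_l_numbers(sample))
```

A number skipped after the last entry cannot be detected this way, since nothing follows it; that case needs the intended final L-number to be known.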

Andhrabharati commented 2 years ago

Looked at Part-1 (Vol. 1-6) of the pwkvn and made the list, taking the pwk main data as the reference to decide the alt. HWs: non-althws entries (Vol. 1-6).txt

[As the count is not even 40% (of 387), my estimation of "most (if not all)" is clearly wrong, but the count is quite large indeed.]

If this is found suitable, shall look into the Vol. 7 data (Part-2) next.

Andhrabharati commented 2 years ago

To do what you suggest, we would need at least to make a separate 'pwgvn' digitization (by extracting the two VN sections from pwg), and then doing the necessary things to have a display for a new 'pwgvn' dictionary. An interesting idea, but I don't want to commit to doing it now. Will focus first on adding the current pwg to display.

JIC you happen to change your mind @funderburkjim , here are the split portions of pwg.txt (latest from the csl-orig repo)-- pwg_main.zip pwg_VN.zip

And you might consider making the present format text of the Vol. 1-4 & 6 VN data, from the first 1057 lines from pwg_orig.txt in the https://www.sanskrit-lexicon.uni-koeln.de/scans/PWGScan/2013/downloads/pwgtxt.zip, and include either to the pwg.txt as is, or to the split portion of pwg_VN txt, to use it in displaying the PWM text.

funderburkjim commented 2 years ago

experimental versions

There are now 2 versions of the experimental display which include pwg. The link https://sanskrit-lexicon.uni-koeln.de/scans/csl-apidev/pwkvn/ provides further links to the variants.

Comments solicited.

(plan to soon revise the list of 'althws')

gasyoun commented 2 years ago

There are now 2 versions of the experimental display which include pwg.

Thanks, only out of screen


sanskrit-lexicon / PWK

PWKVN #86
