Closed funderburkjim closed 10 months ago
Good news to share--
Got pwkvn, SKD and VCP data almost fully.
Main goal was to resolve differences in italic text fragments and devanagari text fragments. A few other miscellaneous differences resolved. See:
Next step will be to harvest the <hom> tag markup which AB developed.
@Andhrabharati Good to know you've got much of the lost work back!
Got pwkvn, SKD and VCP data almost fully.
@Andhrabharati can you make cloud backup? Send the files in Telegram to me?
One good way to go about it can be google drive or private github repository.
I use private github repository for works which are not completed. Gives better version control and also safeguards against the data loss.
@drdhaval2785 I believe not all of them are to be uploaded, like rare PDF scans @Andhrabharati has gathered. Anyway it's of utmost importance and value for me as well. @Andhrabharati has proved himself dedicated to Sanskrit lexicography. He is one of the best out there.
Add hom markup to cdsl pw.txt, based on Andhrabharati markup.
Work done in hom directory.
See:
Revisions to csl-websanlexicon and csl-apidev for display of the hom markup
9 matches for "</hom> <is>" in buffer: temp_pw_11_work.txt
8063 matches in 7768 lines for "</hom> {#" in buffer: temp_pw_11_work.txt
Should these 9 be considered print errors in PW?
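The two buffer counts above can be reproduced outside the editor. This is a minimal sketch, assuming the file is read as a list of lines; the function name `count_matches` and the second sample line are made up for illustration. It reports both the total number of matches and the number of lines that contain one, which is why 8063 matches can sit in only 7768 lines.

```python
import re

def count_matches(lines, pattern):
    """Return (total matches, number of lines with at least one match)."""
    rx = re.compile(pattern)
    total = 0
    hit_lines = 0
    for line in lines:
        n = len(rx.findall(line))
        total += n
        if n:
            hit_lines += 1
    return total, hit_lines

# Small illustrative sample; the second line is invented to show two
# matches on a single line.
sample = [
    '<hom>3.</hom> <is>Kāvya</is> 4〉a〉',
    '<hom>1.</hom> {#aMSa#} und <hom>2.</hom> {#aMSa#}',
    '{#agaRya#}¦ <lex>Adj.</lex>',
]
print(count_matches(sample, re.escape('</hom> <is>')))
print(count_matches(sample, re.escape('</hom> {#')))
```

For the real file one would pass the lines of temp_pw_11_work.txt instead of `sample`.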
Resolve differences in markup involving the <ls> tag.
9 matches for "</hom> <is>" in buffer: temp_pw_11_work.txt ... Should these 9 be considered print errors in PW?
No harm in assuming so, and changing those to {#...#}; especially as they are followed by the meaning 'number' of the particular entry words {#...#}.
In fact, my understanding is that the majority of the <is> elements denote the entry words in the dictionary [while mostly being 'proper nouns'].
But we need to guess why they were rendered in Roman script instead of Devanagari!!
<hom>3.</hom> <is>Kāvya</is> 4〉a〉
<hom>1.</hom> <is>Nārāyaṇa</is> 1〉
<hom>2.</hom> <is>Karaṇa</is> 4〉n〉
<hom>2.</hom> <is>Karaṇa</is> 4〉n〉
<hom>1.</hom> <is>Nidhana</is> 5〉
<hom>2.</hom> <is>Vrātya</is> 1〉
<hom>1.</hom> <is>Dvaipāyana</is> has only one (and thus, unnumbered) meaning.
<hom>1.</hom> <is>Mahābhairava</is> has only one (and thus, unnumbered) meaning.
However, <hom>2.</hom> <is>Vrātya</is> is a case of exception (L-109138)!!
Probably, it could also be made as having 1〉, as at L-109137 and L-109139.
[The above material is as in my version.]
Just seen that a total of ~190 <is> entities are followed by a meaning 'number' in pwk.
See ls/readme.txt.
</ls>
The next step is to resolve differences in parenthetical text groups, about 1000 cases. I think (hope?) the end is near in this worthwhile but lengthy resolution process.
I think (hope?) the end is near in this worthwhile but lengthy resolution process.
It sure is, as it's the top-3 dictionary we have.
What are the other two, in your opinion? PWG and MW?
For most Indian users, I guess, AP90/57, VCP, MW, and SKD are the topmost Sanskrit lexical references.
I just wish @funderburkjim would change his mind about keeping the pwkvn data separate from the pwk main data (and thus 'hidden' from most users); it presently lies behind some very deep link that prospective users are unlikely to know, instead of in a proper 'publicly' known place/interface (or did I miss this, if it has been done already?).
And I look forward to seeing him put the VN data at least beneath the respective pwk entries, as done in GRA recently [VN data at the bottom of the main entry, in addition to the corrections getting 'integrated' inside the entry itself], if not 'integrated' inside the main matter; the 'new' entries have to be shown separately anyway.
See paren/readme.txt.
Resolve differences between AB version and cdsl version in:
- text in parentheses
- lex tag
- div tag
- miscellany
1712 lines of pw.txt changed.
167 lines of pw_ab changed
Refer change_pw_13.txt and change_pw_ab_13.txt.
AB version has added markup to verbs in pw (this markup not present in pw print)
Should the cdsl version include this? I think yes. Should also the '!' be in cdsl version? Not sure.
167 lines of pw_ab changed
There are 176 changed lines, not 167!
* many of these made at in-line page breaks, to facilitate comparisons between ab-version and cdsl version.
About 50% of the changes shift the page-breaks outside (to either the left or the right) of the parentheses; though this is not a big point to debate (I had already copied Jim's corrections into my latest file), I see no "important" reason to do this!! [The reason for the similar corrections that I had implemented within <ls> entities is to make those strings continuous, thus facilitating easier identification of the entities.]
However, some of the others at "Mit [Pagexxx] {#prefix#}" are nice changes, bringing in uniformity among all the Mit {#...#} strings.
; <L>46218<pc>3034-2<k1>tuwi<k2>tuwi<e>000
226361 old {#tuwi#}¦(*<lex>m.</lex> <lex>f.</lex>) {%kleine Kardamomen%} <ls>UTPALA</ls> zu <ls>VARĀH. BṚH. S. 78,1</ls>.
;
226361 new {#tuwi#}¦ (*<lex>m.</lex> <lex>f.</lex>) {%kleine Kardamomen%} <ls>UTPALA</ls> zu <ls>VARĀH. BṚH. S. 78,1</ls>.
;---------------------------------------------------
cdsl file has 4 more places to do similar correction (i.e., to add a space after ¦)
24084:
{#anta/rikza#}¦)
-> {#anta/rikza#}¦
240313:
*{#daRqagrAha#}¦<lex>m.</lex>
-> *{#daRqagrAha#}¦ <lex>m.</lex>
247524:
{#dASASvameDa#}¦<lex>m.</lex>
-> {#dASASvameDa#}¦ <lex>m.</lex>
258481:
<hom>2.</hom> {#devakzatra#}¦<lex>m.</lex>
-> <hom>2.</hom> {#devakzatra#}¦ <lex>m.</lex>
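The "add a space after ¦" correction could also be applied mechanically. The following is only a sketch: the function name is hypothetical, and it assumes that the broken bar is always directly followed by `<`, `(` or `*`, as in the tuwi change and the three `<lex>` cases listed above. The first case listed (24084) looks different and is deliberately not covered.

```python
import re

# Assumption: a missing space is only repaired when the broken bar is
# directly followed by a tag, an opening parenthesis or an asterisk.
BAR_WITHOUT_SPACE = re.compile(r'¦(?=[<(*])')

def add_space_after_bar(line):
    """Insert a single space after ¦ where the next character is <, ( or *."""
    return BAR_WITHOUT_SPACE.sub('¦ ', line)

for old in ['*{#daRqagrAha#}¦<lex>m.</lex>',
            '{#dASASvameDa#}¦<lex>m.</lex>',
            '<hom>2.</hom> {#devakzatra#}¦<lex>m.</lex>']:
    print(add_space_after_bar(old))
```

Lines that already have the space are returned unchanged, so the function can be run over the whole file and the result compared with the original to list the affected lines.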
; <L>76880<pc>4223-2<k1>bAhAdura<k2>bAhAdura<e>100
377536 {#bAhAdura#}¦ <lex>m.</lex> als <ab>Beiw.</ab> von neuerer Zeit so <ab>v. a.</ab> {%Held%} (<lang n="arabic">بهاتور</lang>, <lang n="UNK">???</lang>, <lang n="russian">богатырь</lang>).
377536 {#bAhAdura#}¦ <lex>m.</lex> als <ab>Beiw.</ab> von neuerer Zeit so <ab>v. a.</ab> {%Held%} (<lang n="arabic">بهاتور</lang>, <lang n="russian">богатырь</lang>).
;---------------------------------------------------
;;AB remark
<lang n="UNK">???</lang>,
could be changed with the Mongolian string <lang n="mongolian">ᠪᠠᠭᠠᠲᠦᠷ???</lang>,
thus filling the missing text in digitisation.
Finally, there are still 5 nos. of ".)," and one ".)." in the cdsl file.
@Andhrabharati Please provide response regarding !√, so I will know the meaning and how to handle in cdsl.
Would you mind deciphering the info from the below?--
MW:
<L>1378.1<pc>1309,1<k1>aghaya<k2>aghaya<e>2
✮<s>aghaya</s>, ∆ <ab>Nom.</ab> <ab>P.</ab> <s>agha°yati</s>, ◊¦ to do evil, sin, <ls>Dhātup.</ls>
<LEND>
pwk:
<L>843<pc>1-010-b<k1>aGAy<k2>aGAy<e>500
!√{#aGAy#}¦ {#°ya/ti#} {%Schaden zufügen wollen%}.
<div n="p">— Mit {#aBi#} das.
<LEND>
[They are what are termed 'Nominal verbs' in various works.]
BTW, the das. also needs to be marked as an abbr. like dass.
I seem to have missed this earlier!! [Just seen that my PWG work had this duly marked as an abbr.]
Realised that I gave a wrong word from MW, to compare with pwk entry.
Here is the actual intended one--
There are 176 changed lines, not 167!
According to change_pw_ab_13.txt, there are 167 'change transactions' (' [0-9] old ' occurs 167 times).
@Andhrabharati Where do you get 176?
I did not count them as such, but just added the following from your file:
; Part 1: 108 resolve (X) differences between cdsl and ab versions
; Part 2: 3 resolve diffs in <lex>X</lex>
; Part 3: 0 [.]</X>[.] -> .</X>
; Part 4: 46 Misc. changes (4a,4b,4c). See readme
; Part 5: 19 resolve diffs in <div> tag
there are still 5 nos. of ".)," and one ".)." in the cdsl file.
These 6 are consistent with print. That's why.
Did you forget that we've been changing at quite some places, wrt print matter?
And this PWG entry is the "base" for the above MW and pwk entries--
<L>606<pc>1-0045<k1>aGAy<k2>aGAy
{#aGAy#}¦ (<ab>denom.</ab> von {#aGa#}), <ls n="Padapāṭha">padap.</ls> {#aGay#} {%Schaden zufügen wollen, bedrohen%}: {#ja\hi yo no^ aGA\yati^#} <ls>ṚV. 1,131,7.</ls> <ab>Partic.</ab> <ab>act.</ab> {#aGAya/nt#} <ls n="ṚV.">1,91,8. 5,24,3.</ls> <ab>u. s. w.</ab> <ls>AV. 10,9,1. 4,10.</ls>
<div n="v">— <ab>V. l.</ab> {#aGay#}.
<div n="p">— {#aBi#} <ab>dass.</ab> {#yo na\H kaScA^ByaGA\yati^#} <ls>AV. 7,71,3.</ls>
<LEND>
Did you forget that we've been changing at quite some places, wrt print matter?
No, nor did I forget that we generally are faithful to the print.
From your comment, I infer that in these 6, you believe we should make this (minor) change to the print. I'll do that in the next version.
added the following from your file:
Part 4: of change_pw_ab_13.txt should count as '37' -- the '46' is my error.
Now corrected. Now the sum is 167.
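The 167 versus 176 discrepancy is easy to check. The sketch below sums the per-part counts quoted from the header and, separately, counts 'old' lines in a change file. The regular expression is an assumption about the change-file layout (a line number, a space, then `old`), based on the extracts shown earlier in this thread.

```python
import re

# Per-part counts from the header of change_pw_ab_13.txt, with Part 4
# before and after the correction.
parts_as_posted = {'Part 1': 108, 'Part 2': 3, 'Part 3': 0, 'Part 4': 46, 'Part 5': 19}
parts_corrected = dict(parts_as_posted, **{'Part 4': 37})

print(sum(parts_as_posted.values()))   # 176
print(sum(parts_corrected.values()))   # 167

# Assumed layout of a change transaction: "<line number> old <text>".
OLD_LINE = re.compile(r'^\d+ old ')

def count_transactions(lines):
    """Count the lines that open a change transaction."""
    return sum(1 for line in lines if OLD_LINE.match(line))
```

Running `count_transactions` over the lines of the change file should then agree with the corrected header sum.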
@Andhrabharati Does 'das.' mean the same as 'dass.', namely dasselbe - the same?
Yes.
Add !√ and √ markup to cdsl based on AB version. Make additional corrections based on suggestions by @Andhrabharati since yesterday.
Refer:
@funderburkjim I only now see the
but where are the dhatus to be found? Only the scripts I see in the folder.
no, it doesn't. without looking at the context, i think expanded it is 'daselbst' meaning 'at that same place'.
On Wed, Nov 8, 2023, 02:53 funderburkjim @.***> wrote:
@Andhrabharati https://github.com/Andhrabharati Does 'das.' mean the same as 'dass.', namely dasselbe - the same ?
Great to see you back again, @maltenth !
Would you pl. go through all the global abbr. expansions also once, just like the local abbr. that you had recently looked at? [I expected you or @fxru would be doing the job; but somehow I felt it was @funderburkjim himself that had done the work.]
@maltenth -- This extract of
19 matches for "<ab>[dD]as\.</ab>" in buffer: temp_pw_14.txt
1981:{#agaRya#}¦ <lex>Adj.</lex> <ab>Das.</ab> <ls>Spr. 7688</ls>. <ls n="Spr.">7745</ls>.
3862:{#aGavinASin#}¦ <lex>Adj.</lex> <ab>das.</ab> <ls>Spr. 7853</ls>.
3903:<div n="p">— Mit {#aBi#} <ab>das.</ab>
8899:{#atimanohara#}¦ <lex>Adj.</lex> <ab>das.</ab> <ls>R. 1,9,55</ls>.
115066:<div n="1">— 1) {%Affe%}. <ab>Das.</ab> *<lex>f.</lex> {#kapi#} und {#kapI#}.
201789:<hom>2.</hom> *{#CA#}¦ <lex>m.</lex> (<ab>Nom.</ab> {#CAs#}) {%ein Junges%}. <ab>Das.</ab> <lex>f.</lex> <ab>s. u.</ab> {#Ca#}.
257535:{#dfQaBaktika#} und {#°Baktimant#} (<ls>R. GORR. 2,11,28</ls>)¦ <lex>Adj.</lex> <ab>das.</ab>
309847:<div n="1">— 7) <ab>das.</ab> Auge, den Blick {%werfen —, richten auf%} (<ab>Loc.</ab>).
338900:<div n="2">— a) (nur die Formen {#puru/, pu/rU/, pu/rU/Ri, pu/rU/RAm#} und <ab>das.</ab> <lex>f.</lex> {#pUrvI/#} in verschiedenen casus) {%viel, reichlich%}. In der späterren Sprache nur am Anfange einiger Composita. <lex>Adv.</lex> {#puru/#} und {#pu/rU/#} {%viel, oft, sehr%} (auch mit einem <ab>Compar.</ab> und <ab>Superl.</ab>). <div n="p">— Mit {#si/mA#} {%allenthalben;%} mit {#uru/#} {%sehr weit, weit und breit;%} mit {#tira/s#} {%weithin, weither;%} mit {#vi/Sva#} {%durchaus jeder, aller und jeder;%} vor Zusammensetzungen mit {#puru#} noch weiter steigernd.
340496:<div n="1">— 5) bisweilen verwechselt mit {#puzpy#}; <ab>s.</ab> <ab>das.</ab>
359891:{#prapARika#}¦ <lex>m.</lex> <ab>das.</ab> <ls>CARAKA. 5,8</ls>.
395541:<div n="1">— 1) <ab>Caus.</ab> von <hom>1.</hom> {#BU#}; <ab>s.</ab> <ab>das.</ab>
445101:{%Gedanken „—“%} übersetzen (<ls n="Chr.">304,1</ls>. <ls n="Chr. 304,">12</ls>. <ls n="Chr.">323,19</ls>). <ab>Das.</ab> <ab>Relat.</ab> verbindet sich gern mit andern <ab>Pronomm.</ab>: {#ya/stva/m#}, {#yo/ 'ya/m#}, {#yaH saH#}, {#ya eza/H#}, {#yo'sO#}, {#sa/ ya/H#}, {#asO yaH#}.
475031:{#romodgati#}¦ <lex>f.</lex> <ab>das.</ab>
476232:<div n="1">— 1) <ab>das.</ab>
559994:{#SAtakumBIya#}¦ <lex>Adj.</lex> <ab>das.</ab> <ls>Ind. St. 15,434</ls>. <ls>DAMAYANTĪK. 2,24</ls>.
562777:{#SAsanadevI#}¦ <lex>f.</lex> <ab>das.</ab> <ls>HEM. PAR. 9,93</ls>.
663198:<div n="p">— Mit {#aBi#}, {#°zvajate#} <ab>das.</ab> <ls>ŚIŚ. 13,10</ls>.
681073:<div n="2">— a) <ab>das.</ab>
Where are the dhatus found?
@gasyoun This extract from pw.txt may be useful. temp__roots_pw_14.txt
@funderburkjim
;
<L>81343<pc>4290-3<k1>BlAs<k2>*BlAs<e>500
400097 old *{#BlAs#}¦ <ab>v. l.</ab> für {#BlAS#}.
400097 new *{#BlAs#}¦ <ab>v. l.</ab> für √{#BlAS#}.
;;AB note-- this should have the entry word also "marked"
400097 new *√{#BlAs#}¦ <ab>v. l.</ab> für √{#BlAS#}.
;
<L>89314<pc>5113-3<k1>ya<k2>ya/<h>1<e>500
439876 old <div n="1">— 1〉 {%wer, welcher%}. ~~~ Das. <ab>Relat.</ab> verbindet sich gern mit andern <ab>Pronomm.</ab>: {#ya/stva/m#}, {#yo/ 'ya/m#}, {#yaH saH#}, {#ya eza/H#}, {#yo'sO#}, {#sa/ ya/H#}, {#asO yaH#}.
439876 new <div n="1">— 1〉 {%wer, welcher%}. ~~~ <ab>Das.</ab> <ab>Relat.</ab> verbindet sich gern mit andern <ab>Pronomm.</ab>: {#ya/stva/m#}, {#yo/ 'ya/m#}, {#yaH saH#}, {#ya eza/H#}, {#yo'sO#}, {#sa/ ya/H#}, {#asO yaH#}.
;;AB note-- there is no dot after Das in the entry, as such it is not an abbr.
439876 new <div n="1">— 1〉 {%wer, welcher%}. ~~~ Das <ab>Relat.</ab> verbindet sich gern mit andern <ab>Pronomm.</ab>: {#ya/stva/m#}, {#yo/ 'ya/m#}, {#yaH saH#}, {#ya eza/H#}, {#yo'sO#}, {#sa/ ya/H#}, {#asO yaH#}.
Refer rab directory
modify cdsl version
<div n="1">— 2〉
{%mit einem%} {#pratihAra#} 1〉c〉
Also modify cdsl and AB versions for the two corrections mentioned in @Andhrabharati previous two comments.
Due to the large number of changes, no change_pw_15.txt file prepared.
The few (12) corresponding changes to pw_ab version are noted in change_pw_ab_15.txt.
Revised pwab_input.txt per @maltenth suggestion for 'das.' global abbreviation.
There is no 'next step' that I see at this time. The cdsl version 15 now takes into account all the features of @Andhrabharati version that I have noticed. The cdsl version 15 is in csl-orig repository at commit 5fd4222ad497c894ed5d602fa652e4a2ec68c3cd.
temp_pw_ab_15.zip is the corresponding version of AB work.
@Andhrabharati Have I missed anything that needs to be done to cdsl version based on your version?
It remains to incorporate further revisions to global and local (unt, under) abbreviation expansions from @maltenth when they are available. We will also need to do another round of 'ls' tooltip review. But that can be worked on in another issue.
- to use right-angle-bracket for section reference
This is for meaning numbers which I have introduced from GRA, for the open ended parenthesis (as an aid to match the right and left brace counts).
Good to note that you also liked the idea, @funderburkjim
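To illustrate why the right-angle-bracket helps with brace counts: a meaning number written as `1)` contributes an unmatched `)` to the line, while `1〉` does not. The sketch below is only an illustration, not the procedure actually used for either version; the function names are hypothetical and the regular expression assumes a meaning number or letter directly follows the dash at the start of a division.

```python
import re

def paren_balance(line):
    """Number of '(' minus number of ')' in the line; 0 means balanced."""
    return line.count('(') - line.count(')')

# Assumption: a section reference is a number or a single lowercase letter
# right after the dash that opens a division, followed by ')'.
SECTION_REF = re.compile(r'(— (?:\d+|[a-z]))\)')

def mark_section_refs(line):
    """Replace the open-ended ')' of a section reference with '〉'."""
    return SECTION_REF.sub(r'\1〉', line)

line = '<div n="1">— 1) {%Affe%}.'
print(paren_balance(line))                     # -1
print(paren_balance(mark_section_refs(line)))  # 0
```

After such a replacement, any line with a non-zero balance points to a genuinely missing or extra parenthesis (or to one that continues on a neighbouring line).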
There is one correction that you need to revert back,
; <L>63141<pc>4027-3<k1>par<k2>par<h>1<e>500
310304 old <div n="1">— 3〉 {#pUrta#}
310304 new <div n="1">— 3〉 {#pUrta#} 1〉
;;AB note the 1〉 is deliberately changed as a〉 here; it is similarly used in the next meaning 4〉in line-310308. This may be taken as a print change.
Let me go through the cdsl file tomorrow, to see if any changes wrt my V.1 file are missing still. [I was just about to retire for the day, before your post came up in my mail notification.]
Probably, I should give my V.2 file before the ls corrections are taken up.
@funderburkjim
As you have been mentioning all through, having the same line-counts is the basic & primary requirement for comparing two files.
Hence I started with making the cdsl file and AB file to have same number of lines, and the steps are detailed in this file-- matching line counts in pw with pw_ab_15.txt
The resulting files are in this zip-- 15d.zip [I thought pushing the intermediate files is not so necessary, so skipped adding them here.]
[Probably, you might also think of removing the 'extra' blank lines (and splitting the " <div" strings to start a new line) in the cdsl file, now that the corrections are almost over.]
Now, it will be very easy for you to identify [using your script-- (updateByLine.py?)] that about 1500 differences are there in the two files-- which encompass (missing/extra/wrong) punctuation marks, (missing/extra) spaces/tabs, textual corrections etc. Many of these corrections are to be done in the cdsl file, and some are to be done in the AB file yet.
However, I have tried to make a diff file at my end (for the first time!!)-- pw_ab_15 differences.txt
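Once both files have the same number of lines, a line-by-line comparison is straightforward. The following is a minimal sketch and not the actual script mentioned above; the function name `diff_by_line` is made up for illustration.

```python
def diff_by_line(lines_a, lines_b):
    """Return (line number, text in a, text in b) for every differing line.

    Both inputs must have the same number of lines, which is the basic
    requirement for this kind of comparison.
    """
    if len(lines_a) != len(lines_b):
        raise ValueError(
            'line counts differ: %d vs %d' % (len(lines_a), len(lines_b)))
    return [(n, a, b)
            for n, (a, b) in enumerate(zip(lines_a, lines_b), start=1)
            if a != b]

cdsl = ['{#tuwi#}¦(*<lex>m.</lex>', '<LEND>']
ab = ['{#tuwi#}¦ (*<lex>m.</lex>', '<LEND>']
for n, a, b in diff_by_line(cdsl, ab):
    print(n, 'cdsl', a)
    print(n, 'ab  ', b)
```

The output lists each differing line once per version, similar in spirit to the old/new pairs of the change files.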
Also, it is interesting that the ¦-count is lesser by 15 (135756), as compared with the L-count (135771), in the cdsl file.
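One way to locate the 15 entries behind this difference is to list every entry whose body contains no ¦. This is a sketch under the assumption that each entry starts with a metaline beginning `<L>` and ends with a line beginning `<LEND>`, as in the extracts in this thread; the function name is hypothetical, and the second sample entry has its ¦ removed on purpose for illustration.

```python
def entries_missing_bar(lines):
    """Return the metalines of entries that contain no broken bar (¦)."""
    missing = []
    meta = None
    has_bar = False
    for line in lines:
        if line.startswith('<L>'):
            meta, has_bar = line, False
        elif line.startswith('<LEND>'):
            if meta is not None and not has_bar:
                missing.append(meta)
            meta = None
        elif meta is not None and '¦' in line:
            has_bar = True
    return missing

sample = [
    '<L>843<pc>1-010-b<k1>aGAy<k2>aGAy<e>500',
    '!√{#aGAy#}¦ {#°ya/ti#} {%Schaden zufügen wollen%}.',
    '<div n="p">— Mit {#aBi#} das.',
    '<LEND>',
    '<L>81343<pc>4290-3<k1>BlAs<k2>*BlAs<e>500',
    '*{#BlAs#} <ab>v. l.</ab> für {#BlAS#}.',   # bar removed for the example
    '<LEND>',
]
print(entries_missing_bar(sample))
```

If the difference turns out to have another cause (for example, two ¦ in one entry offset by more missing ones), the same loop can be changed to count the bars per entry instead.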
[Jim]
@Andhrabharati Does 'das.' mean the same as 'dass.', namely dasselbe - the same?
[Thomas]
no, it doesn't. without looking at the context, i think expanded it is 'daselbst' meaning 'at that same place'.
[Jim]
Revised pwab_input.txt per @maltenth suggestion for 'das.' global abbreviation.
@maltenth
I think except at two places, where the "das. = at the same place" could be taken,
340496:<div n="1">— 5) bisweilen verwechselt mit {#puzpy#}; <ab>s.</ab> <ab>das.</ab>
395541:<div n="1">— 1) <ab>Caus.</ab> von <hom>1.</hom> {#BU#}; <ab>s.</ab> <ab>das.</ab>
the "das. = the same (as above)" appears to be more appropriate at all other 16 places.
I would be happy being declared as wrong, but you're requested to go through the above lines once, and confirm the abbreviations.
Sorry that I do not want to leave anything that I get involved into, without properly convincing myself. Pl. spare a few minutes of your time on this, again.
I have gone through the 19 instances @funderburkjim provided and have compared them to the original b/w scan (I have also used the Robart pw colour scan on archive.org, which is better than the cologne scan), and you are mostly quite right.
Most of the "das." were just plain typos or due to the bad print of the edition used for keyboarding.
Here are my corrections:
[w.i.c.] stands for [which is correct]
after a forward slash you find some more, unrelated, corrections of the text.
•1981 .{#agaNeya#}¦ •Adj. original has "dass." [w.i.c] .{#agaNya#}¦ Adj. Das.
•3862 original has "dass." [w.i.c] .{#aghavinAzin#}¦ Adj. das.
•3903 original has "dass." [w.i.c] / below corr. Harre-Haare <+> {#abhi#} das.
•8899 .{#atimanorama#}¦ •Adj. (•f. {#A#}) original has "dass." [w.i.c] .{#atimanohara#}¦ Adj. das.
•115066 .{#kapiª#}¦ •m. original has "Das" [w.i.c] .{%Affe.%} Das. *f. {#kapi#} und {#kapI#}.
•201789 original has "Das" [w.i.c] .{%ein…Junges.%} Das. f. s.u. {#cha#}.
•257535 .{#dRDhabhaktika#}¦ und {#°bhaktimant#} (R. original has "dass."[w.i.c] .¶GORR.‡2¨11¨28) Adj. das. .{%niederwerfen…,…niederhauen%} , [Page4.013-3]
•309847 original has "das" [w.i.c] .²7) das. Auge , den Blick
•338900 original has "das" [w.i.c] / correct späterren->späteren .³a) (nur die Formen {#puruª…,…puªrUª…,…puªrUªNi…,…puªrUªNAm#} und das. f. {#pUrvIª#} in verschiedenen casus)
•340496 daselbst!? .²5) bisweilen verwechselt mit {#puSpy#} ; s. das.
•359891 original has "dass." [w.i.c] .{#prapANika#}¦ m. das.
•395541 daselbst!? / to correct von.-von .²1) •Caus. von 1. {#bhU#} ; s. das.
•445101 original has "Das" [w.i.c] / correct above sind;-sind: {%beim…[Page5.114-1]…Gedanken%} ``-x-´´ übersetzen (304¨1.12.‡323¨19). Das. Relat. verbindet sich gern mit andern Pronomm.: {#yaªstvaªm#} , {#yoª#} {#jyaªm#} , {#yaH#} {#saH#} , {#ay#} {#eSaª#} ; , {#yojsaª#} , {#saª#} {#yaªH#}. {#asaª#} {#yaH#}.
•475031 original has "dass." [w.i.c] .{#romodgati#}¦ f. das.
•476232 original has "dass." [w.i.c] ²1) das.
•559994 .{#zAtakumbhamaya#}¦ •Adj. (•f. {#I#} original has "dass." [w.i.c] .{#zAtakumbhIya#}¦ Adj. das.
•562777 .{#zAsanadevatA#}¦ •f. bei den †G4aina original has "dass." [w.i.c] .{#zAsanadevI#}¦ f. das.
•663198 original has "dass." [w.i.c] {#abhi…,…°Svajate#} das. .{%gelber…Jasmin.%} [Page7.284-3]
•681073 original has "dass." [w.i.c] .³a) das.
Thanks a lot, @maltenth !!
I have also noticed that the pwk typed text has quite many errors, as compared to the print [of course, the print is also having quite some errors!], and was thinking of doing a "complete" proof of the text once for all.
I think this pwk is one of the best (single-handed) compilations in Sanskrit lexicography, and probably the widely used one too (apart from MW).
Do you think the pwk proofing would be a worthy spend of the time?
There are quite a few unmarked abbreviations in pw.txt. Derive a procedure for identifying and marking many of these.
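As a starting point for such a procedure, one could blank out everything that is already marked and then search what remains for known abbreviation strings. The sketch below is only one possible approach, not the procedure that was eventually derived: the function name, the list of tags treated as "already marked" and the abbreviation list passed in are all assumptions.

```python
import re

# Spans that are already marked up, or that hold Sanskrit / italic text,
# are blanked out before searching (assumed tag set: ab, lex, ls).
MARKED = re.compile(r'<(ab|lex|ls)\b[^>]*>.*?</\1>|\{#.*?#\}|\{%.*?%\}')

def find_unmarked(line, abbrevs):
    """Return the known abbreviations that occur outside any markup."""
    bare = MARKED.sub(' ', line)
    found = []
    for ab in abbrevs:
        rx = re.compile(r'(?<!\w)' + re.escape(ab))
        found.extend(ab for _ in rx.finditer(bare))
    return found

known = ['das.', 'dass.']
print(find_unmarked('<div n="p">— Mit {#aBi#} das.', known))
print(find_unmarked('<div n="p">— Mit {#aBi#} <ab>das.</ab>', known))
```

The candidates found this way would still need review against the print, since a string like "Das" without a following dot is an ordinary word and not an abbreviation.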