sanskrit-lexicon / SKD

Discussion of corrections and other issues pertaining to Sabdakalpadruma dictionary at Sanskrit-Lexicon

vcp-skd comparison #9

Open funderburkjim opened 4 years ago

funderburkjim commented 4 years ago

A report has been developed to facilitate comparison for verbs between VCP and SKD dictionaries. This is in response to an interest expressed by @Shalu411 .

The reports are in the form of HTML documents, located here. To see the reports as rendered html, click one of the links in the readme. One report is in slp1 spelling, and one is in Devanagari spelling.

funderburkjim commented 4 years ago

Here is a screenshot of the devanagari display after it has finished loading: image

funderburkjim commented 4 years ago

And here is a screenshot with one of the 'accordion' panels open: image

gasyoun commented 4 years ago

https://github.com/sanskrit-lexicon/sanskrit-lexicon.github.io/blob/master/verbs/vcp_skd/verb2_deva.html

http://sanskrit-lexicon.github.io/verbs/vcp_skd/verb2_deva.html

Such a great work. Hope she likes it.

funderburkjim commented 4 years ago

report on unmatched verbs

There are about 170 entries of vcp recognized as verbs that have no corresponding entry in skd.

And about 100 verbs of skd without correspondents in vcp.

Here are links to HTML reports:

funderburkjim commented 4 years ago

The links above are to github.io. If you click on one of those, you'll get an 'accordion' display, as shown above.

Shalu411 commented 4 years ago

Hariom. Mark and Jim! I do like every bit of it!! THANKS Am on flight to work! Will be interested in working on these two.

vcp-skd-verb2-deva Same, with Devanagari spelling of Sanskrit. vcp-skd-verb2-nomatch-deva Devanagari spelling.

They are HTMLs. Will be looking through them. Shall I copy paste whole thing or right click and open it with Notepad++ ? I think any of it will work. Guide me please

Shalu411 commented 4 years ago

I must say- it's excellent!! I really like it! I can see the whole content of the dictionary here on the html sheet itself!! For me it's magic.

https://sanskrit-lexicon.github.io/verbs/vcp_skd/verb2_nomatch_deva.html Ok- I will do one thing. I will start with this- Will cross check what form is a non-match in each dictionary. Will it be a good start?

Eg- The first case of cross checking- First non-match verb -अड्ड from SKD exists in VCP- Here it is--

अड्ड [Cologne record ID=808] [Printed book page 0095,b] अड्ड अभियोगे, समाधाने च भ्वादि० पर० सक० सेट् । अड्डति । आड्डीत् । दोधपधोऽयम् अड्डिडिषति आड्डिडत् । क्विप् अत् । डोपधस्य तु आडिड्डिषति आडिड्डत् अट्

So now, my question is- where to note them, if more come in ? Here like this? Or make any excel sheet or something?

Shalu411 commented 4 years ago

Shall I tell you something interesting? The dhatu अत्टङ् अतिक्रमे in SKD is nothing but अट्ट अतिक्रमे in VCP. So I can possibly find them out and give you the list. Then we can rearrange, re-match. How to proceed? What will be the best way to represent that for your convenience? Could you please tell me a representing pattern?

gasyoun commented 4 years ago

They are HTMLs. Will be looking through them.

Keep HTMLs open in Chrome for looking. You can make a copy of them and open in Notepad++, but the code is not easy to read in that mode I guess and there is no need actually to do so.

Will cross check what form is a non-match in each dictionary. Will it be a good start?

Yes, a good choice.

Here like this?

Is enough, no need for Excel.

अत्टङ् अतिक्रमे in SKD is nothing but अट्ट अतिक्रमे in VCP

Seems clear enough. A list or few smaller lists here as a comment would do.

Shalu411 commented 4 years ago

Hariom, Namaste Here are the first few instances of such occurrences. (excel sheet attached) Don't know why, same dhAtu is there in SKD and VCP - E.g. अल but it's shown in no-match. Shall I continue to record each this way? Please guide. VCP-SKD-NoMatch-Verbs.xlsx -Thanks

gasyoun commented 4 years ago

@funderburkjim Usha said there are cases she can't understand, because they are identical as per eye.

funderburkjim commented 4 years ago

Hi, @Shalu411

Your example of 'अड्ड' is good and will be useful.

The https://sanskrit-lexicon.github.io/verbs/vcp_skd/verb2_nomatch_deva.html report was based on rules that a program used to decide what was a verb in skd and what was a verb in vcp.

You have found अड्ड also in VCP as a verb -- this is a useful piece of information.

Similarly, your finding The dhatu अत्टङ् अतिक्रमे in SKD is nothing but अट्ट अतिक्रमे in VCP is also useful.

It would be good to have a complete list of such missed correspondences.

How to present these correspondences? Maybe as a text file For instance, first line of file could be:

VCP अड्ड 808 = SKD अड्ड 582

Do you see where the 808 and 582 numbers come from?

So this would be one file: call the file 'VCP-SKD_unmatched_matched.txt'.

The objective would be to examine all the cases in the '... nomatch_deva.html' file and resolve them.

funderburkjim commented 4 years ago

The other example is interesting, but is different. Why different? Well, both skd and vcp have, in our digitization, a dhatu headword spelled अट्ट. But, also, SKD has an entry with headword अत्टङ (706). And it appears also to correspond to VCP अट्ट .

For now, such examples should be in a different file. Incidentally, you may want to add some 'comment' material in the text files. Do this in lines that start with a semicolon. For example:

; note SKD also has dhatu headword अट्ट  
VCP अट्ट  791 = SKD अत्टङ 706

funderburkjim commented 4 years ago

no need for excel

I prefer this. Excel files are hard for me to work with. A plain '.txt' file is easier to work with when it is formatted regularly, as suggested above.
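A regularly formatted text file of this kind is straightforward to read by program. The following is a minimal sketch, not the script actually used for the reports: it assumes one `VCP <headword> <L> = SKD <headword> <L>` record per line and skips lines starting with a semicolon. The names `Correspondence`, `parse_line` and `parse_text` are hypothetical.

```python
import re
from typing import List, NamedTuple, Optional


class Correspondence(NamedTuple):
    vcp_headword: str
    vcp_id: int
    skd_headword: str
    skd_id: int


# Assumed record shape: "VCP <headword> <id> = SKD <headword> <id>"
# Anything after the SKD id (for example a parenthesized remark) is ignored.
LINE_RE = re.compile(r"^VCP\s+(\S+)\s+(\d+)\s*=\s*SKD\s+(\S+)\s+(\d+)")


def parse_line(line: str) -> Optional[Correspondence]:
    """Return one record, or None for blank lines, ';' comments and other lines."""
    text = line.strip()
    if not text or text.startswith(";"):
        return None
    m = LINE_RE.match(text)
    if m is None:
        return None
    return Correspondence(m.group(1), int(m.group(2)), m.group(3), int(m.group(4)))


def parse_text(text: str) -> List[Correspondence]:
    """Parse a whole file's content, keeping only well-formed records."""
    records = []
    for line in text.splitlines():
        record = parse_line(line)
        if record is not None:
            records.append(record)
    return records


# Lines taken from the examples in this thread, used only to exercise the format.
SAMPLE = """\
; note SKD also has dhatu headword अट्ट
VCP अट्ट  791 = SKD अत्टङ 706
VCP अड्ड 808 = SKD अड्ड 582
"""

records = parse_text(SAMPLE)
for r in records:
    print(r.vcp_headword, r.vcp_id, "=", r.skd_headword, r.skd_id)
```

Lines that do not fit the pattern are silently dropped in this sketch; a stricter reader might report them instead so that typing mistakes in the file are noticed.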

gasyoun commented 4 years ago

The other example is interesting, but is different.

Did not suspect that, hope @Shalu411 understands.

Shalu411 commented 4 years ago

Hariom. This is totally in place. Now I read everything- Perfect! This is why I had suspended the work for a guidance of making the result useful for you. :) Am doing it now. Thanks Jim. So kind of you. I am seeing my great great dream coming into reality, very soon!. You are doing great help to the Sanskrit world. .especially to my Dhatu world. Wow.. Thanks

gasyoun commented 4 years ago

Am doing it now.

Hope it can be finalized in May.

Shalu411 commented 4 years ago

Namaste Here are four cases (more will follow):

VCP अड्ड 808 = SKD अड्ड 582
VCP अल 4589 = SKD अल 2303 (This is there in Dictionary with same headword)
VCP अर्क 4333 = SKD अर्क 2149 (This is there in Dictionary with same headword)
VCP अम्भ 3980 = SKD अभ 1638 (This is variety of the same verb root)

Shalu411 commented 4 years ago

This particular case is like the अत्टङ् which is equal to अट्ट- Both the varieties are given in SKD as separate dhatus, whereas they happen to be same dhatu's alternate form. And this is known by the dhAtvartha that is one and the same.

VCP उज्झ 8722 = SKD उद्झ 4713 SKD उज्झ 4431

In this case- dhAtvartha is tyAge. So I note it in the manner specified- Is it OK?

Shalu411 commented 4 years ago

Hariom Interesting case in printed book of VCP here!! Actually this is a dhatu - which is clear from rest of the inward detail. But it gives पुं (puM) that is Masculine!!! This is wrong because verb-root has no liNga.

ऊष [Cologne record ID=10344] [Printed book page 1393,b] ऊष पु० रूजायाम् भ्वा० पर० सक० सेट् । ऊषति औषीत्

VCPpg1393

Shalu411 commented 4 years ago

1) Shall I also mark the ones which are genuinely no-matches? Or marking the matches out of the no-match list will do? I mean- the ones to which the matching in VCP is not found - shall I note such SKD dhAtus? Or not needed?

2) Case 0015: skd=ऐ and Case 0016: skd=क-- This is a non-root pick in the automation. Shall I note such issues for removing them later?

If the method of editing this HTML is opening it in Notepad++ and doing the correction and saving it, then I am aware of it. Shall I try?

funderburkjim commented 4 years ago

Your notation

VCP अड्ड 808 = SKD अड्ड 582
VCP अल 4589 = SKD अल 2303 (This is there in Dictionary with same headword)
VCP अर्क 4333 = SKD अर्क 2149 (This is there in Dictionary with same headword)
VCP अम्भ 3980 = SKD अभ 1638 (This is variety of the same verb root)

is good. But a minor addition would be useful. In first three, the problem was that the VCP verb was missed. A helpful comment might be

VCP अड्ड 808 = SKD अड्ड 582  (VCP headword missed)
VCP अल 4589 = SKD अल 2303 (VCP headword missed)
VCP अर्क 4333 = SKD अर्क 2149 (VCP headword missed)
VCP अम्भ 3980 = SKD अभ 1638 (This is variety of the same verb root)

Now I can see that the fourth match (अम्भ vs. अभ ) was missed because VCP and SKD used a different spelling. While the first three were missed because the VCP root was missed.

To correct these problems, I have to change things in two different places:

  1. be sure the first 3 missed VCP roots are added to the list of VCP roots
  2. For the 4th, change the mapping function between VCP/SKD so amBa and aBa match.

funderburkjim commented 4 years ago

I mean- the ones to which the matching in VCP is not found - shall I note such SKD dhAtus? Or not needed?

Yes, I think you should make comments here when you feel confident that a verb in one dictionary is not mentioned as a verb in the other dictionary. It might be useful at some point to review those.

Shalu411 commented 4 years ago

VCP अड्ड 808 = SKD अड्ड 582 (VCP headword missed)
VCP अल 4589 = SKD अल 2303 (VCP headword missed)
VCP अर्क 4333 = SKD अर्क 2149 (VCP headword missed)

The issue here is that- it is present in VCP list in the Cologne page- (I mean it is found in digitized version) as well as the printed book, but it is not picked in the automated list. When I search on VCP cologne page, it is found instantly.

1. be sure the first 3 missed VCP roots are added to the list of VCP roots

Yes- But to be added to our automated No-Match list HTML generated file only for further use.

2. For the 4th, change the mapping function between VCP/SKD so amBa and aBa match.

Yes! Perfect. :)
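One possible way to make amBa and aBa match is a small table of known spelling equivalences that is consulted before the two headwords are compared. This is only a sketch under that assumption; the thread does not show the real mapping function, and `VCP_TO_SKD_SPELLING`, `skd_spelling_for` and `is_match` are hypothetical names.

```python
# Hypothetical table: VCP headword (SLP1) -> the spelling SKD uses for the same root.
VCP_TO_SKD_SPELLING = {
    "amBa": "aBa",
}


def skd_spelling_for(vcp_key: str) -> str:
    """Return the SKD spelling expected for a VCP headword (identity by default)."""
    return VCP_TO_SKD_SPELLING.get(vcp_key, vcp_key)


def is_match(vcp_key: str, skd_key: str) -> bool:
    """True when the VCP headword corresponds to the SKD headword."""
    return skd_spelling_for(vcp_key) == skd_key


print(is_match("amBa", "aBa"))   # spelling variant listed in the table
print(is_match("aqqa", "aqqa"))  # identical spelling needs no table entry
```

An explicit table keeps each equivalence reviewable by a Sanskrit reader, at the cost of having to add entries by hand as new variants are reported.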

Shalu411 commented 4 years ago

I mean- the ones to which the matching in VCP is not found - shall I note such SKD dhAtus? Or not needed?

Yes, I think you should make comments here when you feel confident that a verb in one dictionary is not mentioned as a verb in the other dictionary. It might be useful at some point to review those.

Ah, Sure- These things matter in the beginning of the work. Later re-work can be avoided. Thanks. Will make a comment in the same file with "; note" as you advised. Ok?

funderburkjim commented 4 years ago

Case 0015: skd=ऐ and Case 0016: skd=क-- This is a non-root pick in the automation. Shall I note such issues for removing them later?

Yes, you should mention. As I need to modify the root selection program for SKD to exclude these. Suggest you use a slightly expanded format for these. Don't use the 'Case numbers', as these might change when the lists are regenerated. Rather, use the L=number: Example:

SKD ऐ 5583  (not a verb)
SKD क 5738 (not a verb)

Also,

VCP ऊष 10344 (missed verb)

Shalu411 commented 4 years ago

Case 0015: skd=ऐ and Case 0016: skd=क-- This is a non-root pick in the automation. Shall I note such issues for removing them later?

Yes, you should mention. As I need to modify the root selection program for SKD to exclude these. Suggest you use a slightly expanded format for these. Don't use the 'Case numbers', as these might change when the lists are regenerated. Rather, use the L=number: Example:

Perfect!! Done!! Ok.

funderburkjim commented 4 years ago

Another typo change in Uza

Current digitization:

old:
ऊष पु० रूजायाम् भ्वा० पर० सक० सेट् । ऊषति औषीत्
            ---
New: Short vowel 'u' in 'rujAyAm'
ऊष पु० रुजायाम् भ्वा० पर० सक० सेट् । ऊषति औषीत्

Reasons:

@Shalu411 Agree with this spelling change?

funderburkjim commented 4 years ago

vcp changes re Uza: ref https://github.com/sanskrit-lexicon/csl-orig/issues/272

funderburkjim commented 4 years ago

@Shalu411 Have revised vcp-skd-verb2-nomatch-deva Devanagari spelling (other reports mentioned above also revised. )

The revisions include:

funderburkjim commented 4 years ago

not yet handled:

The new reports do not yet handle

VCP उज्झ 8722 = SKD उद्झ 4713 SKD उज्झ 4431

This is different, because one VCP verb entry corresponds to two SKD entries. Not sure yet how to take this into account.

gasyoun commented 4 years ago

one VCP verb entry corresponds to two SKD entries. Not sure yet how to take this into account.

Time to introduce the references, as we spoke before? A root can be just a link in a different dictionary.

Shalu411 commented 4 years ago

Another typo change in Uza

New: Short vowel 'u' in 'rujAyAm' ऊष पु० रुजायाम् भ्वा० पर० सक० सेट् । ऊषति औषीत् @Shalu411 Agree with this spelling change?

Sure!! It's correct. No long U..

Shalu411 commented 4 years ago

@Shalu411 Have revised vcp-skd-verb2-nomatch-deva Devanagari spelling (other reports mentioned above also revised. )

Hariom. Jim, are you going to share this revised one? Am continuing with the old one however. Am notifying new issues in the text file.

Shalu411 commented 4 years ago

Namaste In the no_match list- I started looking up the missing verbs in VCP as matched against SKD. i.e. I take a dhatu in SKD (right side list) and check if it's there in VCP. More often than not- every dhatu is present in VCP in same form. Very rarely it's some issue with the form difference as given in VCP that makes it no_match as per SKD. So here is the list so far-

VCP अल 4589 = SKD अल 2303
VCP अर्क 4333 = SKD अर्क 2149
VCP उक्ष 8598 = SKD उक्ष 4358
VCP ऊष 10344 = SKD ऊष 5333
VCP ऋत 10441 = SKD ऋत 5385
VCP कज 11359 = SKD कज 5865
VCP कुच 13812 = SKD कुच 7865

I have been looking for things seriously differing, but it is ending up in a non-specific one every time- almost every time. I do not know why the automated list does not find the identical ones also. Jim, could you please let me know why this issue arises? Could you give me a new list after one more try probably? If it's too tough, no need to waste time. But if it's not that time/energy-consuming, please make it again for me. To summarize-- though the identical verb exists in VCP same as that in SKD, still the no_match list shows not. Will explain more if not followed.

funderburkjim commented 4 years ago

Jim, could you please let me know why this issue arises?

I think that the underlying reason relates to the patterns used to identify verbs in VCP.

Will work on this problem in next day or two.

gasyoun commented 4 years ago

Will work on this problem in next day or two.

Thanks.

funderburkjim commented 4 years ago

@Shalu411 Have revised vcp-skd-verb2-nomatch-deva Devanagari spelling (other reports mentioned above also revised. ) The 'match' report is also revised: vcp-skd-verb2-deva.

There are now 71 verbs in SKD that are unmatched.

Adding the 62 cases (no VCP headword) and the 9 cases (VCP headword, but not a verb), we get the 71 unmatched SKD cases shown in [vcp_skd_verb2_nomatch](https://github.com/sanskrit-lexicon/skd/verbs01/vcp_skd/vcp_skd_verb2_nomatch_deva.html).

Could any of those 71 SKD verbs still be verbs in VCP?

Yes, it is possible. However, the VCP spelling would have to be different in some detail from the SKD spelling.

One example may be द्रै in SKD (currently not matched, among the 71). There is द्रा which is known as a verb in both VCP and SKD. I think that द्रै in SKD is the same as द्रा . @Shalu411 will have to confirm. If confirmed this would be a case where one verb in VCP matches TWO verbs in SKD, similar to the VCP उज्झ 8722 = SKD उद्झ 4713 SKD उज्झ 4431 example above.

As mentioned above, the reports don't know how to handle such cases where one VCP spelling corresponds to two SKD spellings; that's why उद्झ, L=4713 still shows up as an unmatched SKD verb in the nomatch report.
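One way such one-to-many cases could be represented is to let each VCP entry map to a list of SKD entries, and to count an SKD verb as unmatched only if it appears in none of those lists. The sketch below is an illustration under that assumption, not a description of how the reports are built; `matches` and `unmatched_skd` are hypothetical names, and the L-numbers are the ones quoted in this thread.

```python
from typing import Dict, List, Set

# VCP L-number -> SKD L-numbers judged to be the same root.
matches: Dict[int, List[int]] = {
    8722: [4713, 4431],  # VCP उज्झ = SKD उद्झ, SKD उज्झ
    808: [582],          # VCP अड्ड = SKD अड्ड
}


def unmatched_skd(skd_verb_ids: Set[int], table: Dict[int, List[int]]) -> Set[int]:
    """Return the SKD verbs that no VCP entry points to."""
    matched = {skd_id for skd_ids in table.values() for skd_id in skd_ids}
    return skd_verb_ids - matched


# Toy input: 5333 (SKD ऊष) stays unmatched only because this small table
# leaves out its VCP partner.
skd_verbs = {582, 4431, 4713, 5333}
print(sorted(unmatched_skd(skd_verbs, matches)))
```

With this shape, उद्झ (L=4713) would no longer be listed as unmatched once it is recorded next to उज्झ under the same VCP entry.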

gasyoun commented 4 years ago

reports don't know how to handle such cases where one VCP spelling corresponds to two SKD spellings

Should not we develop a solution for that by now? Otherwise it will come back again and again, same issue.

Shalu411 commented 4 years ago

Hariom Namaste Great work Jim! I tried to see the revised work- but could not find the file

Adding the 62 cases (no VCP headword) and the 9 cases (VCP headword, but not a verb), we get the 71 unmatched SKD cases shown in [vcp_skd_verb2_nomatch](https://github.com/sanskrit-lexicon/skd/verbs01/vcp_skd/vcp_skd_verb2_nomatch_deva.html).

It says 404 Error- Page not found. Could you resend it, please?

funderburkjim commented 4 years ago

additional SKD verbs found

The following 29 headwords of SKD were determined to be verbs: (L-number , SLP1 spelling):

   '1827','1828', # ama
   '1917', # amba
   '2024', # aya
   '2466', # avaDIra
   '2726','2727', # aSa
   '2986', # asu
   '5410', # fSa
   '5571', # elA
   '6169', # kadqa
   '6176', # kana
   '10116', # Kala
   '12656', # cihna
   '12691', # cukka
   '17624', # Danva
   '20297', # pamba
   '21784', # pucCa
   '21843', # putta
   '23309', # prA
   '24428', # byuza
   '27793', # mfga
   '36475', # SmIla
   '36553', # SrA
   '37022', # zAntva
   '37056', # zeva
   '37097', # zwepa
   '37099', # zwyE
   '40829', # sPaqa
   '40876', # sPurja
   '42194', # hvf

Methodology

There were previously 176 VCP headwords which had no matches as SKD verbs, using the general 'iti kavikalpadrumaH' rule for identifying SKD verbs.

A search was made for each of these 176 headword spellings in the SKD dictionary. For all but 31 of the 176, no SKD entry was found. For the 31 headwords for which an SKD entry WAS found, that SKD entry was further examined to determine if the entry pertains to a verb. The resulting list of such verbs is shown above.

There were 2 headwords (KelA and goDA) which were found in SKD, but which were judged to be non-verbs.

The various reports were regenerated using the additions to SKD verbs. Now there are 148 VCP verb headwords still unmatched to SKD verbs, down from 176.
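The search step of this methodology can be sketched as a lookup of each unmatched VCP headword in an index of SKD headwords. This is an illustration only: `split_by_skd_presence` and the tiny index below are hypothetical, and the decision whether a found entry really is a verb remains a manual review, as described above.

```python
from typing import Dict, Iterable, List, Tuple


def split_by_skd_presence(
    unmatched_vcp: Iterable[str], skd_index: Dict[str, List[str]]
) -> Tuple[List[str], List[str]]:
    """Split headwords into (has an SKD entry, has no SKD entry)."""
    found: List[str] = []
    not_found: List[str] = []
    for headword in unmatched_vcp:
        if headword in skd_index:
            found.append(headword)
        else:
            not_found.append(headword)
    return found, not_found


# SKD headword (SLP1) -> L-numbers; one headword may have several entries.
skd_index = {
    "ama": ["1827", "1828"],
    "amba": ["1917"],
}

# "NOT_IN_SKD" is a placeholder standing for a VCP headword with no SKD entry.
found, not_found = split_by_skd_presence(["ama", "amba", "NOT_IN_SKD"], skd_index)
print(found)      # candidates to examine by hand
print(not_found)  # remain unmatched
```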

gasyoun commented 4 years ago

general 'iti kavikalpadrumaH' rule for identifying SKD verbs.

Now I know how it was made.

There were 2 headwords (KelA and goDA) which were found in SKD, but which were judged to be non-verbs.

And still had iti kavikalpadrumaH?

Now there are 148 VCP verb headwords still unmatched to SKD verbs, down from 176.

And the list can be seen at?

Shalu411 commented 4 years ago

Hariom, Namaste

using the general 'iti kavikalpadrumaH' rule for identifying SKD verbs.

That's wonderful!

The various reports were regenerated using the additions to SKD verbs.

Would love to see them and start working. Can't wait more. Please complete whatever is needed, and share Jim.. So that we all see our dream realized. :)

funderburkjim commented 4 years ago

404 error

This is the corrected link: https://github.com/sanskrit-lexicon/SKD/blob/master/verbs01/vcp_skd/vcp_skd_verb2_nomatch_deva.html

click the link

image

This is like looking at an html file as text.

To get an 'html' view of this file, step 1

First, click on the 'Raw' button. You will see: image

To get an 'html' view of this file, step 2

In the window above, right-click and choose 'Save as'. You will then see a dialog: image

To get an 'html' view of this file, step 3

Click the 'Save' button. This will download the file.

To get an 'html' view of this file, step 4

On your local computer, navigate to the downloaded file, then double-click it. This will open the downloaded file in your browser. Now you should see: image
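For those comfortable with Python, the same four steps can be sketched in a few lines. This assumes the usual GitHub pattern in which a `/blob/` page has a raw counterpart on `raw.githubusercontent.com`; the function names `blob_to_raw` and `download_and_open` are hypothetical, and the download itself is not run here.

```python
import urllib.request
import webbrowser
from pathlib import Path


def blob_to_raw(blob_url: str) -> str:
    """Turn a github.com '/blob/' page URL into the matching raw-file URL."""
    prefix = "https://github.com/"
    if not blob_url.startswith(prefix) or "/blob/" not in blob_url:
        raise ValueError("not a github.com blob URL: " + blob_url)
    owner_repo, rest = blob_url[len(prefix):].split("/blob/", 1)
    return "https://raw.githubusercontent.com/" + owner_repo + "/" + rest


def download_and_open(blob_url: str, target: str) -> Path:
    """Save the raw file locally, then open the local copy in the default browser."""
    path = Path(target).resolve()
    urllib.request.urlretrieve(blob_to_raw(blob_url), str(path))
    webbrowser.open(path.as_uri())
    return path


BLOB_URL = (
    "https://github.com/sanskrit-lexicon/SKD/blob/master/"
    "verbs01/vcp_skd/vcp_skd_verb2_nomatch_deva.html"
)
RAW_URL = blob_to_raw(BLOB_URL)
print(RAW_URL)

# To fetch and view the report (needs network access):
# download_and_open(BLOB_URL, "vcp_skd_verb2_nomatch_deva.html")
```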

funderburkjim commented 4 years ago

Note on prior comment:

In the very third comment of this issue, I gave the link: vcp-skd-verb2-nomatch-deva , which gives directly an html view of the file: image

Note that the file is dated 'March 4, 2020', while the latest display (that you downloaded above) is dated 'May 19, 2020'.

The reason is that I have not been copying the latest versions of the reports to the 'https://sanskrit-lexicon.github.io/' repository.

I plan to give links to the latest versions of the reports in this https://github.com/sanskrit-lexicon/SKD/ repository, so you will need to go through the downloading process to view the files as html.

funderburkjim commented 4 years ago

And KelA and goDA still had iti kavikalpadrumaH?

Not quite. Let me try to say it differently. Please have open the skd verb list skd_verb_filter.txt.

For skd, the MAIN rule for verbs is iti kavikalpadrumaH. In the verb list, search for 'code=1,'. When done in the browser you get 2,149 matches (this includes a small number with code=10, code=11, code=13, code=14, code=15). And there are 2,330 entries in total (go to bottom of file). There are also a handful with code=2, code=3. All these records have a line in skd.txt that matches iti kavikalpadrumaH either exactly or with one of several minor variants.

Not every entry with 'iti kavi..' is a verb. For example, entries that have a non-verb marker are excluded by the program that makes the verb list (specifically, klI, or puM, or strI, or tri, are excluded).

Now look at the 'code=4' lines (there are 33 of these). Entries in the verb list with this code have been added 'manually'. Some of these have no iti kavi.. at all, such as the two ama entries.

ama [p= 1-081]
ama k roge . ka āmayati vyādhirlokaṃ . itidurgādāsaḥ .. [ID=1827]

ama gatau . bhajane . śabde . amati . iti durgā-dāsaḥ .. [ID=1828]

Note that ama is in the 'additional SKD verbs found' list in the comment above.

But there are some of those additional SKD verbs that actually do have some form of iti kavi.., but not a form recognized by the program. For example:

;; Case 1986: L=36475, k1=SmIla, k2=SmIla, code=4
504231: SmIla¦, nimizaRe . itikavikalpadrumaH .. (BvA0 para0

Note that there is not a space in itikav... , and the matching variants didn't include this possibility.
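The rule and its blind spot can be illustrated with a regular expression that allows zero or more spaces after `iti`. This is a sketch only: the real program's patterns and the exact way gender markers appear in skd.txt are not shown in this thread, so `KAVI_RE`, `NON_VERB_RE` and `looks_like_skd_verb` are assumptions made for illustration.

```python
import re

# Accept both "iti kavikalpadrumaH" and the unspaced "itikavikalpadrumaH".
KAVI_RE = re.compile(r"iti\s*kavikalpadrumaH")

# Assumed form of the non-verb markers: stand-alone tokens klI, puM, strI, tri.
NON_VERB_RE = re.compile(r"(?<![A-Za-z])(?:klI|puM|strI|tri)(?![A-Za-z])")


def looks_like_skd_verb(entry_text: str) -> bool:
    """Apply the 'iti kavikalpadrumaH' rule, excluding entries with a non-verb marker."""
    if NON_VERB_RE.search(entry_text):
        return False
    return KAVI_RE.search(entry_text) is not None


# Record quoted above, with no space after "iti".
smila = "SmIla¦, nimizaRe . itikavikalpadrumaH .. (BvA0 para0"
# Entry quoted above that cites a different source, so the rule does not fire.
ama = "ama gatau . bhajane . śabde . amati . iti durgā-dāsaḥ .."
# Made-up text, only to show the marker exclusion.
nominal = "example¦, strI, ... . iti kavikalpadrumaH .."

print(looks_like_skd_verb(smila))
print(looks_like_skd_verb(ama))
print(looks_like_skd_verb(nominal))
```

Entries such as ama, which the rule cannot recognize, are the ones that had to be added by hand with code=4.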

Now as to the question regarding goDA and KelA. Here we must also look at the VCP verb list. Note that goDA and KelA are both said to be VCP verbs:

;; Case 0519: L=18064, k1=goDA, k2=goDA, code=1
243907: goDA¦ kOwilye kaRqvAderAkftigaRaH nAmaDAtuH para0 aka0
;; Case 0318: L=14982, k1=kelA, k2=kelA, code=1
201909: kelA¦ vilAse kaRqvA0 A0 a0 sew . kelAyate akelA-
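The `;; Case ...` header lines shown in these listings are regular enough to be tallied by program, for example to count entries per code. The sketch below assumes every header follows exactly the shape quoted in this thread; `CASE_RE` and `parse_case_headers` are hypothetical names, and the sample mixes lines from the SKD and VCP listings only to exercise the pattern.

```python
import re
from collections import Counter
from typing import Dict, List

# Assumed header shape: ";; Case 1986: L=36475, k1=SmIla, k2=SmIla, code=4"
CASE_RE = re.compile(
    r"^;; Case (\d+): L=(\d+), k1=([^,]+), k2=([^,]+), code=(\d+)\s*$"
)


def parse_case_headers(text: str) -> List[Dict[str, object]]:
    """Collect one dict per ';; Case' header line; other lines are skipped."""
    records: List[Dict[str, object]] = []
    for line in text.splitlines():
        m = CASE_RE.match(line)
        if m:
            records.append(
                {
                    "case": m.group(1),
                    "L": m.group(2),
                    "k1": m.group(3),
                    "k2": m.group(4),
                    "code": int(m.group(5)),
                }
            )
    return records


SAMPLE = """\
;; Case 1986: L=36475, k1=SmIla, k2=SmIla, code=4
504231: SmIla¦, nimizaRe . itikavikalpadrumaH .. (BvA0 para0
;; Case 0519: L=18064, k1=goDA, k2=goDA, code=1
243907: goDA¦ kOwilye kaRqvAderAkftigaRaH nAmaDAtuH para0 aka0
;; Case 0318: L=14982, k1=kelA, k2=kelA, code=1
201909: kelA¦ vilAse kaRqvA0 A0 a0 sew . kelAyate akelA-
"""

cases = parse_case_headers(SAMPLE)
code_counts = Counter(record["code"] for record in cases)
print(code_counts)
```

Counting on the parsed integer code avoids the ambiguity of a plain text search, where a pattern meant for one code can also hit codes that merely start with the same digit.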

Being VCP verbs, goDA and KelA were considered as additional candidates for the SKD verb list: See https://www.sanskrit-lexicon.uni-koeln.de/simple/skd/goDA

and https://www.sanskrit-lexicon.uni-koeln.de/simple/skd/KelA

You'll see that both are 'strI' (feminine nominals) -- so that's why they are not in the SKD verb list.

Hope this helps clear up some details about how the skd verb list was chosen.

P.S. -- @gasyoun -- play around with the 'simple' link form above. Isn't this close to what you were after as a shareable link?
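A link of this 'simple' form can be assembled from a dictionary code and a headword. The sketch below only mirrors the two URLs quoted in this thread; that other dictionaries and headwords follow the same pattern is an assumption, and `simple_link` is a hypothetical helper.

```python
from urllib.parse import quote

BASE = "https://www.sanskrit-lexicon.uni-koeln.de/simple"


def simple_link(dictionary: str, key: str) -> str:
    """Build a shareable link such as .../simple/skd/KelA."""
    return BASE + "/" + dictionary + "/" + quote(key)


print(simple_link("skd", "KelA"))
print(simple_link("skd", "goDA"))
```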

gasyoun commented 4 years ago

The reason is that I have not been copying the latest versions of the reports to the 'https://sanskrit-lexicon.github.io/' repository.

That would make more sense, no? But the downloading process can go with Usha as well, guess.

entries that have a non-verb marker are excluded by the program

What's the name of the Python script or part of the code? Want to know how things work in case we need to reduplicate the process, but you say you are fed up (hope not soon) :)

Note that there is not a space in itikav... , and the matching variants didn't include this possibility.

Typo error to be fixed?

P.S. -- @gasyoun -- play around with the 'simple' link form above. Isn't this close to what you were after as a shareable link?

I'm speechless. Thanks!!!

Have said in the past, let me repeat:

1) https://www.sanskrit-lexicon.uni-koeln.de/simple/skd/KelA that is good. But we can still improve it.

2) https://www.sanskrit-lexicon.uni-koeln.de/simple/SKD/KelA is better, because there is no abbreviation skd, but there is SKD on the homepage and so should it be CAPITAL in:

3) As we intend to use rewrite magic, there is no need to have /simple/ and www in the URL and https://sanskrit-lexicon.uni-koeln.de/SKD/KelA is just enough and as short as can be.

Hope I make myself clear enough. If done so - no questions asked and what has been started 6 years ago will be put to an end, thanks!

Shalu411 commented 4 years ago

This is the corrected link: https://github.com/sanskrit-lexicon/SKD/blob/master/verbs01/vcp_skd/vcp_skd_verb2_nomatch_deva.html

Namaste I have received the file for work now. Thanks! Thanks for all the instructions. :) Feels great.