Open funderburkjim opened 4 years ago
Here is a screenshot of the devanagari display after it has finished loading:
And here is a screenshot with one of the 'accordion' panels open:
There are about 170 entries of vcp recognized as verbs that have no corresponding entry in skd.
And about 100 verbs of skd without correspondents in vcp.
Here are links to HTML reports:
The links above are to github.io. If you click on one of those, you'll get an 'accordion' display, as shown above.
Hariom. Mark and Jim! I do like every bit of it!! THANKS Am on flight to work! Will be interested in working on these two.
vcp-skd-verb2-deva Same, with Devanagari spelling of Sanskrit. vcp-skd-verb2-nomatch-deva Devanagari spelling.
They are HTMLs. Will be looking through them. Shall I copy paste the whole thing, or right click and open it with Notepad++? I think either will work. Guide me please
I must say- it's excellent!! I really like it! I can see the whole content of the dictionary here on the html sheet itself!! For me it's magic.
https://sanskrit-lexicon.github.io/verbs/vcp_skd/verb2_nomatch_deva.html Ok- I will do one thing. I will start with this- Will cross check what form is a non-match in each dictionary. Will it be a good start?
Eg- The first case of cross checking- First non-match verb -अड्ड from SKD exists in VCP- Here it is--
अड्ड [Cologne record ID=808] [Printed book page 0095,b] अड्ड अभियोगे, समाधाने च भ्वादि० पर० सक० सेट् । अड्डति । आड्डीत् । दोधपधोऽयम् अड्डिडिषति आड्डिडत् । क्विप् अत् । डोपधस्य तु आडिड्डिषति आडिड्डत् अट्
So now, my question is- where to note them, if more come in ? Here like this? Or make any excel sheet or something?
Shall I tell you something interesting? The dhatu अत्टङ् अतिक्रमे in SKD is nothing but अट्ट अतिक्रमे in VCP. So I can possibly find them out and give you the list. Then we can rearrange, re-match. How to proceed? What will be the best way to represent that for your convenience? Could you please tell me a representing pattern?
They are HTMLs. Will be looking through them.
Keep HTMLs open in Chrome for looking. You can make a copy of them and open in Notepad++, but the code is not easy to read in that mode I guess and there is no need actually to do so.
Will cross check what form is a non-match in each dictionary. Will it be a good start?
Yes, a good choice.
Here like this?
Is enough, no need for Excel.
अत्टङ् अतिक्रमे in SKD is nothing but अट्ट अतिक्रमे in VCP
Seems clear enough. A list or few smaller lists here as a comment would do.
Hariom, Namaste Here are the first few instances of such occurrences. (excel sheet attached) Don't know why, the same dhAtu is there in SKD and VCP (e.g. अल) but it's shown in no-match. Shall I continue to record each this way? Please guide. VCP-SKD-NoMatch-Verbs.xlsx -Thanks
@funderburkjim Usha said there are cases she can't understand, because they are identical as per eye.
Hi, @Shalu411
Your example of 'अड्ड' is good and will be useful.
The https://sanskrit-lexicon.github.io/verbs/vcp_skd/verb2_nomatch_deva.html report was based on rules that a program used to decide what was a verb in skd and what was a verb in vcp.
You have found अड्ड also in VCP as a verb -- this is a useful piece of information.
Similarly, your finding
The dhatu अत्टङ् अतिक्रमे in SKD is nothing but अट्ट अतिक्रमे in VCP
is also useful.
It would be good to have a complete list of such missed correspondences.
How to present these correspondences? Maybe as a text file For instance, first line of file could be:
VCP अड्ड 808 = SKD अड्ड 582
Do you see where the 808 and 582 numbers come from?
So this would be one file: call the file 'VCP-SKD_unmatched_matched.txt'.
The objective would be to examine all the cases in the '... nomatch_deva.html' file and resolve them.
The other example is interesting, but is different. Why different? Well, both skd and vcp have, in our digitization, a dhatu headword spelled अट्ट. But SKD also has an entry with headword अत्टङ (706), and it also appears to correspond to VCP अट्ट.
For now, such examples should be in a different file. Incidentally, you may want to add some 'comment' material in the text files. Do this in lines that start with a semicolon. For example:
; note SKD also has dhatu headword अट्ट
VCP अट्ट 791 = SKD अत्टङ 706
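The text-file format suggested above could be read by a small script. What follows is only a minimal sketch, not the program actually used for the reports: the names `Correspondence` and `parse_line` are hypothetical, and it assumes each entry has one VCP reference on the left of `=`, one or more SKD references on the right, and an optional parenthesized note at the end of the line.

```python
import re
from dataclasses import dataclass

# One "DICT headword L-number" reference, e.g. "VCP अड्ड 808".
REF = re.compile(r"(VCP|SKD)\s+(\S+)\s+(\d+)")
# Optional note in parentheses at the end of a line.
TRAILING_NOTE = re.compile(r"\(([^)]*)\)\s*$")


@dataclass
class Correspondence:
    vcp: tuple      # (headword, L-number)
    skd: list       # one or more (headword, L-number) pairs
    note: str = ""


def parse_line(line):
    """Parse one line; return None for blank lines and ';' comment lines."""
    line = line.strip()
    if not line or line.startswith(";"):
        return None
    left, sep, right = line.partition("=")
    if not sep:
        raise ValueError(f"no '=' in line: {line!r}")
    note = ""
    m = TRAILING_NOTE.search(right)
    if m:
        note = m.group(1).strip()
        right = right[:m.start()]
    vcp = [(hw, int(n)) for d, hw, n in REF.findall(left) if d == "VCP"]
    skd = [(hw, int(n)) for d, hw, n in REF.findall(right) if d == "SKD"]
    if len(vcp) != 1 or not skd:
        raise ValueError(f"unexpected format: {line!r}")
    return Correspondence(vcp[0], skd, note)
```

Keeping the L-numbers as integers makes it easy to join the result against the dictionary records later.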
no need for excel
I prefer this. Excel files are hard for me to work with. A plain '.txt' file is easier to work with when formatted regularly, as suggested above.
The other example is interesting, but is different.
Did not suspect that, hope @Shalu411 understands.
Hariom. This is totally in place. Now I read everything- Perfect! This is why I had suspended the work for a guidance of making the result useful for you. :) Am doing it now. Thanks Jim. So kind of you. I am seeing my great great dream coming into reality, very soon!. You are doing great help to the Sanskrit world. .especially to my Dhatu world. Wow.. Thanks
Am doing it now.
Hope it can be finalized in May.
Namaste Here are four cases- (More will follow)
VCP अड्ड 808 = SKD अड्ड 582
VCP अल 4589 = SKD अल 2303 (This is there in Dictionary with same headword)
VCP अर्क 4333 = SKD अर्क 2149 (This is there in Dictionary with same headword)
VCP अम्भ 3980 = SKD अभ 1638 (This is variety of the same verb root)
This particular case is like the अत्टङ् which is equal to अट्ट- Both the varieties are given in SKD as separate dhatus, whereas they happen to be same dhatu's alternate form. And this is known by the dhAtvartha that is one and the same.
VCP उज्झ 8722 = SKD उद्झ 4713 SKD उज्झ 4431
In this case- dhAtvartha is tyAge. So I note it in the manner specified- Is it OK?
Hariom Interesting case in printed book of VCP here!! Actually this is a dhatu - which is clear from rest of the inward detail. But it gives पुं (puM) that is Masculine!!! This is wrong because verb-root has no liNga.
ऊष [Cologne record ID=10344] [Printed book page 1393,b] ऊष पु० रूजायाम् भ्वा० पर० सक० सेट् । ऊषति औषीत्
1) Shall I also mark the ones which are genuinely no-matches? Or marking the matches out of the no-match list will do? I mean- the ones to which the matching in VCP is not found - shall I note such SKD dhAtus? Or not needed?
2) Case 0015: skd=ऐ and Case 0016: skd=क-- This is a non-root pick in the automation. Shall I note such issues for removing them later?
If the method of editing this HTML is opening it in Notepad++ and doing the correction and saving it, then I am aware of it. Shall I try?
Your notation
VCP अड्ड 808 = SKD अड्ड 582
VCP अल 4589 = SKD अल 2303 (This is there in Dictionary with same headword)
VCP अर्क 4333 = SKD अर्क 2149 (This is there in Dictionary with same headword)
VCP अम्भ 3980 = SKD अभ 1638 (This is variety of the same verb root)
is good. But a minor addition would be useful. In first three, the problem was that the VCP verb was missed. A helpful comment might be
VCP अड्ड 808 = SKD अड्ड 582 (VCP headword missed)
VCP अल 4589 = SKD अल 2303 (VCP headword missed)
VCP अर्क 4333 = SKD अर्क 2149 (VCP headword missed)
VCP अम्भ 3980 = SKD अभ 1638 (This is variety of the same verb root)
Now I can see that the fourth match (अम्भ vs. अभ ) was missed because VCP and SKD used a different spelling. While the first three were missed because the VCP root was missed.
To correct these problems, I have to change things in two different places:
I mean- the ones to which the matching in VCP is not found - shall I note such SKD dhAtus? Or not needed?
Yes, I think you should make comments here when you feel confident that a verb in one dictionary is not mentioned as a verb in the other dictionary. It might be useful at some point to review those.
VCP अड्ड 808 = SKD अड्ड 582 (VCP headword missed) VCP अल 4589 = SKD अल 2303 (VCP headword missed) VCP अर्क 4333 = SKD अर्क 2149 (VCP headword missed)
The issue here is that- it is present in VCP list in the Cologne page- (I mean it is found in digitized version) as well as the printed book, but it is not picked in the automated list. When I search on VCP cologne page, it is found instantly.
1. be sure the first 3 missed VCP roots are added to the list of VCP roots
Yes- But to be added to our automated No-Match list HTML generated file only for further use.
2. For the 4th, change the mapping function between VCP/SKD so amBa and aBa match.
Yes! Perfect. :)
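One way the second fix (letting amBa and aBa match) might look is a small table of known spelling variants that is consulted before comparing headwords. This is a sketch under assumptions: the real mapping function is not shown in this thread, and the names `VCP_TO_SKD_VARIANTS`, `skd_key_for` and `match_verbs` are invented for illustration. Headwords are in SLP1.

```python
# Hypothetical table of known spelling variants, keyed by VCP spelling (SLP1).
VCP_TO_SKD_VARIANTS = {
    "amBa": "aBa",  # VCP अम्भ 3980 = SKD अभ 1638
}


def skd_key_for(vcp_headword):
    """Return the SKD spelling to look up for a given VCP headword."""
    return VCP_TO_SKD_VARIANTS.get(vcp_headword, vcp_headword)


def match_verbs(vcp_verbs, skd_verbs):
    """Split VCP verb headwords into matched (vcp, skd) pairs and unmatched ones."""
    skd_set = set(skd_verbs)
    matched, unmatched = [], []
    for hw in vcp_verbs:
        key = skd_key_for(hw)
        if key in skd_set:
            matched.append((hw, key))
        else:
            unmatched.append(hw)
    return matched, unmatched
```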
I mean- the ones to which the matching in VCP is not found - shall I note such SKD dhAtus? Or not needed?
Yes, I think you should make comments here when you feel confident that a verb in one dictionary is not mentioned as a verb in the other dictionary. It might be useful at some point to review those.
Ah, Sure- These things matter in the beginning of the work. Later re-work can be avoided. Thanks. Will make a comment in the same file with "; note" as you advised. Ok?
Case 0015: skd=ऐ and Case 0016: skd=क-- This is a non-root pick in the automation. Shall I note such issues for removing them later?
Yes, you should mention. As I need to modify the root selection program for SKD to exclude these. Suggest you use a slightly expanded format for these. Don't use the 'Case numbers', as these might change when the lists are regenerated. Rather, use the L=number: Example:
SKD ऐ 5583 (not a verb)
SKD क 5738 (not a verb)
Also,
VCP ऊष 10344 (missed verb)
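Notes in this expanded format could later be turned into include/exclude sets keyed by L-number, which stay valid when the case numbers change. The sketch below is only an illustration: the function name `collect_notes` is hypothetical, and it assumes the note text is exactly "not a verb" or "missed verb" as in the examples above.

```python
import re

# "DICT headword L-number (note)", e.g. "SKD ऐ 5583 (not a verb)".
NOTE_LINE = re.compile(r"^(VCP|SKD)\s+(\S+)\s+(\d+)\s+\((.+)\)\s*$")


def collect_notes(lines):
    """Return (include, exclude): L-numbers per dictionary to add or drop as verbs."""
    include = {"VCP": set(), "SKD": set()}
    exclude = {"VCP": set(), "SKD": set()}
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";"):
            continue  # blank line or comment
        m = NOTE_LINE.match(line)
        if not m:
            continue  # some other kind of line, e.g. a "VCP ... = SKD ..." match
        dictionary, _headword, lnum, note = m.groups()
        if note == "not a verb":
            exclude[dictionary].add(int(lnum))
        elif note == "missed verb":
            include[dictionary].add(int(lnum))
    return include, exclude
```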
Case 0015: skd=ऐ and Case 0016: skd=क-- This is a non-root pick in the automation. Shall I note such issues for removing them later?
Yes, you should mention. As I need to modify the root selection program for SKD to exclude these. Suggest you use a slightly expanded format for these. Don't use the 'Case numbers', as these might change when the lists are regenerated. Rather, use the L=number: Example:
Perfect!! Done!! Ok.
Current digitization:
old:
ऊष पु० रूजायाम् भ्वा० पर० सक० सेट् । ऊषति औषीत्
---
New: Short vowel 'u' in 'rujAyAm'
ऊष पु० रुजायाम् भ्वा० पर० सक० सेट् । ऊषति औषीत्
Reasons:
@Shalu411 Agree with this spelling change?
vcp changes re Uza: ref https://github.com/sanskrit-lexicon/csl-orig/issues/272
@Shalu411 Have revised vcp-skd-verb2-nomatch-deva Devanagari spelling (other reports mentioned above also revised. )
The revisions include:
The new reports do not yet handle
VCP उज्झ 8722 = SKD उद्झ 4713 SKD उज्झ 4431
This is different, because one VCP verb entry corresponds to two SKD entries. Not sure yet how to take this into account.
one VCP verb entry corresponds to two SKD entries. Not sure yet how to take this into account.
Time to introduce the references, as we spoke before? A root can be just a link in a different dictionary.
Another typo change in Uza
New: Short vowel 'u' in 'rujAyAm' ऊष पु० रुजायाम् भ्वा० पर० सक० सेट् । ऊषति औषीत् @Shalu411 Agree with this spelling change?
Sure!! It's correct. No long U..
@Shalu411 Have revised vcp-skd-verb2-nomatch-deva Devanagari spelling (other reports mentioned above also revised. )
Hariom. Jim, are you going to share this revised one? Am continuing with the old one however. Am notifying new issues in the text file.
Namaste In the no_match list- I started looking up the missing verbs in VCP as matched against SKD. i.e. I take a dhatu in SKD (right side list) and check if it's there in VCP. More often than not- every dhatu is present in VCP in same form. Very rarely it's some issue with the form difference as given in VCP that makes it no_match as per SKD. So here is the list so far-
VCP अल 4589 = SKD अल 2303
VCP अर्क 4333 = SKD अर्क 2149
VCP उक्ष 8598 = SKD उक्ष 4358
VCP ऊष 10344 = SKD ऊष 5333
VCP ऋत 10441 = SKD ऋत 5385
VCP कज 11359 = SKD कज 5865
VCP कुच 13812 = SKD कुच 7865
I have been looking for things seriously differing, but it ends up in a non-specific one almost every time. I do not know why the automated list does not find the identical ones also. Jim, could you please let me know why this issue arises? Could you give me a new list after one more try probably? If it's too tough, no need to waste time. But if it's not that time/energy-consuming, please make it again for me. To summarize-- though the identical verb exists in VCP same as that in SKD, still the no_match list shows not. Will explain more if not followed.
Jim, could you please let me know why this issue arises?
I think that the underlying reason relates to the patterns used to identify verbs in VCP.
Will work on this problem in next day or two.
Will work on this problem in next day or two.
Thanks.
@Shalu411 Have revised vcp-skd-verb2-nomatch-deva Devanagari spelling (other reports mentioned above also revised. ) The 'match' report is also revised: vcp-skd-verb2-deva.
There are now 71 verbs in SKD that are unmatched.
Adding the 62 cases (no VCP headword) and the 9 cases (VCP headword, but not a verb), we get the 71 unmatched SKD cases shown in [vcp_skd_verb2_nomatch](https://github.com/sanskrit-lexicon/skd/verbs01/vcp_skd/vcp_skd_verb2_nomatch_deva.html).
Yes, it is possible. However, the VCP spelling would have to be different in some detail from the SKD spelling.
One example may be द्रै in SKD (currently not matched, among the 71).
There is द्रा which is known as a verb in both VCP and SKD.
I think that द्रै in SKD is the same as द्रा. @Shalu411 will have to confirm.
If confirmed, this would be a case where one verb in VCP matches TWO verbs in SKD, similar to the VCP उज्झ 8722 = SKD उद्झ 4713 SKD उज्झ 4431 example above.
As mentioned above, the reports don't know how to handle such cases where one VCP spelling corresponds to two SKD spellings; that's why उद्झ, L=4713 still shows up as an unmatched SKD verb in the nomatch report.
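One possible way to represent such one-to-many cases is to let each VCP L-number map to a list of SKD L-numbers, and to treat every SKD entry in any of those lists as matched. The sketch below only illustrates that data structure; it is not how the reports are currently built, and the name `unmatched_skd` is made up.

```python
from collections import defaultdict


def unmatched_skd(skd_verbs, correspondences):
    """skd_verbs: dict of SKD L-number -> headword.
    correspondences: iterable of (vcp_L, skd_L) pairs; a VCP L may repeat.
    Returns (vcp_to_skd, unmatched), where unmatched is a dict like skd_verbs."""
    vcp_to_skd = defaultdict(list)
    for vcp_l, skd_l in correspondences:
        vcp_to_skd[vcp_l].append(skd_l)
    # Every SKD entry reachable from some VCP entry counts as matched.
    matched = {skd_l for targets in vcp_to_skd.values() for skd_l in targets}
    unmatched = {l: hw for l, hw in skd_verbs.items() if l not in matched}
    return dict(vcp_to_skd), unmatched
```

With the pairs (8722, 4713) and (8722, 4431), both उद्झ and उज्झ would drop out of the unmatched SKD list.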
reports don't know how to handle such cases where one VCP spelling corresponds to two SKD spellings
Should not we develop a solution for that by now? Otherwise it will come back again and again, same issue.
Hariom Namaste Great work Jim! I tried to see the revised work- but could not find the file
Adding the 62 cases (no VCP headword) and the 9 cases (VCP headword, but not a verb), we get the 71 unmatched SKD cases shown in [vcp_skd_verb2_nomatch](https://github.com/sanskrit-lexicon/skd/verbs01/vcp_skd/vcp_skd_verb2_nomatch_deva.html).
It says 404 Error- Page not found. Could you resend it, please?
The following 29 headwords of SKD were determined to be verbs: (L-number , SLP1 spelling):
'1827','1828', # ama
'1917', # amba
'2024', # aya
'2466', # avaDIra
'2726','2727', # aSa
'2986', # asu
'5410', # fSa
'5571', # elA
'6169', # kadqa
'6176', # kana
'10116', # Kala
'12656', # cihna
'12691', # cukka
'17624', # Danva
'20297', # pamba
'21784', # pucCa
'21843', # putta
'23309', # prA
'24428', # byuza
'27793', # mfga
'36475', # SmIla
'36553', # SrA
'37022', # zAntva
'37056', # zeva
'37097', # zwepa
'37099', # zwyE
'40829', # sPaqa
'40876', # sPurja
'42194', # hvf
There were previously 176 VCP headwords which had no matches as SKD verbs, using the general 'iti kavikalpadrumaH' rule for identifying SKD verbs.
A search was made for each of these 176 headword spellings in the SKD dictionary. For all but 31 of the 176, no SKD entry was found. For the 31 headwords for which an SKD entry WAS found, that SKD entry was further examined to determine if the entry pertains to a verb. The resulting list of such verbs is shown above.
There were 2 headwords (KelA and goDA) which were found in SKD, but which were judged to be non-verbs.
The various reports were regenerated using the additions to SKD verbs. Now there are 148 VCP verb headwords still unmatched to SKD verbs, down from 176.
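The search step described above can be pictured as a lookup of each unmatched VCP headword in an index of SKD headwords, with the hits set aside for manual review. This is a toy sketch: `find_review_candidates` and the shape of `skd_index` (headword in SLP1 mapped to a list of L-numbers) are assumptions, not the actual program.

```python
def find_review_candidates(unmatched_vcp, skd_index):
    """unmatched_vcp: list of VCP verb headwords (SLP1) with no SKD verb match.
    skd_index: dict of SKD headword (SLP1) -> list of L-numbers, for ALL entries.
    Returns (found, not_found); 'found' still needs a human verb/non-verb decision."""
    found = {hw: skd_index[hw] for hw in unmatched_vcp if hw in skd_index}
    not_found = [hw for hw in unmatched_vcp if hw not in skd_index]
    return found, not_found
```

In the run reported above, 31 headwords fell into the "found" group, and 29 of them were then judged to be verbs.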
general 'iti kavikalpadrumaH' rule for identifying SKD verbs.
Now I know how it was made.
There were 2 headwords (KelA and goDA) which were found in SKD, but which were judged to be non-verbs.
And still had iti kavikalpadrumaH?
Now there are 148 VCP verb headwords still unmatched to SKD verbs, down from 176.
And the list can be seen at?
Hariom, Namaste
using the general 'iti kavikalpadrumaH' rule for identifying SKD verbs.
That's wonderful!
The various reports were regenerated using the additions to SKD verbs.
Would love to see them and start working. Can't wait more. Please complete whatever is needed, and share Jim.. So that we all see our dream realized. :)
404 error
This is the corrected link: https://github.com/sanskrit-lexicon/SKD/blob/master/verbs01/vcp_skd/vcp_skd_verb2_nomatch_deva.html
This is like looking at an html file as text. I
First, click on the 'Raw' button.
You will see:
In the window above, RIGHT-CLICK and choose save-as. You will then see a dialog:
Click the 'Save' button. This will download the file.
On your local computer, navigate to the downloaded file, then double click.
This will open the downloaded file in your browser
Now you should see:
In the third comment of this issue, I gave the link vcp-skd-verb2-nomatch-deva, which gives directly an html view of the file:
Note that the file is dated 'March 4, 2020', while the latest display (that you downloaded above) is dated 'May 19, 2020'.
The reason is that I have not been copying the latest versions of the reports to the 'https://sanskrit-lexicon.github.io/' repository.
I plan to give links to the latest versions of the reports in this https://github.com/sanskrit-lexicon/SKD/ repository, so you will need to go through the downloading process to view the files as html.
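The manual download could also be scripted. GitHub serves the raw content of a file at `raw.githubusercontent.com` under the same owner, repository, branch and path as the `/blob/` page, so the 'Raw' button step amounts to a URL rewrite. The helper names `blob_to_raw` and `download` below are hypothetical, and whether the report is still at that path has not been checked.

```python
from urllib.parse import urlsplit
from urllib.request import urlopen


def blob_to_raw(url):
    """Convert a github.com '/blob/' page URL into the matching raw-content URL."""
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        raise ValueError(f"not a GitHub blob URL: {url!r}")
    owner, repo, _blob, *rest = segments  # rest = [branch, path parts...]
    return "https://raw.githubusercontent.com/" + "/".join([owner, repo, *rest])


def download(url, dest):
    """Save the raw file locally so it can be opened in a browser (needs network)."""
    with urlopen(blob_to_raw(url)) as resp, open(dest, "wb") as out:
        out.write(resp.read())
```

For example, `download(link, "vcp_skd_verb2_nomatch_deva.html")` with the corrected link above would save the report next to the script; double-clicking the saved file then opens it as rendered html.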
And KelA and goDA still had iti kavikalpadrumaH?
Not quite. Let me try to say it differently. Please have open the skd verb list skd_verb_filter.txt.
For skd, the MAIN rule for verbs is iti kavikalpadrumaH. In the verb list, search for 'code=1,'. When done in the browser you get 2,149 matches (this includes a small number with code=10, code=11, code=13, code=14, code=15). And there are 2,330 entries in total (go to bottom of file).
There are also a handful with code=2, code=3. All these records have a line in skd.txt that matches iti kavikalpadrumaH either exactly or with one of several minor variants.
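The code counts could also be tallied with a short script instead of a browser search, which keeps code=1 separate from code=10 and the like. This is a sketch only: it assumes the verb list has header lines of the form `;; Case 1986: L=36475, k1=SmIla, k2=SmIla, code=4`, and the name `count_codes` is made up.

```python
import re
from collections import Counter

# Header line of one case in the verb list.
HEADER = re.compile(r"^;; Case (\d+): L=(\d+), k1=([^,]+), k2=([^,]+), code=(\d+)")


def count_codes(lines):
    """Count how many cases carry each code value (compared as whole numbers)."""
    counts = Counter()
    for line in lines:
        m = HEADER.match(line)
        if m:
            counts[int(m.group(5))] += 1
    return counts
```

Usage might look like `count_codes(open("skd_verb_filter.txt", encoding="utf-8"))`, assuming the file is saved locally under that name.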
Not every entry with 'iti kavi..' is a verb. For example, entries that have a non-verb marker (specifically klI, puM, strI, or tri) are excluded by the program that makes the verb list.
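In outline, the rule reads like "has the kavikalpadruma phrase and has no nominal marker". The following is a rough sketch of that idea, not the actual selection program: `is_verb_candidate` is a hypothetical name, the entry text is assumed to be in SLP1, and the markers are assumed to appear as separate tokens set off by spaces, commas or periods.

```python
import re

MAIN_RULE = "iti kavikalpadrumaH"
# Markers of nominal entries (neuter, masculine, feminine, three-gender).
NON_VERB_MARKERS = {"klI", "puM", "strI", "tri"}


def is_verb_candidate(entry_text):
    """True if the entry cites the Kavikalpadruma and carries no nominal marker."""
    if MAIN_RULE not in entry_text:
        return False
    tokens = re.split(r"[\s,.]+", entry_text)
    return not any(tok in NON_VERB_MARKERS for tok in tokens)
```

Comparing whole tokens (rather than substrings) avoids rejecting an entry just because a longer word happens to contain "tri".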
Now look at the 'code=4' lines (there are 33 of these). Entries in the verb list with this code have been added 'manually'. Some of these have no iti kavi.. at all, such as the two ama entries.
ama [p= 1-081]
ama k roge . ka āmayati vyādhirlokaṃ . itidurgādāsaḥ .. [ID=1827]
ama gatau . bhajane . śabde . amati . iti durgā-dāsaḥ .. [ID=1828]
Note that ama is in the 'additional SKD verbs found' list in the comment above.
But some of those additional SKD verbs actually do have some form of iti kavi.., just not a form recognized by the program. For example:
;; Case 1986: L=36475, k1=SmIla, k2=SmIla, code=4
504231: SmIla¦, nimizaRe . itikavikalpadrumaH .. (BvA0 para0
Note that there is no space in itikav..., and the matching variants didn't include this possibility.
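A pattern that tolerates the missing space could look like the sketch below. It is not the program's actual list of variants (those are not shown in this thread); it only treats the whitespace between "iti" and "kavikalpadrumaH" as optional, and `has_kavi_marker` is a made-up name.

```python
import re

# Zero or more whitespace characters between "iti" and "kavikalpadrumaH".
KAVI = re.compile(r"iti\s*kavikalpadrumaH")


def has_kavi_marker(text):
    """True if the text contains the phrase, with or without the space."""
    return KAVI.search(text) is not None
```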
Now as to the question regarding goDA and KelA. Here we must also look at the VCP verb list. Note that goDA and KelA are both said to be VCP verbs:
;; Case 0519: L=18064, k1=goDA, k2=goDA, code=1
243907: goDA¦ kOwilye kaRqvAderAkftigaRaH nAmaDAtuH para0 aka0
;; Case 0318: L=14982, k1=kelA, k2=kelA, code=1
201909: kelA¦ vilAse kaRqvA0 A0 a0 sew . kelAyate akelA-
Being VCP verbs, goDA and KelA were considered as additional candidates for the SKD verb list. See https://www.sanskrit-lexicon.uni-koeln.de/simple/skd/goDA and https://www.sanskrit-lexicon.uni-koeln.de/simple/skd/KelA
You'll see that both are 'strI' (feminine nominals) -- so that's why they are not in the SKD verb list.
Hope this helps clear up some details about how the skd verb list was chosen.
P.S. -- @gasyoun -- play around with the 'simple' link form above. Isn't this close to what you were after as a shareable link?
The reason is that I have not been copying the latest versions of the reports to the 'https://sanskrit-lexicon.github.io/' repository.
That would make more sense, no? But the downloading process can go with Usha as well, guess.
entries that have a have non-verb marker are excluded by the program
What's the name of the Python script or part of the code? I want to know how things work in case we need to replicate the process when you say you are fed up (hope not soon) :)
Note that there is not a space in itikav... , and the matching variants didn't include this possibility.
Typo error to be fixed?
P.S. -- @gasyoun -- play around with the 'simple' link form above. Isn't this close to what you were after as a shareable link?
I'm speechless. Thanks!!!
Have said in the past, let me repeat:
1) https://www.sanskrit-lexicon.uni-koeln.de/simple/skd/KelA is good. But we can still improve it.
2) https://www.sanskrit-lexicon.uni-koeln.de/simple/SKD/KelA is better, because there is no abbreviation skd, but there is SKD on the homepage, and so it should be CAPITAL in the URL: /SKD/KelA instead of /skd/KelA. skd as an option, but when we search in the input window, we choose from the capital letter abbreviations.
3) As we intend to use rewrite magic, there is no need to have /simple/ and www in the URL, and https://sanskrit-lexicon.uni-koeln.de/SKD/KelA is just enough and as short as can be.
Hope I make myself clear enough. If done so, no questions asked, and what was started 6 years ago will be put to an end, thanks!
This is the corrected link: https://github.com/sanskrit-lexicon/SKD/blob/master/verbs01/vcp_skd/vcp_skd_verb2_nomatch_deva.html
Namaste I have received the file for work now. Thanks! Thanks for all the instructions. :) Feels great.
A report has been developed to facilitate comparison for verbs between VCP and SKD dictionaries. This is in response to an interest expressed by @Shalu411 .
The reports are in the form of HTML documents, located here. To see the reports as rendered html, click one of the links in the readme. One report is in slp1 spelling, and one is in Devanagari spelling.