sanskrit-lexicon / hwnorm2


pw prefixed verbs #3

Open funderburkjim opened 4 years ago

funderburkjim commented 4 years ago

Changes were made so that prefixed verbs in the pw dictionary can be accessed via the keydoc database. This note describes the changes in some detail, so that similar work can be done for other dictionaries.

funderburkjim commented 4 years ago

Two entry points were added to the keydoc system of this hwnorm2 repository. There are two dictionary-specific files in the keydoc/distinctfiles directory.

Currently there are two pw files in distinctfiles:

There is also an mw file in distinctfiles: mw_norm_extra.txt.
The current records in this file relate the spellings of prefixed verbs appearing in mw to the virtual prefixed verb spellings for pw. There are about 1500 such records. For example, the record 'vikf vikar' relates the mw headword 'vikf' to the virtual (pw) headword 'vikar'.
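As a rough illustration of how such records could be read, here is a minimal sketch. It assumes each record is one line with two whitespace-separated fields (alternate headword, then virtual headword), as in the 'vikf vikar' example; the function name `parse_norm_extra` is hypothetical and is not taken from the hwnorm2 code.

```python
def parse_norm_extra(lines):
    """Map each alternate headword to its virtual headword.

    Assumption: one record per line, two whitespace-separated fields.
    Lines that do not have exactly two fields are skipped.
    """
    mapping = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue
        alternate, virtual = parts
        mapping[alternate] = virtual
    return mapping


# The only record quoted in this thread.
records = ["vikf vikar"]
mapping = parse_norm_extra(records)
print(mapping["vikf"])
```

With a mapping like this, a lookup for 'vikf' can be redirected to the same key as 'vikar'.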

The next comment shows some examples of these features using the current research display.

funderburkjim commented 4 years ago

Examples

These examples are from the dalglob1 research display; this display also works in local installations.

Example 1: search for vikar

We see that 'vikar' appears in 6 dictionaries; in 5 of them, it is spelled 'vikf'. In pw, the search leads to the entry for 'kar'.

Note 1: Why is pwg dictionary absent for 'vikar'? The answer is that we haven't yet done the work for prefixed verbs for pwg to get pwg_norm.txt and pwg_keydoc_input.txt. There is similar markup in pwg (<div n="p">— {#vi#}) which should permit a prefixed verb enhancement for pwg similar to the one we are currently describing for pw.
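The markup mentioned in Note 1 could be picked out with a regular expression along the following lines. This is only a sketch: it assumes a prefix section always looks like the quoted fragment (`<div n="p">`, a dash, then the prefix in `{#...#}`), the sample text is made up, and `find_prefixes` is a hypothetical name. Forming the virtual headword by plain concatenation ignores sandhi and is shown only for the 'vi' + 'kar' case discussed here.

```python
import re

# Assumption: prefix sections are marked exactly like the fragment in Note 1.
PREFIX_RE = re.compile(r'<div n="p">\s*[—-]\s*\{#([^#]+)#\}')


def find_prefixes(entry_text):
    """Return the prefixes marked in one dictionary entry, in order."""
    return PREFIX_RE.findall(entry_text)


# Made-up entry text for the root 'kar', for illustration only.
sample = 'kar ... <div n="p">— {#vi#} ... <div n="p">— {#pra#} ...'
prefixes = find_prefixes(sample)

# Naive concatenation (no sandhi handling): 'vi' + 'kar' gives 'vikar'.
virtual_headwords = [prefix + "kar" for prefix in prefixes]
print(virtual_headwords)
```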

Note 2: If we click the 'pw' button, the bottom panel of the display will show the entry for 'kar'. If we click the 'ap90' button, the bottom panel will show the entry for 'vikf'.

funderburkjim commented 4 years ago

Example 2: search for vikf

This gives exactly the same coverage as the previous search for 'vikar', as it should!

funderburkjim commented 4 years ago

Search for 'kf'

[two screenshots of the 'kf' search results]

funderburkjim commented 4 years ago

Open question

If you look closely at 'kar' in pw (or ccs), you will see that one entry has the verb form 'kirati'. In 'mw' this is associated with the spelling 'kF' (not 'kf').

One thing we could do is to treat 'kF' as another alternate spelling for 'kar' (in pw_norm_extra.txt). The effect would be that, in mw, 'kf' and 'kF' would be in the same document.

I'm currently mildly in favor of this, and similar other, linkages. Does anyone have an opinion on this?
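To make the effect of such a linkage concrete, here is a small sketch of grouping headwords into "documents" by their normalized key. The pairs below are assumptions for illustration: 'kf' mapping to 'kar' is inferred from the discussion above, and ('kF', 'kar') is the proposed extra record. `build_documents` is a hypothetical helper, not part of the keydoc code.

```python
def build_documents(pairs):
    """Group headwords under their normalized key.

    Each key starts with itself as a member, then collects every
    alternate spelling that points to it.
    """
    documents = {}
    for headword, key in pairs:
        documents.setdefault(key, {key}).add(headword)
    return documents


current = [("kf", "kar"), ("vikf", "vikar")]
proposed = current + [("kF", "kar")]  # the linkage under discussion

before = build_documents(current)
after = build_documents(proposed)
print(sorted(before["kar"]), sorted(after["kar"]))
```

Comparing `before` and `after` in this way is also a cheap check for wrong links: any document that grows unexpectedly after adding a record is worth a look.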

gasyoun commented 4 years ago

9000 such examples

We have 10500 MW sopasarga roots, can we use that data?

> verb spellings for prefixed verbs appearing in mw to the virtual prefixed verb spellings for pw. There are about 1500

Seems like you've got the right one.

> 'kF' as another alternate spelling for 'kar'

Couldn't it get us into trouble? It might bring more false variants than benefits.

> I'm currently mildly in favor of this, and similar other, linkages.

Let's link. Then we will see which wrong links come out of it. I'm all for it. I have been waiting 7 years for this post.

funderburkjim commented 4 years ago

> We have 10500 MW sopasarga roots, can we use that data?

Yes, in fact that data WAS used.

I've copied the working directory used in the PW preverb study into this repository as 01-pwverbs.

The readme file therein describes the steps.

A couple of files that might be of interest are:

gasyoun commented 4 years ago

> The readme file therein describes the steps.

So detailed, as usual. My ideas.

Good for dhatu:

DHĀTUP
onomatop.
Sautra
Wurzel
caus.
v. l.
perf.
med.
mit
von

Can't be dhatu:

adj.
m.
n.
f.
interj.
indecl.
pron.
nom.
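One way the two marker lists above could be turned into a filter is sketched below. The marker lists come from this comment; the decision rule (any "can't be dhatu" marker wins, otherwise any "good for dhatu" marker suggests a verb) is an assumption, as are the sample entry texts and the name `classify`. Markers are matched only as whitespace-delimited tokens, so a marker followed directly by punctuation would be missed.

```python
VERB_HOOKS = ["DHĀTUP", "onomatop.", "Sautra", "Wurzel", "caus.",
              "v. l.", "perf.", "med.", "mit", "von"]
NONVERB_HOOKS = ["adj.", "m.", "n.", "f.", "interj.", "indecl.",
                 "pron.", "nom."]


def has_hook(text, hooks):
    """True if any hook occurs in text as a whitespace-delimited unit."""
    padded = " " + " ".join(text.split()) + " "
    return any(" " + hook + " " in padded for hook in hooks)


def classify(entry_text):
    """Assumed rule: non-verb markers take priority over verb markers."""
    if has_hook(entry_text, NONVERB_HOOKS):
        return "non-verb"
    if has_hook(entry_text, VERB_HOOKS):
        return "verb?"
    return "unknown"


# Made-up entry snippets, for illustration only.
print(classify("kar Wurzel caus. ..."))
print(classify("andha adj. ..."))
```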

pws

gasyoun commented 3 years ago

@funderburkjim did some of my hooks help?

funderburkjim commented 3 years ago

There is probably some overlap between your 'hooks' and the hooks I used for filtering verbs.

Based on the part of your table above, there are some verbs that my filter missed, such as line 4, aqq (SLP1 spelling).

Your table also has some pretty definite NON-VERBS (for example, line 22, andha).

Question: What is your column A? The numbers there don't correspond to the Cologne IDs for PW.

Suggestion: Provide a text file of headwords column B (devanagari) OR column E (HK transliteration). If you know a way to filter out definite non-verbs, please do exclude them.

I can develop diffs of your headword list with my PW verb headword list (such as in preverb1.txt).

This will provide a source of verbs that you found but that I missed, and vice versa.
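The diff described here amounts to two set differences. A minimal sketch, assuming both lists have already been converted to the same transliteration (for example SLP1); the sample lists are made up apart from 'aqq' and 'andha', which are mentioned above, and `diff_headwords` is a hypothetical name.

```python
def diff_headwords(mine, yours):
    """Return (only_in_yours, only_in_mine) as sorted lists."""
    mine_set, yours_set = set(mine), set(yours)
    return sorted(yours_set - mine_set), sorted(mine_set - yours_set)


# Made-up sample lists; both are assumed to be in SLP1.
my_verbs = ["kar", "gam"]
your_headwords = ["kar", "aqq", "andha"]

only_yours, only_mine = diff_headwords(my_verbs, your_headwords)
print(only_yours)
print(only_mine)
```

Entries in `only_yours` are candidates for verbs the existing filter missed (or non-verbs still to be excluded); entries in `only_mine` are the reverse.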