sanskrit-lexicon / hwnorm2


'keydoc' (dev) #1

Open funderburkjim opened 4 years ago

funderburkjim commented 4 years ago

This repository is an offshoot of the hwnorm1 repository.

The idea is to define a dictionary document by a collection of intrinsic dictionary headwords, then to allow access to such documents by the intrinsic headword spellings as well as alternate and normalized spellings. The term 'keydoc' (a document defined by headword keys) is one way to refer to this notion; and it is currently represented by a database with the beautiful name keydoc_glob1 (global keydoc database).

dalglob1 is a display that uses the new database. The database does not currently affect other displays.

There are two YouTube videos:

gasyoun commented 4 years ago

Jim, thanks for documenting in detail the issue with 443k spellings. Generally I do not understand where this UI will fit, as we now have so many different paths to take. But the issues at the end of the 2nd video, like pitA and pitf: haven't we already solved them in the past in a different place?

nṛsiṃhaācārya does look in your video anti-sandhi. (nṛsiṃhaācārya is an alternate of narasiṃha.) narasiṃha or nṛsiṃha ācārya

In the 1st video you give an alternate headword for MW based on the fact that ACC gives narasiṃha and nṛsiṃha ācārya as synonyms. In the 2nd video MW gives guru and gurvi, but you do not use this connection for other dictionaries. Or do they just not have a gurvi entry or subentry that can be used?

funderburkjim commented 4 years ago

where this UI will fit

This UI is currently just for research purposes. The research questions:

local document search terms

Assume that a document D in dictionary X is determined by headwords with spellings H1,H2,.. in X. Then the local search terms L1,L2,... for D currently include:

global document search terms

The global search terms G1,G2,... for a document D in dictionary X take into account other dictionaries.

funderburkjim commented 4 years ago

nṛsiṃhaācārya does look in your video anti-sandhi.

I agree. This looks like a bug in acc. In fact all the following instances look to be similar errors:

13 matches for "aa" in buffer: acc_hwextra.txt
     68:<L>1783.1<k1>kOSikAditya<k2>kOSikAditya<type>alt<LP>1783<k1P>AdityaAcArya
    104:<L>2568.1<k1>udayakaraAcArya<k2>udayakara AcArya<type>alt<LP>2568<k1P>udayana
    179:<L>4657.1<k1>kfzRamBawwa<k2>kfzRamBawwa,<type>alt<LP>4657<k1P>kfzRaBawwaArqe
    210:<L>5684.1<k1>gaReSvaraAcArya<k2>gaReSvara AcArya<type>alt<LP>5684<k1P>gaReSadEvajYa
    401:<L>11100.1<k1>nfsiMhaAcArya<k2>nfsiMha AcArya<type>alt<LP>11100<k1P>narasiMha
    496:<L>13957.1<k1>SuBaMkara<k2>SuBaMkara<type>alt<LP>13957<k1P>pragalBaAcArya
    778:<L>22017.1<k1>dIkzita<k2>dIkzita<type>alt<LP>22017<k1P>vAsudevaaDvarin
    814:<L>23353.1<k1>veNkawanATa<k2>veNkawanATa<type>alt<LP>23353<k1P>veNkawaAcArya
    815:<L>23359.1<k1>veNkaweSa<k2>veNkaweSa<type>alt<LP>23359<k1P>veNkawaAcArya
    903:<L>26044.1<k1>SrInivAsatIrTa<k2>SrInivAsatIrTa<type>alt<LP>26044<k1P>SrInivAsaAcArya
    951:<L>28306.1<k1>darSanAcArya<k2>darSanAcArya<type>alt<LP>28306<k1P>sudarSanaAcArya
    952:<L>28306.2<k1>darSanArya<k2>darSanArya<type>alt<LP>28306<k1P>sudarSanaAcArya
    956:<L>28551.1<k1>viSvarUpa<k2>viSvarUpa<type>alt<LP>28551<k1P>sureSvaraAcArya

I think all the 'aA' in 'k1' or 'k1P' should be changed to 'A'.
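A minimal Python sketch of the proposed correction, applied to one line in the format shown in the listing above. The function name `fix_aA` and the regular expression are illustrative assumptions, not code from the repository; the sketch only touches the `k1` and `k1P` fields and leaves `k2` alone.

```python
import re

# A field value runs from its tag up to the next '<' or the end of the line.
# Only the k1 and k1P tags are matched; k2 and the others are left untouched.
FIELD_RE = re.compile(r"<(k1|k1P)>([^<]*)")

def fix_aA(line: str) -> str:
    """Replace 'aA' with 'A' inside the k1 and k1P fields only (hypothetical helper)."""
    def repl(m):
        return f"<{m.group(1)}>{m.group(2).replace('aA', 'A')}"
    return FIELD_RE.sub(repl, line)

line = "<L>11100.1<k1>nfsiMhaAcArya<k2>nfsiMha AcArya<type>alt<LP>11100<k1P>narasiMha"
print(fix_aA(line))
# <L>11100.1<k1>nfsiMhAcArya<k2>nfsiMha AcArya<type>alt<LP>11100<k1P>narasiMha
```

Note that the rule as stated covers only 'aA'. The lowercase 'aa' in vAsudevaaDvarin (record 22017.1) is not changed by this sketch and would need a separate decision.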

@drdhaval2785 agree?

funderburkjim commented 4 years ago

local document extension

After the global document search term step mentioned above, there is one more step (keydoc2.txt) which revises the local document definitions.

An abstract statement of this process might be: For a given dictionary X, merge all documents which have a common search term.

The example of Burnouf with guru and gurvI might help. Before the global search term step, the relevant items (in keydoc_norm.txt) for Burnouf show two documents:

  1. guru
  2. gurvI

These documents are, at this stage, unrelated.

After the global merge step, the relevant items (in keydoc_merge.txt for burnouf) still show two documents, but with additional search terms.

  1. guru gurvI,guruH
    • gurvI is a search term for guru in BUR because it is a search term for guru in MW
    • guruH is a search term for guru, because in SKD guru is a normalized spelling search term for guruH
  2. gurvI guru,gurvvI
    • guru is a search term for gurvI in BUR because it is a search term for gurvI in MW
    • gurvvI is a search term for gurvI in BUR because gurvI is a search term for gurvvI in SHS, SKD, VCP, WIL and YAT.
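One way to read the global step above, sketched in Python. The data layout (`local_docs`, each document as a plain set of spellings) and the function `global_terms` are assumptions made for illustration; the actual format of keydoc_norm.txt is not shown in this thread, and only three dictionaries are included to keep the sample small.

```python
# Hypothetical, simplified data: for each dictionary, every document is the set
# of its headwords together with its local search terms.
local_docs = {
    "BUR": [{"guru"}, {"gurvI"}],
    "MW": [{"guru", "gurvI"}],
    "SKD": [{"guruH", "guru"}, {"gurvvI", "gurvI"}],
}

def global_terms(dictionary, headword, local_docs):
    """Collect spellings that share a document with `headword` in other dictionaries."""
    terms = set()
    for other, docs in local_docs.items():
        if other == dictionary:
            continue  # only other dictionaries contribute global terms
        for doc in docs:
            if headword in doc:
                terms |= doc
    terms.discard(headword)  # the headword is not its own extra search term
    return terms

print(sorted(global_terms("BUR", "guru", local_docs)))   # ['guruH', 'gurvI']
print(sorted(global_terms("BUR", "gurvI", local_docs)))  # ['guru', 'gurvvI']
```

With this sample data the sketch reproduces the two Burnouf lists above: guru gains gurvI (from MW) and guruH (from SKD), and gurvI gains guru (from MW) and gurvvI (from SKD).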

The last step merges these two documents, so now there is only 1 combined document in burnouf (keydoc2.txt):

  1. guru,gurvI guruH,gurvvI

These are merged because the two documents share spellings. In this case, both 'guru' and 'gurvI' are search terms common to the two documents.

So that is how the new, two-headword, document occurs in Burnouf.
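The merge rule ("merge all documents which have a common search term") can be sketched as follows. This is an illustrative Python sketch under assumptions, not the code that produces keydoc2.txt: each document is modeled as a pair (headwords, search terms), and `merge_documents` is a hypothetical name.

```python
def merge_documents(docs):
    """Merge documents (headwords, search_terms) that share any spelling."""
    merged = []  # invariant: groups in `merged` never share a spelling
    for heads, terms in docs:
        heads, terms = set(heads), set(terms)
        keep = []
        for mh, mt in merged:
            if (heads | terms) & (mh | mt):
                # Overlap found: absorb the existing group into the new one.
                heads |= mh
                terms |= mt
            else:
                keep.append((mh, mt))
        keep.append((heads, terms))
        merged = keep
    # A spelling that is already a headword is not repeated as a search term.
    return [(h, t - h) for h, t in merged]

# The two Burnouf documents after the global step.
bur = [({"guru"}, {"gurvI", "guruH"}), ({"gurvI"}, {"guru", "gurvvI"})]
for heads, terms in merge_documents(bur):
    print(",".join(sorted(heads)), ",".join(sorted(terms)))
# guru,gurvI guruH,gurvvI
```

Because the merged groups stay pairwise disjoint, a new document that overlaps several groups pulls all of them into one, so chains of shared spellings end up in a single document.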

gasyoun commented 4 years ago

The last step merges these two documents, so now there is only 1 combined document in burnouf (keydoc2.txt):

Now let's think about how it can and should live together with simple. And let's at least document what kinds of relations between words are given in each dictionary. There are antonyms in GRA, for example, and we have never even tried to mark them up. Or another approach: giri is based on guru, which is based on the root gir as per Kossowich, but Wilson gives E. gṝ.

drdhaval2785 commented 3 years ago

@drdhaval2785 agree?

I agree.

gasyoun commented 3 years ago

https://www.sanskrit-lexicon.uni-koeln.de/scans/csl-apidev/sample/dalglob1.php is what is left. But there was a more modern version of it anyway, wasn't there, @funderburkjim?