sanskrit-lexicon / COLOGNE

Development of http://www.sanskrit-lexicon.uni-koeln.de/

Discussion about metaline for samAnArthaka dictionaries (समानार्थक कोश) #406

Open drdhaval2785 opened 1 year ago

drdhaval2785 commented 1 year ago

Concept of samAnArthaka dictionaries

There is no specified headword. Synonyms are grouped together (with or without gender information).

Explanation in mathematical terms

If the samAnArthaka relationship is denoted by f, then f({A, B, C}) means A = B, A = C and B = C, i.e. every member of the set is a synonym of every other member.
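The pairwise reading of a synset can be sketched in Python as follows. This is only an illustration: the function name `synonym_pairs` is hypothetical and not part of any CDSL tooling.

```python
from itertools import combinations


def synonym_pairs(synset):
    """Return every unordered pair of members of one synset."""
    return list(combinations(synset, 2))


pairs = synonym_pairs(["A", "B", "C"])
print(pairs)  # [('A', 'B'), ('A', 'C'), ('B', 'C')]
```

A synset with n members yields n(n-1)/2 such pairs, which is why storing the group once is more compact than storing each pair.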

Sample data

From Amarakosha nAnArthavarga

देवाः सुपर्वसुरनिर्जरदेवतर्भुबर्हिर्मुखानिमिषदैवतनाकिलेखाः ।
वृन्दारकाः सुमनसस्त्रिदशा अमर्त्याः स्वाहास्वधाक्रतुसुधाभुज आदितेयाः ॥ ८८ ॥
गीर्वाणा मरुतोऽस्वप्ना विबुधा दानवारयः ।
तेषां यानं विमानोऽन्धः पीयूषममृतं सुधा ॥ ८९ ॥

Problem to be handled

We need to devise a markup standard that captures this information without any loss at encoding time. The information can then be used later for display or other purposes, including generating synsets.

Proposed markup (Edited per https://github.com/sanskrit-lexicon/COLOGNE/issues/405#issuecomment-1471218634)

<L>1<pc>23
<syns>देव:पुं,सुपर्वन्:पुं,सुर:पुं,निर्जर:पुं,देवता:पुं,ऋभु:पुं,बर्हिर्मुख:पुं,अनिमिष:पुं,दैवत:पुं,नाकिन्:पुं,लेख:पुं,वृन्दारक:पुं,सुमनस्:पुं,त्रिदश:पुं,अमर्त्य:पुं,स्वाहाभुज्:पुं,स्वधाभुज्:पुं,क्रतुभुज्:पुं,सुधाभुज्:पुं,आदितेय:पुं,गीर्वाण:पुं,मरुत्:पुं,अस्वप्न:पुं,विबुध:पुं,दानवारि:पुं
<syns>यान:क्ली,विमान:पुं
<syns>अन्धस्:क्ली,पीयूष:क्ली,अमृत:क्ली,सुधा:स्त्री
देवाः सुपर्वसुरनिर्जरदेवतर्भुबर्हिर्मुखानिमिषदैवतनाकिलेखाः ।
वृन्दारकाः सुमनसस्त्रिदशा अमर्त्याः स्वाहास्वधाक्रतुसुधाभुज आदितेयाः ॥ ८८ ॥
गीर्वाणा मरुतोऽस्वप्ना विबुधा दानवारयः ।
तेषां यानं विमानोऽन्धः पीयूषममृतं सुधा ॥ ८९ ॥
<LEND>

If the gender information is absent or ambiguous, do not try too hard to interpret it manually; leave it blank. It is better not to encode information explicitly when we are not sure of it. In the following verse, I am not sure of the gender of पवि and भिदु, so I have left them blank. (For later use, this information can be pulled from other dictionaries if required.)

<syns>वज्र-पुंक्ली,कुलिश-पुंक्ली,भिदुर-क्ली,शतधारक-क्ली,व्याधाम-पुं,दम्भोलि-पुं,शतकोटि-पुं,पवि,भिदु
अस्त्रियौ वज्रकुलिशौ भिदुरं शतधारकम् ।
व्याधामः पुंसि दंभोलिश्शतकोटिः पविर्भिदुः ॥ १३ ॥

Explanation of metaline

L is the lnum, which is unique for each headword:meanings pair.
pc is the page-column number, which identifies the page.
syns is a comma-separated list of headword:gender entries for all members of the synset.
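A minimal sketch of how such a metaline and a syns line might be read, assuming the layout shown in the samples above. The names `parse_metaline` and `parse_syns` are hypothetical. The separator is a parameter because the samples in this thread use both `:` and `-` between headword and gender, and a missing gender is returned as `None`.

```python
import re

# <L>lnum<pc>page, as in the proposed metaline.
METALINE = re.compile(r"^<L>(?P<lnum>[^<]+)<pc>(?P<pc>[^<]+)$")


def parse_metaline(line):
    """Return (lnum, pc) from a line such as '<L>1<pc>23'."""
    match = METALINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a metaline: {line!r}")
    return match.group("lnum"), match.group("pc")


def parse_syns(line, sep=":"):
    """Split a <syns> line into (headword, gender) pairs.

    Gender is None when the entry carries no gender information.
    """
    body = line.strip()
    if not body.startswith("<syns>"):
        raise ValueError(f"not a syns line: {line!r}")
    members = []
    for item in body[len("<syns>"):].split(","):
        headword, _, gender = item.partition(sep)
        members.append((headword, gender or None))
    return members


print(parse_metaline("<L>1<pc>23"))  # ('1', '23')
print(parse_syns("<syns>यान:क्ली,विमान:पुं"))
print(parse_syns("<syns>पवि,भिदु", sep="-"))
```

Splitting on the first separator only works as long as headwords never contain the separator themselves; that is an assumption of this sketch, not something the proposal states.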

drdhaval2785 commented 1 year ago

Note that I have written the syns details in Devanagari to make data entry easier for non-technical people who are not comfortable with SLP. They can work in native Devanagari script.
पुं - masculine
स्त्री - feminine
क्ली - neuter
अ - indeclinable
This gender information can be expanded if needed.

drdhaval2785 commented 1 year ago

See the discussion in #405 for small changes to the proposed format.

funderburkjim commented 1 year ago

What are ';c' and ';k' ?

drdhaval2785 commented 1 year ago

;c for comments and ;k for kAnda, i.e. chapter name

drdhaval2785 commented 1 year ago

;p for page and ;l for line

drdhaval2785 commented 1 year ago

Dear @funderburkjim

I have prepared a full-fledged dictionary (abhidhānacintāmaṇi of Hemacandra) as an experiment for this samānārthaka kośa exercise.

abch1.txt

Kindly try to add it to kosha-dev for trial, and thereafter to CDSL. Metadata is in the file itself.

drdhaval2785 commented 1 year ago

Sample data

<L>38<pc>7
<eid>53<syns>स्वर्ग-पुं,त्रिविष्टप-पुं,द्यो-स्त्री,दिव्-स्त्री,भुवि-स्त्री,तविष-पुं,ताविष-पुं,नाक-पुं,गो-स्त्री,त्रिदिव-क्ली,ऊर्ध्वलोक-पुं,सुरालय-पुं
<eid>54<syns>अमर-पुं,देव-पुं,सुपर्वन्-पुं,सुर-पुं,निर्जर-पुं,देवता-पुं,ऋभु-पुं,बर्हिर्मुख-पुं,अनिमिष-पुं,दैवत-पुं,नाकिन्-पुं,लेख-पुं,वृन्दारक-पुं,सुमनस्-पुं,त्रिदश-पुं,अमर्त्य-पुं,स्वाहाभुज्-पुं,स्वधाभुज्-पुं,क्रतुभुज्-पुं,आदितेय-पुं,गीर्वाण-पुं,मरुत्-पुं,अस्वप्न-पुं,विबुध-पुं,दानवारि-पुं
स्वर्गस्त्रिविष्टपं द्योदिवौ भुविस्तविषताविषौ नाकः ।
गौस्त्रिदिवमूर्ध्वलोकः सुरालयस्तत्सदस्त्वमराः ॥ ८७ ॥
देवाः सुपर्वसुरनिर्जरदेवतर्भुबर्हिर्मुखानिमिषदैवतनाकिलेखाः ।
वृन्दारकाः सुमनसस्त्रिदशा अमर्त्याः स्वाहास्वधाक्रतुसुधाभुज आदितेयाः ॥ ८८ ॥
गीर्वाणा मरुतोऽस्वप्ना विबुधा दानवारयः ।
<LEND>
<L>39<pc>7
<eid>55<syns>विमान-पुं,देवयान-क्ली
<eid>56<syns>अन्धस्,पीयूष-क्ली,अमृत-क्ली,सुधा-स्त्री
तेषां यानं विमानोऽन्धः पीयूषममृतं सुधा ॥ ८९ ॥
<LEND>

drdhaval2785 commented 1 year ago

The tag eid stands for extra id; see #409 for details. It is a unique identifier for the synset (for future cross-dictionary referencing, commentary referencing, etc.). We can put this eid in the tail in the XML file. CDSL need not use it in the frontend if it is found superfluous for now.
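As a rough illustration of how eid could key each synset, the sketch below reads one entry in the sample layout and builds an eid-to-members mapping. The function name `synsets_by_eid` is hypothetical, and the `-` separator is assumed from the sample data.

```python
import re

# A synset line in the sample data: <eid>NN<syns>member,member,...
EID_SYNS = re.compile(r"^<eid>(?P<eid>\d+)<syns>(?P<syns>.*)$")


def synsets_by_eid(entry_lines, sep="-"):
    """Map each eid in one <L>...<LEND> entry to its (headword, gender) list."""
    result = {}
    for line in entry_lines:
        match = EID_SYNS.match(line.strip())
        if match is None:
            continue  # metaline, verse text, or <LEND>
        members = []
        for item in match.group("syns").split(","):
            headword, _, gender = item.partition(sep)
            members.append((headword, gender or None))
        result[int(match.group("eid"))] = members
    return result


entry = [
    "<L>39<pc>7",
    "<eid>55<syns>विमान-पुं,देवयान-क्ली",
    "<eid>56<syns>अन्धस्,पीयूष-क्ली,अमृत-क्ली,सुधा-स्त्री",
    "तेषां यानं विमानोऽन्धः पीयूषममृतं सुधा ॥ ८९ ॥",
    "<LEND>",
]
table = synsets_by_eid(entry)
print(sorted(table))  # [55, 56]
print(len(table[56]))  # 4
```

Keeping the eid as the dictionary key means a commentary or another dictionary can refer to a synset without depending on the lnum of the verse that contains it.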

drdhaval2785 commented 1 year ago

Headwords: 13456
Headwords with gender information: 13456
[('पुं', 6960), ('क्ली', 3068), ('स्त्री', 2492), ('पुंक्ली', 378), ('अ', 176), ('पुंस्त्री', 117), ('वा', 94), ('पुंद्वि', 39), ('त्रि', 38), ('स्त्रीक्ली', 31), ('स्त्रीब', 23), ('पुंब', 15), ('स्त्रीद्वि', 12), ('स', 4), ('पुंस्त्रीब', 2), ('क्लीद्वि', 2), ('क्लीब', 2), ('पुंक्लीब', 1), ('पुंक्लीद्वि', 1), ('वापुंक्ली', 1)]

Types of gender and number information (frequencies as listed above):
पुं - masculine
क्ली - neuter
स्त्री - feminine
त्रि - all three genders
अ - indeclinable
स - सर्वनामन् - pronouns
वा - वाच्यलिङ्ग - gender as per the noun following this adjective
द्वि - dual
ब - plural
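Compound codes in the frequency list, such as पुंक्लीद्वि, appear to be concatenations of the atomic codes in the legend above. Under that assumption, a greedy longest-match split recovers the parts. The names `ATOMS` and `split_code` are hypothetical, and the English glosses simply paraphrase the legend.

```python
# Atomic codes taken from the legend; glosses paraphrase it.
ATOMS = {
    "पुं": "masculine",
    "क्ली": "neuter",
    "स्त्री": "feminine",
    "त्रि": "all three genders",
    "अ": "indeclinable",
    "स": "pronoun",
    "वा": "gender of the qualified noun",
    "द्वि": "dual",
    "ब": "plural",
}


def split_code(code):
    """Greedily split a compound code into atomic codes, longest atom first."""
    atoms_longest_first = sorted(ATOMS, key=len, reverse=True)
    parts = []
    pos = 0
    while pos < len(code):
        for atom in atoms_longest_first:
            if code.startswith(atom, pos):
                parts.append(atom)
                pos += len(atom)
                break
        else:
            raise ValueError(f"unrecognised segment in {code!r} at offset {pos}")
    return parts


decoded = [ATOMS[atom] for atom in split_code("स्त्रीब")]
print(decoded)  # ['feminine', 'plural']
```

Trying the longest atoms first matters because स is a prefix of स्त्री; matching the short code first would leave an unparseable remainder.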

drdhaval2785 commented 1 year ago

Kindly use ABCH dictionary code for this dictionary.

funderburkjim commented 1 year ago

@drdhaval2785 acknowledging your request re kosha-dev version of abch.