sanskrit-lexicon / COLOGNE

Development of http://www.sanskrit-lexicon.uni-koeln.de/

Making a case for unique identifier for each synset / word-meaning set #409

Open drdhaval2785 opened 1 year ago

drdhaval2785 commented 1 year ago

Dictionaries sometimes encode relationships between two synsets, and we need to capture that information. This is possible only if each synset is uniquely identifiable. We can then link synsets internally and record the relationships between them.

For example, शार्ङ्ग is विष्णु's चाप. I have currently written it as विष्णुचाप, which is not a very elegant way of doing it.

Data

<syns>विष्णु-पुं,नारायण-पुं,बभ्रु-पुं,चक्रपाणि-पुं,जनार्दन-पुं,दैत्यारि-पुं,पुण्डरीकाक्ष-पुं,त्रिककुद्-पुं,विष्टरश्रवस्-पुं,पीताम्बर-पुं,हृषीकेश-पुं,विष्वक्सेन-पुं,चतुर्भुज-पुं,श्रीवत्स-पुं,श्रीपति-पुं,शार्ङ्गिन्-पुं,श्रीवत्साङ्क-पुं,अच्युत-पुं,हुण-पुं,वासुदेव-पुं,स्वभू-पुं,चक्रिन्-पुं,वैकुण्ठ-पुं,पुरुषोत्तम-पुं,अरिष्टनेमि-पुं,अजित-पुं,श्रीधर-पुं,यज्ञपूरुष-पुं,मुञ्जकेशिन्-पुं,मुररिपु-पुं,गदापाणि-पुं,अधोऽक्षज-पुं,अनन्तशायिन्-पुं,वृन्दाक-पुं,मुकुन्द-पुं,धरणीधर-पुं,शतानन्द-पुं,शतावर्त-पुं,युगावर्त-पुं,सुरोत्तम-पुं,कालकुन्थ-पुं,रन्तिदेव-पुं,केशव-पुं,गरुडध्वज-पुं,पद्मनाभ-पुं,विश्वरूप-पुं,कृष्ण-पुं,हरि-पुं,असंपुष-पुं,कैटभारि-पुं,ब्रह्मनाभ-पुं,गोविन्द-पुं,मधुसूदन-पुं
विष्णुर्नारायणो बभ्रुश्चक्रपाणिर्जनार्दनः ।
;l{0020}
दैत्यारिः पुण्डरीकाक्षस्त्रिककुद्विष्टरश्रवाः ॥ १० ॥
पीतांबरो हृषीकेशो विष्वक्सेनश्चतुर्भुजः ।
;p{0004}
श्रीवत्सश्श्रीपतिश्शार्ङ्गी श्रीवत्सांकोऽच्युतो हुणः ॥ ११ ॥
वासुदेवस्स्वभूश्चक्री वैकुण्ठः पुरुषोत्तमः ।
अरिष्टनेमिरजितश्श्रीधरो यज्ञपूरुषः ॥ १२ ॥
;l{0025}
मुञ्जकेशी मुररिपुर्गदापाणिरधोऽक्षजः ।
अनन्तशायी वृन्दाको मुकुन्दो धरणीधरः ॥ १३ ॥
शतानन्दश्शतावर्तो युगावर्तस्सुरोत्तमः ।
कालकुन्थो रन्तिदेवः केशवो गरुडध्वजः ॥ १४ ॥
पद्मनाभो विश्वरूपः कृष्णो हरिरसंपुषः ।
;l{0030}
कैटभारिर्ब्रह्मनाभो गोविन्दो मधुसूदनः ॥ १५ ॥
<LEND>
.
.
.
<L>5<pc>4
<syns>कौस्तुभ-पुं,विष्णुमणि
<syns>श्रीवत्स-पुं,विष्णुलक्ष्मन्
<syns>नन्दक-पुं,विष्ण्वसि
<syns>शार्ङ्ग-क्ली,विष्णुचाप-पुं
<syns>पाञ्चजन्य-पुं,विष्णुशङ्ख-पुं
<syns>सुदर्शन-क्ली,विष्णुचक्र-क्ली
कौस्तुभोऽस्य मणिर्लक्ष्म श्रीवत्सो नन्दकस्त्वसिः ।
चापश्शार्ङ्गं पाञ्चजन्यश्शंखश्चक्रं सुदर्शनम् ॥ १७ ॥
.
.
.
<syns>चाप,धनुष्,आस,इष्वास,धनुर्,द्रुण,कार्मुक,धन्व,कोदण्ड,आयुधाग्र्य,शरासन
अस्त्रियौ चापधनुषावासेष्वासौ धनुर्द्रुणम् ।
कार्मुकं धन्व कोदण्डमायुधाग्र्यं शरासनम् ॥ १७२ ॥

drdhaval2785 commented 1 year ago

If synsets carry unique numbers like the following, I can encode the relationship between them explicitly. If the genitive / possessive relationship is denoted by '#', the relationship in शार्ङ्ग can be coded as 1#140, which stands for all of the following headwords: विष्णुचाप,नारायणचाप,बभ्रुचाप,,,,,,,विष्णुधनुष्,,,,,,विष्णुकार्मुक,नारायणकार्मुक,,,,,,,,,विष्णुशरासन,,,,,मधुसूदनशरासन.

<eid>1<syns>विष्णु-पुं,नारायण-पुं,बभ्रु-पुं,चक्रपाणि-पुं,जनार्दन-पुं,दैत्यारि-पुं,पुण्डरीकाक्ष-पुं,त्रिककुद्-पुं,विष्टरश्रवस्-पुं,पीताम्बर-पुं,हृषीकेश-पुं,विष्वक्सेन-पुं,चतुर्भुज-पुं,श्रीवत्स-पुं,श्रीपति-पुं,शार्ङ्गिन्-पुं,श्रीवत्साङ्क-पुं,अच्युत-पुं,हुण-पुं,वासुदेव-पुं,स्वभू-पुं,चक्रिन्-पुं,वैकुण्ठ-पुं,पुरुषोत्तम-पुं,अरिष्टनेमि-पुं,अजित-पुं,श्रीधर-पुं,यज्ञपूरुष-पुं,मुञ्जकेशिन्-पुं,मुररिपु-पुं,गदापाणि-पुं,अधोऽक्षज-पुं,अनन्तशायिन्-पुं,वृन्दाक-पुं,मुकुन्द-पुं,धरणीधर-पुं,शतानन्द-पुं,शतावर्त-पुं,युगावर्त-पुं,सुरोत्तम-पुं,कालकुन्थ-पुं,रन्तिदेव-पुं,केशव-पुं,गरुडध्वज-पुं,पद्मनाभ-पुं,विश्वरूप-पुं,कृष्ण-पुं,हरि-पुं,असंपुष-पुं,कैटभारि-पुं,ब्रह्मनाभ-पुं,गोविन्द-पुं,मधुसूदन-पुं
<eid>17<syns>शार्ङ्ग-क्ली,1#140
<eid>140<syns>चाप,धनुष्,आस,इष्वास,धनुर्,द्रुण,कार्मुक,धन्व,कोदण्ड,आयुधाग्र्य,शरासन

drdhaval2785 commented 1 year ago

This is possible only when a unique id (distinct from the Lnum) is assigned to each synset in a samAnArthaka kosha and to each word-meaning set in an anekArthaka kosha.
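
A minimal sketch of how a token such as 1#140 could be expanded, assuming the `<eid>N<syns>...` layout proposed above. The sample is abbreviated (synsets 1 and 140 are truncated), and the function names `parse_synsets`, `headwords` and `expand_relations` are hypothetical, not part of any existing COLOGNE script. The final join is plain string concatenation, so it ignores sandhi and stem changes (compare विष्ण्वसि in the data above). Treat the joined strings as illustrative only.

```python
import re

# Abbreviated sample in the proposed <eid>N<syns>... layout (assumption).
SAMPLE = """\
<eid>1<syns>विष्णु-पुं,नारायण-पुं,मधुसूदन-पुं
<eid>17<syns>शार्ङ्ग-क्ली,1#140
<eid>140<syns>चाप,धनुष्,कार्मुक,शरासन
"""

LINE_RE = re.compile(r"^<eid>(\d+)<syns>(.*)$")
REL_RE = re.compile(r"^(\d+)#(\d+)$")  # '#' marks the genitive / possessive relation


def parse_synsets(text):
    """Map each eid to its raw comma-separated members."""
    synsets = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            synsets[int(m.group(1))] = [t for t in m.group(2).split(",") if t]
    return synsets


def headwords(synsets, eid):
    """Plain headwords of a synset: gender tags dropped, relation tokens skipped."""
    return [m.partition("-")[0] for m in synsets[eid] if not REL_RE.match(m)]


def expand_relations(synsets, eid):
    """Yield (possessor, possessed) pairs for every 'A#B' token in the synset."""
    for member in synsets[eid]:
        m = REL_RE.match(member)
        if m:
            for owner in headwords(synsets, int(m.group(1))):
                for owned in headwords(synsets, int(m.group(2))):
                    yield owner, owned


synsets = parse_synsets(SAMPLE)
pairs = list(expand_relations(synsets, 17))
# Naive concatenation only; real compounds need sandhi and stem handling.
compounds = [owner + owned for owner, owned in pairs]
print(len(pairs), compounds[0], compounds[-1])
```

With the abbreviated sample this yields 12 pairs (3 possessors times 4 words in synset 140). Keeping the expansion as pairs rather than as joined strings leaves room for a proper compound-formation step later.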

The question is where this information should ideally be stored. In my opinion, the xxx.txt file is the ideal place. Hardcoding the eid there would serve the same purpose that Lnums serve today. If an error in numbering is found later, or a new synset is added on the basis of another edition of the book, a fixed eid ensures that relationships encoded earlier are not altered. What does @funderburkjim think?
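
To illustrate the stability argument, here is a small sketch that compares a relation stored by running position with the same relation stored by a fixed eid. The data is hypothetical: the English labels only stand in for real synsets, and 141 is an assumed "next unused" eid.

```python
# Hypothetical miniature data; the labels stand in for real synsets.
synsets = [
    {"eid": 1, "label": "Vishnu"},
    {"eid": 17, "label": "Sharnga"},
    {"eid": 140, "label": "bow"},
]

# The same relation stored two ways: by list position and by fixed eid.
relation_by_position = (0, 2)  # positions of "Vishnu" and "bow"
relation_by_eid = (1, 140)     # eids of "Vishnu" and "bow"

# A newly found synset is inserted near the top and gets the next unused eid.
synsets.insert(1, {"eid": 141, "label": "newly added synset"})

by_eid = {s["eid"]: s["label"] for s in synsets}

resolved_by_position = tuple(synsets[i]["label"] for i in relation_by_position)
resolved_by_eid = tuple(by_eid[e] for e in relation_by_eid)

print(resolved_by_position)  # the second member no longer points at "bow"
print(resolved_by_eid)       # still ("Vishnu", "bow")
```

After the insertion, the position-based reference silently points at a different synset, while the eid-based reference still resolves to the original pair.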

drdhaval2785 commented 1 year ago

eid is an arbitrary name, shorthand for "extra id".

gasyoun commented 1 year ago

860

What would the extra id be here?

funderburkjim commented 1 year ago

AFAIK, WordNet (https://wordnet.princeton.edu/) is the primary example of synsets (for English). It can be explored from Python with NLTK (Natural Language Toolkit).

In WordNet, a synset is identified by a specific word together with a part of speech and a sense number (e.g. NLTK names the first noun synset for 'dog' dog.n.01).

Bing chat says:

Synsets are linked with each other to form various kinds of relations. These relations can be semantic or lexical. Semantic relations include hypernymy (a more general concept), hyponymy (a more specific concept), meronymy (a part-whole relationship), and holonymy (a whole-part relationship). Lexical relations include antonymy (opposite meaning), entailment (one concept implies another), and derivation (one word is derived from another).

Perhaps we should model our thinking about synsets on the WordNet approach, i.e. learn how WordNet works and build a Sanskrit WordNet along the same lines. At first glance, the underlying data structures for WordNet are likely to be documented, either at the WordNet site or at NLTK.

Bing chat's response to the question "How to make a WordNet for another language using NLTK?":

Yes, it is possible to create your own version of WordNet for another language using NLTK. You can use NLTK's wordnet reader object and initialize a wrapper object that provides its own defaults (https://stackoverflow.com/questions/39569307/how-to-change-nltk-default-wordnet-language-to-zsm). However, you need to have a WordNet-like resource for the language you want to use (https://stackoverflow.com/questions/31478152/how-to-use-the-language-option-in-synsets-nltk-if-you-load-a-wordnet-manually). You can also use Open Multilingual WordNet (OMW), which links WordNets of different languages to the Princeton WordNet version 3.0 (https://stackoverflow.com/questions/31478152/how-to-use-the-language-option-in-synsets-nltk-if-you-load-a-wordnet-manually).

drdhaval2785 commented 1 year ago

Sanskrit Wordnet - https://www.cfilt.iitb.ac.in/wordnet/webswn/wn.php