Open drdhaval2785 opened 1 year ago
Note that I have used Devanagari for the syns details, so that non-technical people who are not comfortable with SLP can fill in the data more easily.
They can work in native Devanagari script.
पुं - masculine
स्त्री - feminine
क्ली - neuter
अ - indeclinable
This gender information can be expanded if needed.
See the discussion in #405 for small changes to the proposed format.
What are ';c' and ';k'?
;c is for comments and ;k is for kAnda, i.e. chapter name.
;p is for page and ;l is for line.
Dear @funderburkjim
I have prepared a full-fledged dictionary (abhidhānacintāmaṇi of Hemacandra) as an experiment for this samānārthaka kośa exercise.
Kindly try to add it to kosha-dev for trial, and thereafter to CDSL. Metadata is in the file itself.
Sample data
<L>38<pc>7
<eid>53<syns>स्वर्ग-पुं,त्रिविष्टप-पुं,द्यो-स्त्री,दिव्-स्त्री,भुवि-स्त्री,तविष-पुं,ताविष-पुं,नाक-पुं,गो-स्त्री,त्रिदिव-क्ली,ऊर्ध्वलोक-पुं,सुरालय-पुं
<eid>54<syns>अमर-पुं,देव-पुं,सुपर्वन्-पुं,सुर-पुं,निर्जर-पुं,देवता-पुं,ऋभु-पुं,बर्हिर्मुख-पुं,अनिमिष-पुं,दैवत-पुं,नाकिन्-पुं,लेख-पुं,वृन्दारक-पुं,सुमनस्-पुं,त्रिदश-पुं,अमर्त्य-पुं,स्वाहाभुज्-पुं,स्वधाभुज्-पुं,क्रतुभुज्-पुं,आदितेय-पुं,गीर्वाण-पुं,मरुत्-पुं,अस्वप्न-पुं,विबुध-पुं,दानवारि-पुं
स्वर्गस्त्रिविष्टपं द्योदिवौ भुविस्तविषताविषौ नाकः ।
गौस्त्रिदिवमूर्ध्वलोकः सुरालयस्तत्सदस्त्वमराः ॥ ८७ ॥
देवाः सुपर्वसुरनिर्जरदेवतर्भुबर्हिर्मुखानिमिषदैवतनाकिलेखाः ।
वृन्दारकाः सुमनसस्त्रिदशा अमर्त्याः स्वाहास्वधाक्रतुसुधाभुज आदितेयाः ॥ ८८ ॥
गीर्वाणा मरुतोऽस्वप्ना विबुधा दानवारयः ।
<LEND>
<L>39<pc>7
<eid>55<syns>विमान-पुं,देवयान-क्ली
<eid>56<syns>अन्धस्,पीयूष-क्ली,अमृत-क्ली,सुधा-स्त्री
तेषां यानं विमानोऽन्धः पीयूषममृतं सुधा ॥ ८९ ॥
<LEND>
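A minimal sketch of how records in this layout could be read. It assumes that each record starts with an `<L>…<pc>…` line, ends with `<LEND>`, and that any line inside a record that does not start with `<eid>` is verse text. The function name `parse_records` and the dictionary keys are illustrative only and are not part of any existing CDSL tooling.

```python
import re

# Metaline such as "<L>39<pc>7" and synset line such as "<eid>55<syns>..."
META_RE = re.compile(r"^<L>(\d+)<pc>(\S+)$")
SYNSET_RE = re.compile(r"^<eid>(\d+)<syns>(.*)$")


def parse_records(text):
    """Split text into records delimited by <L>... and <LEND> lines."""
    records = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        meta = META_RE.match(line)
        if meta:
            current = {"L": meta.group(1), "pc": meta.group(2),
                       "synsets": [], "verse": []}
            continue
        if current is None:
            continue  # ignore anything outside a record
        if line == "<LEND>":
            records.append(current)
            current = None
            continue
        syn = SYNSET_RE.match(line)
        if syn:
            # Keep each member as written; gender is split off elsewhere.
            current["synsets"].append((syn.group(1), syn.group(2).split(",")))
        else:
            current["verse"].append(line)
    return records


sample = """<L>39<pc>7
<eid>55<syns>विमान-पुं,देवयान-क्ली
<eid>56<syns>अन्धस्,पीयूष-क्ली,अमृत-क्ली,सुधा-स्त्री
तेषां यानं विमानोऽन्धः पीयूषममृतं सुधा ॥ ८९ ॥
<LEND>"""

records = parse_records(sample)
print(records[0]["L"], records[0]["pc"], len(records[0]["synsets"]))
```

Keeping the verse lines alongside the synsets means the original text is not lost while encoding, which is the stated goal of the markup.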
The tag eid stands for extra id; see #409 for details.
It is a unique identifier meant to identify the synset (for future cross-dictionary referencing, commentary referencing, etc.).
We can put this eid in the tail in the XML file.
CDSL need not use it in the frontend if it is found superfluous for now.
Headwords: 13456
Headwords with gender information: 13456
[('पुं', 6960), ('क्ली', 3068), ('स्त्री', 2492), ('पुंक्ली', 378), ('अ', 176), ('पुंस्त्री', 117), ('वा', 94), ('पुंद्वि', 39), ('त्रि', 38), ('स्त्रीक्ली', 31), ('स्त्रीब', 23), ('पुंब', 15), ('स्त्रीद्वि', 12), ('स', 4), ('पुंस्त्रीब', 2), ('क्लीद्वि', 2), ('क्लीब', 2), ('पुंक्लीब', 1), ('पुंक्लीद्वि', 1), ('वापुंक्ली', 1)]
Types of Gender-person information and their frequency
पुं - masculine
क्ली - neuter
स्त्री - feminine
त्रि - All three genders
अ - Indeclinable
स - सर्वनामन् - pronouns
वा - वाच्यलिङ्ग - Gender as per the noun following this adjective
द्वि - dual
ब - plural
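Compound codes in the frequency list, such as पुंक्लीद्वि or वापुंक्ली, look like plain concatenations of the basic codes. Under that assumption (it is inferred from the data, not stated in this thread), a hedged sketch of a decoder could look like this. The names `CODES` and `decode` are hypothetical.

```python
# Basic codes and their meanings, taken from the legend in this thread.
CODES = {
    "पुं": "masculine",
    "क्ली": "neuter",
    "स्त्री": "feminine",
    "त्रि": "all three genders",
    "अ": "indeclinable",
    "स": "pronoun",
    "वा": "gender as per the following noun",
    "द्वि": "dual",
    "ब": "plural",
}

# Try longer codes first so that "स्त्री" is not cut short by "स".
_ORDERED = sorted(CODES, key=len, reverse=True)


def decode(code):
    """Split a (possibly compound) code into the meanings of its parts."""
    parts = []
    pos = 0
    while pos < len(code):
        for key in _ORDERED:
            if code.startswith(key, pos):
                parts.append(CODES[key])
                pos += len(key)
                break
        else:
            # No basic code matches at this position.
            raise ValueError(f"unrecognised code at offset {pos}: {code!r}")
    return parts


print(decode("पुंक्लीद्वि"))
print(decode("स्त्रीब"))
```

Raising an error on an unknown fragment, rather than skipping it, makes data-entry mistakes in the gender field visible early.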
Kindly use ABCH dictionary code for this dictionary.
@drdhaval2785 acknowledging your request re kosha-dev version of abch.
Concept of samAnArthaka dictionaries
There is no specified headword. Synonyms are clubbed together (with or without gender information).
Explanation in mathematical terms
If the samAnArthaka relationship is denoted by f(n), then f(n) {A, B, C} would mean A = B, A = C, and B = C.
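The pairwise reading of a synset can be sketched with the standard library. This is only an illustration of the relationship described above; the function name `pairwise_equalities` is hypothetical.

```python
from itertools import combinations


def pairwise_equalities(synset):
    """Return every unordered pair of members of a synset."""
    return list(combinations(synset, 2))


print(pairwise_equalities(["A", "B", "C"]))
# [('A', 'B'), ('A', 'C'), ('B', 'C')]
```

A synset of n members yields n(n-1)/2 such pairs, so storing the synset once is far more compact than storing every pair.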
Sample data
From Amarakosha nAnArthavarga
देवाः सुपर्वसुरनिर्जरदेवतर्भुबर्हिर्मुखानिमिषदैवतनाकिलेखाः ।
वृन्दारकाः सुमनसस्त्रिदशा अमर्त्याः स्वाहास्वधाक्रतुसुधाभुज आदितेयाः ॥ ८८ ॥
गीर्वाणा मरुतोऽस्वप्ना विबुधा दानवारयः ।
तेषां यानं विमानोऽन्धः पीयूषममृतं सुधा ॥ ८९ ॥
Problem to be handled
We need to devise a markup standard by which the information is captured without any loss, while encoding. We can use this information later on, for display or otherwise. We can later on generate synsets too.
Proposed markup (Edited per https://github.com/sanskrit-lexicon/COLOGNE/issues/405#issuecomment-1471218634)
If the gender information is absent or ambiguous, do not try too hard to interpret it manually. We can leave the information blank. It is better not to encode information explicitly when we are not sure of it. In the following verse, I am not sure what the gender of पवि and भिदु would be, so I have kept them blank. (For later use, this information can be pulled from other dictionaries if required.)
Explanation of metaline
L is the lnum, which would be unique for each headword:meanings pair.
pc is the page-column number detail, which identifies the page number.
syns is a comma-separated list of headword:gender information for all members of the synset.
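A hedged sketch of splitting a syns value into (headword, gender) pairs. The sample data in this thread joins headword and gender with a hyphen, so that is the default here; the separator is a parameter because the description above writes the pair with a colon. A member without a separator is treated as having blank gender, as with अन्धस् in the sample. The function name `parse_syns` is hypothetical.

```python
def parse_syns(syns, sep="-"):
    """Split a syns value into (headword, gender) pairs.

    The gender is an empty string when a member carries no separator.
    """
    members = []
    for item in syns.split(","):
        item = item.strip()
        if not item:
            continue
        # Split on the last separator so the gender is always the final part.
        headword, found, gender = item.rpartition(sep)
        if not found:
            # No separator: the whole item is the headword, gender left blank.
            headword, gender = item, ""
        members.append((headword, gender))
    return members


print(parse_syns("अन्धस्,पीयूष-क्ली,अमृत-क्ली,सुधा-स्त्री"))
```

Leaving the gender as an empty string, instead of guessing, keeps the option of filling it in later from other dictionaries.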